Sunday, 15 November 2009

Indexes cost money

All this industry talk about cloud computing is, more than ever, placing under the spotlight the true cost of IT infrastructure. Combined with the current economic climate, this is sparkling an interesting and refreshing debate on what is the actual cost of supporting and growing an IT infrastructure. And more and more, a realistic cost breakdown of how much each piece of the infrastructure is emerging.

And one of the surprises of this exercise is the exploding costs of mass storage.  Data centre storage is under an increasing pressure. Data Warehouses keep storing more and more data points and history. ERPs continue to take on more business processes. Web front ends generate incredible amounts of detailed user activity information. Graphics and multimedia file formats are taxing file servers and document repositories at the far end of their capacity. Mail is now an accepted medium for conducting business, and mailbox size is increasing without an apparent limit.

On top of that, legal regulations demand that most everything has to be kept during five, seven, or in extreme cases I've seen, fourteen years.

Technology is also evolving to catch up with all those demands. From SAN/NAS storage, tiered data management, storage outsourcing, you name it. But, except for some notable examples (Amazon EBS, Google) the business IT environment at large is failing to meet that demand.

This is even more difficult to explain to the non-IT executive levels, since at the consumer level storage, there is an expanding and ever increasing amount of USB and firewire devices offering amazing capacities at prices that are getting lower and lower. Not to mention that the average desktop PC now comes with a 1GB hard disk as the standard size.

Usually, the point that non-IT business folks fail to realize is that it is cheap and easy to increase the raw amount of storage available.  Not so cheap is to back it up daily, keep that archive during five years and make sure that is recoverable.

My usual response in the typical elevator conversation about storage cost with business types (yes, those conversations actually happen) is to answer, "yes, you can purchase a couple of TB for a few bucks. Now, tell me what you do if those disks fail. Will you be making back ups? Will you be able to recover them in five years? Will you be able to provide a replacement in 4 hours if they fail?" It's not that they are trying to outsmart you, it's simply that all those dimensions of the support infrastructure costs are simply invisible for them.

Oh, I hear you cry, is this one of these posts of late that do not have anything to do with database tuning? No, this time my point is actually related to DB tuning. In fact, all of the above is essential to introduce the topic. The topic is "just create an index and this will be faster"

This is so wrong, and I think that I've blogged it before, that deserves to be refuted in all its possible dimensions. First, yes, there are situations where it actually makes sense to create an index.

But an index is not free. First, an extra index in your database will penalize inserts and updates on the affected table. Second, the index takes storage space, that expensive and scarce resource. Third, sometimes there are better alternatives to adding an index. Two of the typical are:

- Caching values read from the database, instead of querying for them each time they are needed. This has to do with a tendency during maintenance to prefer always adding new code rather than change existing code. The more difficult to maintain the code is, the higher the chance that maintainers simply do not want to touch it. This is safer for them, but also imposes a performance overhead as it cannot take advantage of work already done by the existing code.

- Converting procedural code to SQL data set code. Query execution plans favor working with whole tables and result sets over slowly iterating over them.

So, next time someone proposes an index on that pesky 4TB table, ask, are there any other alternatives? Tuning exercises should not provide a single answer to a problem, but a balanced cost/benefit analysis of different alternatives. It's not enough to answer "just create an index", you have to say "creating an index is the best option as opposed to...."  That puts the conversation with the paying customer in the right level.

No comments:

Post a Comment