Tuesday, 20 May 2008

Database tuning myths

Businesses are always looking for predictability. From financial plans that span three years to operational plans for next year, to five year strategic plans, there is a good amount of time and resources at any business devoted just to predict what the future is going to look like. Entire business functions are needed to create the plans and track its evolution. And the stock market regularly punishes companies that do not meet their financial plans. Ironically, they do it only the ones were the objectives are not met. Someday they should look at some medical disciplines and discover why growing faster than you should is not always good for your health, but that's another story.

It's not surprising that all those processes create a culture of predictability, where everything must be planned in advance and coordinated. Over time, and by repetition, those processes become so embedded in the company culture that people forget why things are done in a particular way. This has many benefits, as the process of planning itself becomes predictable. But it also transpire to other areas of the company, and everything that is either important or big enough ends up having a detailed process that everyone has to follow. Detailed and predictable outcomes reinforce the hearts and souls of the financial people because that full fills one of the basic tenets of capitalism: trust in the future.

Unfortunately, this model does not scale well if you want to keep an eye in all the details, even at the business process level. At some point, you have to forget about the details of how something is manufactured, or how the accounting is done. That gives you abstractions, and those are an essential tool for scaling up the complexity of any model, because without them you would get lost in the details very quickly.

Out of those abstractions, things that are know as "rules of thumb", or more corporate/engineering "best practices" emerge over time. Those are supposed to be generic rules or techniques that can be applied to any problem and give satisfactory, predictable results.

But as many are discovering, abstractions are not perfect. By their very nature, they ignore the details and focus on a few areas of interest to make them reasonably simple to use, its usefulness given precisely because they ignore the details.

If you've kept reading up to now (thank you very much), you may be wondering how does all this relates to database performance tuning. Well, this is one of the places where I see daily rules of thumb applied, often with disastrous results. Computer systems have evolved from the centralized mainframe to the distributed storage, processing and client access environment of today. Where in the past you had a machine and storage system with a thin operating system layer on top now you have to deal with anoperating system that attempts to coordinate a wide range of things connected to the system, plus a network stack, a database persistence layer, a business process model layer, a presentation layer and a client (thin or thick) layer.

What were in the past valid "rules of thumb" in one environment cannot simply be applied to a different, more complex one. Well, in fact, some of them were never applicable in any environment at all. Here's my "best practices hell" top three:

1- We just need to use fancy new hardware

The number one, because there is natural tendency to associate something physical with the performance of the system. And that's the box sitting in the data center. Just as you can get a faster car, why not just get a faster computer?

The answer is, yes as long as you make sure that you actually need it. Adding a faster CPU for a system that is disk bound is not going to improve its performance. Putting a faster disk to a machine that runs processes that eat up all its RAM will only make it swap faster. Adding more memory to a system that spends most of its time sending data across the network does not improve its performance.

In short, you need to diagnose before healing. You need to know your system and you need to know your application and how it's using the system. Without that knowledge, you are just guessing, and spending a lot of money at it. And I've watched this happening more than a few times. It's very frustrating to spend money and resources in an expensive hardware upgrade only to find that the results are unnoticeable.

The reason behind that is that you've reached some hard limit and that your whole system cannot be easily improved by throwing more hardware at it. Then it's time to examine your application and move its performance boundaries to other components. This sounds complicated, and it sometimes is. But often it means checking the application and making sure it's using the database resources with measure. The classic case is the 10.000 item grid that is completely filled up on applicationstart up . The network round trips necessary for this are huge and the application changes usually minor. The irony is that after this exercise, you'll probably discover that you actually did not need any new hardware at all.

But the myth stays firmly planted in the industry because often a hardware replacement means performance increases of all its components. Therefore, whatever the performance boundaries, the system has better performance after the hardware upgrade.

2- We just need to create a few indexes

This is one of the most popular, yet dangerous myths. Come on, let's finish this discussion taking the myth to the extreme. Go ahead, index everything in your database. And I mean every single column and every different combination of columns in each table. Now try to do a few queries. Wow, some of them are probably faster. Now try to do any INSERT or UPDATE on it and watch the extreme performance degradation. Convinced?

The explanation for this is that an index is just a table with a special organization that allows you to find things quickly in it. Yes, it's a small table because it contains only the indexed columns, but it's a table nonetheless. Each time you add an index you're creating one of those tables and increasing the overhead of keeping all of them up to date, not to mention the concurrency issues that your database has to deal with each time it's updating a table and it's indexes. Oh yes, I hear you saying, ha!, cluster indexes do not have this problem! Of course, but the problem with cluster indexes is that you cannot have more than one for each table, so it's very unlikely that you're going to cover all cases with them.

Remember, performance tuning is a trade off. It's about using available resources where they are more useful. And this is clearly not a way to profit from your system.

This is not to mean that indexes are not useful. Yes, they are. But anyone that is applying rules of thumb is sooner or later going to fall into this trap. Remember, you've been warned.

3- We did this the last time and it improved performance

As explained in the introduction, "rules of thumb" do not account for complexity. Face it, the systems you're dealing with are very, very complex. the performance of a system is dictated as much by its external environment and constraints as by its own internal characteristics.

So, unless the previous one was identical to the new one, this assumption is likely going to be false. In particular, this myth is fimrly implanted in single product specialists and consultants and is very visible when they move to something different. Remember, reject any changes that do not come with good background. Just saying "because I've a book that says so", "because the vendor recommends it" or "because in my last project at customer xxxx it worked well" are not valid reasons per se.

No comments:

Post a Comment