Friday, 18 April 2008

Tuning tools: beware

Older readers may remember the MSDOS extended and expanded memory standards. Back in the pre Windows 95 era, Intel machines were limited in the memory address space they could use. Not only because of the hardware, that until the 386 processor was limited to 16MB (yes, MB, not GB!) of RAM, but also limited by the MS-DOS compatibility constraints. There's a famous quote misattributed to Bill Gates saying that nobody would ever need more than 640K memory. This was, obviously, false. And the industry was busily developing solutions for the compatibility limits imposed by MS-DOS initial design. A number of software makers, from Quarterdeck to even Microsoft, started to release memory tuning utilities that could, in theory, maximize the usage of that precious 640K low memory region.

And they mostly worked. You would not even needed knowledge of what low and high memory was. Just running them gave you a system that was close to optimal in their low memory usage. And you could try to boast about your low memory size with your friends, only to find that they got exactly the same results as you. Trying them was very easy. You just ran the optimizer and accepted its suggestions. The you rebooted. After that, either your system hung or worked. In the first case, you booted with a diskette and undo the changes. In the second case, you checked low available memory. If you had a better value than before, then bingo.

Databases are no exception to the configuration game. Correct settings can give you a 10 times performance improvement over inadequate settings. And the cost of the change itself is close to zero. Just adjust some registry setting or configuration file, restart and it's done. Knowing the right value, however, is not that easy.

As databases continue to have more sophisticated features, the complexity of correctly managing them is increasing accordingly. What was in the past a mere job of checking a few memory parameters has become for the most part in a very delicate set of adjustments that balance the application behaviour and the machine resources and configuration. Database vendors recognized long ago that for their products to become a commodity they needed to somehow allow the average user to enjoy the product features without requiring from them a four month DBA course. Their solution was two fold.

Sensible defaults

First, experience derived a set of sensible default values depending on the intended usage (such as "OLAP" versus "OLTP") that allowed the product to work by default at least reasonably well. In my experience that is the part that has been most beneficial for performance. Having some sensible formulas, such as "allocate 80% of RAM to disk cache" mostly work, but those formulas are not designed to cater any special system or application needs and/or behaviour. A couple of my favourite examples are:

- A complex query that took more time in the parse and analysis stage than on data retrieval itself. The database did not have enough memory to cache the parsing and execution plan stages after a few more queries executed. The symptoms were easy to spot. A customer had a query that, when issued multiple times, was executing very quickly after the first time. However, as time passed on, execution times was equal to the time of the first query. Simply, the cached parse stage and execution plan were being recalculated too often.
- A single machine was hosting the database server and the application server. The database buffers were simply eating up available physical memory and the application server was also claiming its share of the RAM. The symptoms? The database system performed very well before starting the application server.

It's very unlikely that default values will ever be able to cover all the possible usage scenarios. Even if they could, they cannot control resource allocation of other components in the machine. That is, if your environment matches what the developers had in mind when defining the "defaults", those settings are not going to be too bad. And they usually provide a good starting point or baseline to measure against further changes you make.

Automated tools

Second, they started to release the database version of the MS-DOS memory optimizer. Software that looks at database response times and recommend changes based on that. There are many variations of those, from simple query analyzers to big brother-style services that collect the logs and watch activity in real time. These automated tools usually come as part of bigger packages and are intended to help the DBA keep a database system healthy and responsive.

These tools are very good at taking a lot of information and present nicely reported results that give an overview of system performance. The best ones even suggest improvements based on the information they collect. And therein lies their biggest danger. Much like the MSDOS memory optimizers, following their suggestion does not always improve performance. Unlike the memory optimizers, those suggestions when implemented can mean serious business disruptions. Those cannot easily be fixed by booting with a diskette and undoing the changes.

My favorite example are the query analyzers. One of the most heated discussions I had with a customer was around index creation for a medium size table. They had a long running SQL query over medium size tables and the advice from the tool was to create some indexes. However, without taking into account index selectivity, half of them were essentially not worth having.

This is an example, but you must keep in your head the global picture. Just taking a single query and inspecting its access plan just can, in the best case, improve the performance of that single query. But that does not mean overall system performance improvement. In fact, it can mean overall worse system performance. On some cases, I've watched also something that's even worse. Having one such tool that is sanctioned and supported by the application vendor creates an effect where the suppport staff is completely blindsided to anything not reported by the tool. The tool recommendations are implemented without a second thought, even if they don't improve performance at all or mean large downtime windows. The true performance problems are hidden behind the reasoning "this is what the vendor recommends, therefore by following their recommendations we have the best possible performance" Sorry, I cannot tell you the application vendor or the tool, but if you're running a big ERP package with thousands of database tables you know who it is.

Like the "sensible defaults" option, those tools are not bad per se, but in this case they are even more dangerous. If you're going to take their advice, be sure to have a good test environment where you can replicate in as much as possible your live environment conditions. Test, test, and test, before implementing the changes.

It's all in the context

While RDBMSs have been converging to what it's called "commodization", which is a term used to describe a market segment where different brands are interchangeable one with another because they have easily comparable features, the same cannot be said of applications that use the RDBMS as its storage layer. That makes the context wildy different, and thus, those tools alone cannot be used to improve performance.
Why you say that? It logically should follow that, by following their advice, my system will have better performance. To refute this, apparently, perfect logic reasoning, just look at the context. If there's a common flaw in both tool categories it's the context. In the sensible defaults case, the context is the rest of the processes running on the machine. The automated tools, being more ambitious, have even more context to take into account, from other processes in the machine to usage patterns, different modes of operation, changing data densities and business seasonality and priorities.
In the end, like all tools, they need to be in capable hands to be effective. So next time you hear someone proposing a change in your database, ask why. If the answer is "because this tool has recommended it", just reject the change until the proposal has a solid background.

No comments:

Post a Comment