Wednesday 7 May 2008

Sometimes a tenth of a second is important

As in any other field where a certain skill level is reached, performance tuning beyond a certain point becomes, for the uninitiated, some kind of magical black art. It then becomes part of your skill set to explain, in terms a non-specialist can understand, what you are really doing (and what they are paying for).

This is the point where excessive simplification can, and in fact does, cause problems. A case in point was a tuning exercise on a large database import process. The import was iterating row by row, issuing an SQL query that retrieved a single row each time. The application design was not optimal: the access path was not even partially indexed, and the import process was agonizingly slow.
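For illustration, the per-row lookup looked roughly like this; the table and column names are made up, since the real ones belonged to the vendor's data model:

    SELECT *
      FROM import_staging
     WHERE col_a = :1
       AND col_b = :2
       AND col_c = :3
       AND col_d = :4;

With no usable index on any of the four columns, every single execution meant a full table scan.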

The data model had been "customized" from the big application vendor's original design to adapt it to business needs, and that customization had not taken into account all possible impacts of changing some metadata. But since the import process was also supplied by the vendor, there was no way to change it. Calling the vendor for support was not an option either.

Ah, I said, no problem: we can create the necessary index and be done with it. But first, remembering that performance tuning is always a balancing exercise, I took the extra precaution of checking the cardinality of the columns involved. The query in question had four columns in its WHERE criteria, but only three of them contributed significantly to its selectivity. The fourth had just a 2:1 ratio, while the other three together had a ratio of more than 10,000:1. So I created an index on the first three columns and left the fourth alone, as sketched below.
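A quick way to check those cardinalities is a handful of COUNT(DISTINCT ...) queries; a minimal sketch, again with made-up names:

    SELECT COUNT(*)              AS total_rows,
           COUNT(DISTINCT col_a) AS distinct_a,
           COUNT(DISTINCT col_b) AS distinct_b,
           COUNT(DISTINCT col_c) AS distinct_c,
           COUNT(DISTINCT col_d) AS distinct_d
      FROM import_staging;

    -- col_d had only two distinct values (the 2:1 ratio),
    -- so the index covers just the three selective columns:
    CREATE INDEX ix_staging_abc ON import_staging (col_a, col_b, col_c);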

It is always a good idea to check the selectivity of your indexes: carrying unnecessary columns simply degrades performance without any benefit in access times. There is also a very common, and wrong, performance optimization "rule" among non-specialists that says "just add more indexes" in answer to almost any database performance problem. The cardinality check is usually a good way to dispel that myth, because more indexes do not always equate to better performance.
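To see how selective a candidate index would actually be, count the distinct combinations of its columns and compare that against the total row count; a sketch in portable SQL:

    SELECT COUNT(*) AS distinct_combinations
      FROM (SELECT DISTINCT col_a, col_b, col_c
              FROM import_staging) t;

The closer this number is to the total row count, the more selective the index; adding a further column that barely raises it buys you almost nothing.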

After I did that, the results were good: the single-row SELECT went from 10 seconds per row down to 0.2 seconds per row.

Well, that's it, I said. That is, until the application developer came back later and asked if I could improve performance further. Since it was already 50 times better, I was reluctant to add the remaining column to the index. When I finally did, performance went from 0.2 seconds per row down to 0.1. That kind of improvement, while spectacular in itself (how many times can you double something at such a small price?), was nowhere near the order of magnitude gained by the first change, and probably not worth doing in the general case.
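Since most engines cannot extend an existing index in place, adding the fourth column meant replacing the index; a sketch, using the same hypothetical names as above:

    -- (DROP INDEX syntax varies slightly between engines)
    DROP INDEX ix_staging_abc;
    CREATE INDEX ix_staging_abcd
        ON import_staging (col_a, col_b, col_c, col_d);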

Unless, of course, the single-row SELECT is performed 100,000 times. Then it really makes a difference whether you wait 10,000 seconds or 20,000. In the first case your process finishes within normal working hours; in the second, you have to stay at the office until late.
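The arithmetic is simple:

    100,000 rows × 0.2 s/row = 20,000 s  (about 5.6 hours)
    100,000 rows × 0.1 s/row = 10,000 s  (about 2.8 hours)

Case closed.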
