Database Performance Tuning: The great NoSQL debate

After some background, we can now dissect the reasons behind the sudden interest in NoSQL data stores.

First, let's separate interest from hype. Hype is the part that says that NoSQL is good for everything, that everyone not using it has no future. The phenomenon of hype in technology is so interesting and complex that it deserves its own post. Well, a collection of them, really.

But as usual, the hype is wrong. NoSQL is not good for everything. Some of the half-joking, half-serious virulence in the original post by Ted Dziuba is really a response to the overhype surrounding NoSQL. But there are actually good reasons for using it.

Scalability - it's more about price per concurrent IO operation

Yes, this is one thing that they got right. The advantage of NoSQL engines is that they are cheap at huge scale. What's important to remember is what "huge" means in this context. Think "huge" in terms of sustaining millions of reads per minute, thousands of writes, and peaks of ten times that or more.
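To put those figures in per-second terms, here is a small back-of-the-envelope sketch. It takes one million reads and one thousand writes per minute as the lower bound of "millions" and "thousands"; those two round numbers are my assumption for illustration, not measurements from any real system.

```python
# Back-of-the-envelope conversion of the load figures into per-second rates.
# The inputs are the lower bounds of "millions" and "thousands" (assumed).

def per_second(per_minute: float) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60


sustained_reads = per_second(1_000_000)  # at least one million reads per minute
sustained_writes = per_second(1_000)     # at least one thousand writes per minute
peak_reads = sustained_reads * 10        # peaks of ten times the sustained load

print(f"sustained reads/s:  {sustained_reads:,.0f}")
print(f"sustained writes/s: {sustained_writes:,.1f}")
print(f"peak reads/s:       {peak_reads:,.0f}")
```

Even at the lower bound, that is a read-dominated workload in the tens of thousands of operations per second, sustained, with peaks an order of magnitude higher.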

The core components of mainstream RDBMSs were designed twenty years ago. Twenty years ago the Internet as we know it did not exist. The biggest database consumers were banks, financial institutions, retail companies and a few select research facilities. None of them came anywhere near needing those levels of scalability.

RDBMSs of course evolved to support those loads. They added scalability features such as clustering, failover, or load balancing. But they did it by bolting additional features onto the existing engines. The end result, as anyone involved in setting up or maintaining one of them can attest, is anything but simple to maintain and fine-tune.

NoSQL databases were designed with this scalability in mind. Plus, being open source or in-house solutions, they are free from vendor licence or support charges. That's a big plus if you need to set up thousands of them.

In summary, it's not that conventional relational solutions cannot achieve this level of scale. It's that by the time they reach it, their hardware and licensing costs are far above those of their NoSQL counterparts.

Integration - it's about the language you develop in

With the emergence of object oriented languages, in all their scripted and compiled forms, the gap between object and relational models has become more and more visible. Yes, there's a wealth of tools to help close that gap, to the point where it is a mere nuisance rather than a major roadblock.

But all those tools add overhead. Overhead in terms of performance but also overhead in terms of maintenance. When you are looking for extremely flexible programming environments, you want to get rid of as many obstacles as possible between the programmer and the problem that has to be solved.

While you cannot completely get rid of some persistence link with your programming environment, you can make it a lot smaller than a generic mapping layer designed for a generic database. NoSQL databases are usually tightly integrated with a few programming languages, ignoring all the rest. There's no ODBC for NoSQL engines, and the people using NoSQL do not need this kind of openness, for they have a small set of languages and environments to cater for. Don't expect a NoSQL bridge to your COBOL ISAM files or an API to access it from FORTRAN.
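To make the size of that persistence link concrete, here is a minimal sketch in Python. It persists the same object twice: once through a hand-written relational mapping (using the standard sqlite3 module) and once as a single serialized document. The Order class, the table names and the plain dict standing in for a document store are all hypothetical, picked only for illustration; no particular NoSQL engine's API is implied.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # list of (sku, quantity) pairs


def save_relational(conn, order):
    """Relational style: one object is spread over two tables."""
    conn.execute("INSERT INTO orders (order_id, customer) VALUES (?, ?)",
                 (order.order_id, order.customer))
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, quantity) VALUES (?, ?, ?)",
        [(order.order_id, sku, qty) for sku, qty in order.items])


def load_relational(conn, order_id):
    """Reassemble the object from its rows."""
    (customer,) = conn.execute(
        "SELECT customer FROM orders WHERE order_id = ?",
        (order_id,)).fetchone()
    rows = conn.execute(
        "SELECT sku, quantity FROM order_items "
        "WHERE order_id = ? ORDER BY rowid", (order_id,)).fetchall()
    return Order(order_id, customer, [tuple(row) for row in rows])


def save_document(store, order):
    """Document style: the whole object is serialized under one key."""
    store[order.order_id] = json.dumps(asdict(order))


def load_document(store, order_id):
    data = json.loads(store[order_id])
    data["items"] = [tuple(item) for item in data["items"]]  # JSON has no tuples
    return Order(**data)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_items (order_id INTEGER, sku TEXT, quantity INTEGER)")

store = {}  # stand-in for a key-value or document store (assumption)

order = Order(1, "alice", [("widget", 2), ("gadget", 1)])
save_relational(conn, order)
save_document(store, order)
```

Both paths round-trip the same object, but the relational one needs a schema and a save/load pair that has to change every time the object does. The document path is shorter precisely because it pushes all knowledge of the structure back into the application.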

Structure - the world is not that relational

NoSQL databases are much less strict than their relational counterparts. Some of the relaxation comes from the desire to break away from too many constraints, much as the "latest" (read: ten years old at most) and most popular scripting languages are typeless.

Of course, there are very intelligent people out there, me being one of them, thinking that above some complexity level, typeless languages and structures are simply not maintainable unless you're a gifted genius. But, and this is an important distinction, they are talking about the number of entities and their relationships in the conceptual model, not about how many instances of them exist.

Thus, in their right context, which is usually relatively simple web applications accessed by an awful lot of people, complexity is not that big of a problem. Scale is.

Don't forget that behind NoSQL claims of how nice relaxed structural requirements are, there's an ugly hidden truth. Dropping referential integrity, constraints, type coercion or domains is not only done in the name of freedom. It's also done because enforcing them has a cost in performance and in the complexity of the engine.
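As a small illustration of what dropping those checks means, the sketch below inserts the same bad data into a relational engine (SQLite through Python's standard sqlite3 module, with foreign keys switched on) and into a plain dict standing in for a schemaless store. The table names, keys and the dangling_orders helper are hypothetical, made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    )
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")


def insert_order(order_id, customer_id):
    """Return True if the engine accepted the row, False if it rejected it."""
    try:
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (?, ?)",
                     (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_valid = insert_order(10, 1)       # customer 1 exists
accepted_dangling = insert_order(11, 999)  # customer 999 does not exist

# Schemaless stand-in: nothing checks shape, types or references on write.
documents = {
    "order:10": {"customer_id": 1},
    "order:11": {"customer_id": 999},                 # dangling reference
    "order:12": {"customer": "bob", "qty": "three"},  # different shape entirely
}


def dangling_orders(docs, known_customers):
    """The check the engine no longer does, now written in the application."""
    return sorted(key for key, doc in docs.items()
                  if doc.get("customer_id") not in known_customers)


print(accepted_valid, accepted_dangling)
print(dangling_orders(documents, {1}))
```

The relational engine pays for a lookup in customers on every insert into orders. The schemaless store skips that work and accepts everything, which is faster and simpler for the engine, and leaves the validation to whoever writes the application.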

In conclusion, if your application needs to scale massively, is not too complex in terms of the objects you need to persist and their relationships, and you are comfortable developing in some language invented 20 years ago or less, you could potentially benefit from using a NoSQL solution.

Monday, 5 April 2010

The great NoSQL debate - what NoSQL is good for
