Monday, 23 March 2009

GAE - Abstracting complexity and sharding counters

The history of application development is littered with inventions that try to bridge the conceptual gap between a machine serially executing instructions on data and a business person trying to automate some process. Over the years, layer upon layer of abstraction has been developed, from assemblers, compilers, and operating systems to the highly abstracted REST and SOA specs of today.

If you're exploring Google App Engine (GAE), you have probably already come across sharded counters.

Basic counters explained

Many applications present some kind of counter to the end user: message boards tell you the number of responses to a topic. Auction systems tell you how many people are bidding for an item. Digg tells you how many people have rated an item. The list goes on; counters are used everywhere.

For performance reasons, it is a common practice to keep those counts stored in the database. It's not that they cannot be calculated at page rendering time, but in some cases the performance hit of retrieving those records and counting them is considered to be excessive.

As always with performance tuning, there is a compromise here: keeping the counter accurate every time something related to the count changes in the database is expensive and complex in terms of database operations. But for web apps where the count is retrieved hundreds of times more often than it needs updating, the trade-off is usually worthwhile.
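To make the trade-off concrete, here is a minimal sketch in Python. It uses a plain in-memory object as a stand-in for a database table; the `Topic` class and its method names are hypothetical and only illustrate the two approaches, counting on read versus keeping a stored counter.

```python
# Hypothetical in-memory stand-in for a message-board topic; not a real database.
class Topic:
    def __init__(self):
        self.responses = []       # the underlying records
        self.response_count = 0   # stored counter, kept in step on every write

    def add_response(self, text):
        self.responses.append(text)
        self.response_count += 1  # the extra work paid on each write

    def count_on_read(self):
        # Recomputes the count from the records on every page view.
        return len(self.responses)

    def count_stored(self):
        # Reads the precomputed value without touching the records.
        return self.response_count


topic = Topic()
for text in ["first", "second", "third"]:
    topic.add_response(text)
```

Both methods return the same number here. In a real database the difference is where the cost lands: `count_on_read` pays on every page view, `count_stored` pays on every write.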

Let me remark that, as always, "usually" is the pivotal concept. There are specific situations where this does not apply and keeping the counter updated is not worth it performance-wise. Always apply common sense first, apply some common sense second, and third, do some performance testing.

The Google Datastore and counters

Since the Google Datastore does not have any analytical capabilities, and given its constraints on response times, any counters except the most trivial ones cannot be calculated at page rendering time. There is simply no other option, as GAE puts severe restrictions on the amount of data that can be fetched and the time a page can take to be served.

This is all well and good. Google in this case is simply watching your back, so that your users get good response times and its infrastructure is not abused.

However, due to the way the Datastore engine works, a single data item is stored on a single machine. That can create problems if lots of application instances try to update the counter at the same time. Google's advice for solving that is to "shard" your counter. "Sharding" a counter means creating many counters and, on each update, incrementing one of them chosen at random. Each time you need to present the counter value, you issue a query to retrieve all the counters and add them together.
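The idea can be sketched in a few lines of Python. This is not Google's sample code and does not use the Datastore API: a plain dictionary stands in for the datastore, and the `ShardedCounter` class, the key format, and the shard count of 20 are all assumptions made for illustration.

```python
import random

NUM_SHARDS = 20  # assumed value; the right number depends on the write load


class ShardedCounter:
    """Illustrative sharded counter; a dict stands in for the datastore."""

    def __init__(self, name, num_shards=NUM_SHARDS, store=None):
        self.name = name
        self.num_shards = num_shards
        self.store = {} if store is None else store

    def _key(self, index):
        # Hypothetical key format: one stored item per shard.
        return "%s-shard-%d" % (self.name, index)

    def increment(self):
        # Pick one shard at random so concurrent writers rarely collide.
        key = self._key(random.randrange(self.num_shards))
        self.store[key] = self.store.get(key, 0) + 1

    def value(self):
        # Reading means fetching every shard and adding them together.
        return sum(self.store.get(self._key(i), 0)
                   for i in range(self.num_shards))


counter = ShardedCounter("topic-responses")
for _ in range(1000):
    counter.increment()
total = counter.value()
```

After the loop, `total` is 1000, spread over at most 20 stored items. In a real application each increment would also need to run inside a datastore transaction, which this in-memory sketch leaves out.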

This nicely avoids the bottleneck in high-concurrency situations, but notice how it increases the overhead of retrieving the counter and, more importantly, forces the designer to make a decision about the data structure behind a simple counter.

This breaks the rule of abstractions making things simpler. Instead of the engine being able to nicely scale to your requirements, you need to tweak the engine to adapt to the expected workload. Of course, anyone who has to consider that decision is probably happy to face scalability problems, as they can only mean that the site is successful.

I'm not picking on Google for this, but it is another example where GAE strikes me as unbalanced. On one side, it offers excellent tools to put together web applications quickly and scalably, with a low barrier to entry in terms of learning. On the other side, it makes you face design considerations that seem to be more the province of specialized parallel processing engineers.
