Thursday 29 October 2009

GAE: the good, the bad and the irrelevant

From what I read, the general feeling is that there are two irreconcilable factions in the cloud/no cloud debate: those that go for it and those that do not.

For such a highly technical topic, I'm surprised to see GAE being debated at almost emotional levels. On one side, there are folks who look like Google fanboys and see nothing but advantages for your business in hosting things on GAE. The other side seems to refuse to even consider using GAE for anything remotely related to their line of work.

My view is that, as with any sufficiently complex question in life, the answer is "it depends". That's why I put together the following points that I think you should consider before betting on GAE.

Note that although Amazon EC2 offers much lower-level (system instance) hosting, these considerations still partially apply to it. And note that they are based more on my experience with application hosting and development in general than on my experience with GAE, which is still less than I wish it were. Bombs away.

The irrelevant


This may sound provocative, and in fact that is exactly the feeling I want to evoke. There are a couple of things that I hear over and over against GAE, but I'm still trying to understand the logic behind them.

Data Protection and security

GAE detractors say that they will lose control of their data by using GAE. Your data will be in Google's hands and that is bad. Let's face it, and please be realistic: depending on how many degrees of outsourcing you have in place, your data is already in many hands, and is only as safe as the weakest link in the chain. Very few businesses take even elementary precautions with their business data when dealing with it internally. Simple things like backup encryption are not being done for business-sensitive data.

If anything, Google's massive scale and focus on technology will give the average business a much higher degree of data protection than it has today, if only because Google can spread the huge capital cost of setting up all that infrastructure across thousands of customers. In theory, Google could do whatever it wanted with your valuable business data, but doing so would probably be against its own interest, if only because of the business it would lose once confidence was gone. And who do you think has the greater incentive to use your data for nefarious purposes: the guy who takes the daily backup tapes away from your (shared? hosted?) data center, or Google? Have you already taken any measures to encrypt your laptop hard disk? Do you allow USB disks to be plugged into your workstations? Both things pose a greater risk to your data, at least judging by public reports and security experts, than anything that can happen inside Google's cloud.

I'm not saying that the folks at Google are perfect. They will, at some point, have a problem with the data they keep for third parties. Nobody knows when, but it will happen, if only because in any security scheme the weakest link in the chain is the human at the keyboard, and no matter how well Google recruits staff, they are not perfect.

I've yet to read any reports of security breaches at Google, and I'm sure that any hacker wishing to raise their market value would be quick to announce any kind of success against Google's infrastructure. Remember also that security is a multi-layered onion, and that the two topmost layers, the users and the application, are still your responsibility. So look again at all those expensive firewalls and IDSs in your data center (that is, if you have any of those), guess how many of those Google has, and start worrying instead about the most likely risks as of today: an application programmer leaving SQL injection holes in your application, or a user saving a spreadsheet with a list of all your customers on his laptop and then forgetting it in the back seat of a cab or on a train.
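
To make the SQL injection risk concrete, here is a minimal sketch that uses Python's built-in sqlite3 module. The customers table and the function names are made up for illustration and do not come from any real application. It contrasts a query built by string concatenation with the same query written with a bound parameter.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers (name) VALUES (?)",
                     [("Alice",), ("Bob",)])
    return conn


def find_customers_unsafe(conn, name):
    # The hole: user input is pasted straight into the SQL text,
    # so quotes inside `name` can rewrite the WHERE clause.
    sql = "SELECT id, name FROM customers WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_customers_safe(conn, name):
    # The ? placeholder makes the driver treat `name` strictly as data.
    sql = "SELECT id, name FROM customers WHERE name = ?"
    return conn.execute(sql, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    attack = "x' OR '1'='1"
    print(find_customers_unsafe(conn, attack))  # every row leaks
    print(find_customers_safe(conn, attack))    # no customer has that name
```

No firewall or IDS sits between these two functions. The difference is entirely in the application layer, which stays in your hands wherever you host.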

Uptime

The uptime objection comes down to two reasons. The first is that the critics don't have the same level of control over scheduled downtime windows as they have in their own environment. I'm fine with that reason, and to some degree I'm ready to accept it as valid. But looking at those windows in GAE, while some of them could be inconvenient for your business, I'm sure they compare favorably in number and duration with the scheduled downtime of any local application you may have. Google's environment is redundant and fail-safe by design, that is, they need to bring the whole thing down to perform changes much, much less often than the typical environment does. As for scheduled downtime due to application changes, this depends more on your application than on the environment you choose to run it in.

You also need to remember that GAE's downtime history has so far been governed more by its being a beta system than by anything else. You should not expect similar degrees of downtime, scheduled or not, once they are fully up and running.

The second reason is truly, really, genuinely irrelevant. Unscheduled downtime, that is, the time when the system crashes when nobody expects it to, is the worst nightmare of any IT organization. And frankly, at any big consumer-facing web business, be it Microsoft, Yahoo, eBay or Google, downtime is way lower than anywhere in the closed-environment data center world. Each time any of Google's services, be it Gmail or GAE, goes down, the critics raise their voices and say "that's why I'd never trust the cloud". I find it very hard to find environments where uptime is on par with the web-facing consumer giants, or even close. Those folks have economies of scale that allow them to set up redundancy and fault tolerance at levels that ordinary IT budgets cannot even dream of. So their unscheduled downtime is going to be lower than yours, period.

The most ironic part is that the people who make this kind of argument against GAE are completely off the GAE radar. The GAE proposition does not make sense in those enterprise environments that are above Google's levels of security and uptime, that is, if they are actually meeting those levels outside of their dreams. But it's hard for me to find as many 99.99% uptime environments as there are GAE critics. My only possible explanation is that the criticism comes mostly from armchair strategists. It is easy to "design" security and availability on paper. It is not easy or cheap to implement those designs.
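
To put the 99.99% figure in perspective, here is a small sketch that converts an availability target into the downtime it permits per year. It assumes a 365-day year, counts scheduled and unscheduled downtime alike, and the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year that an availability target allows."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

At 99.99% the whole budget is under an hour per year. A single failed upgrade or disk replacement that takes the service down for an afternoon spends several years' worth of it.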

And please, please, remember: I consider those points irrelevant only in the context of deciding whether or not to go with GAE. That does not mean that they are irrelevant in general. By all means they matter, so always make sure that you have the right contract clauses in place whatever direction you choose.

The bad

And yet I think there are genuine concerns you should be aware of when committing to GAE. I don't see them widely discussed, perhaps because the bulk of the arguments belong to the irrelevant ones.

Lock-in: what you buy when you choose GAE is lock-in. Good old lock-in, of the deep variety. Any application you write for GAE will keep running under GAE forever once it reaches any reasonable level of complexity. Don't fool yourself into thinking that using Java raises your chances of being free, because it does not. The GAE storage engine is different enough from anything else that you cannot run your application anywhere else. Period. If you want a good example, just look at GAE's own framework, Django. The GAE version of Django has so many modifications over the standard Django framework that even Google does not make a sales argument of its Django heritage. Django is mentioned, but not promoted.

Rigid environment: by design, GAE is not well suited to some applications. You may start your application with certain functionality in mind and later on want to expand it. In that expansion process, you may run into some of the limits imposed by GAE. By that time, by virtue of the lock-in, you'll have already invested some time and money in your application, and you will face the choice of either throwing away most of that investment and switching to another, less restrictive platform, or fitting your functionality to the GAE mindset: no analytical queries, no big batched transactions, no incoming mailbox, no …. I'm not completely aware of all the things you cannot do with GAE (or that are so difficult and cumbersome as to be impractical), but it's likely that you'll run into some of them at some point. The problem with that, as I've already written elsewhere, is that Google itself does not have to deal with those limits, as it has access to MapReduce and other technologies that allow it to sidestep any GAE limitations nicely. But you will not.

The good


Scalability: if there's one single thing Google has mastered, it is how to massively scale everything they do. Massively. If you need that kind of scalability, Google is the way to go. And they keep committing resources and capital to make their infrastructure even bigger and more scalable all the time. They simply have no competition on that front at this moment.

Uptime: hey, all of us have heard of that server that has been running for the last eight years without a single reboot. As with urban legends, you never hear that from someone who has actually seen the machine, but only from people who know someone who knows… While not breaking any records at this moment, Google's uptime is among the best in the industry, all the more so considering the size of their operations. Also, I have the feeling that Google knows that this point is key to winning customers' hearts.

In summary, I think that you should take a hard look at the bad and the good before going to GAE. And, while not completely ignoring the irrelevant points, take the time to measure them against what it is going to cost you to attain similar levels of uptime and security on your own. My bet is that what I've classified as irrelevant will actually look, well, irrelevant, unless there are up-front requirements (legal ones, for example) that prevent you from even weighing them in the decision. Then decide.