Sunday 2 November 2014

Accidental complexity, Excel and shadow IT

You have probably felt like this. You've devoted some intense time to solving a very complex and difficult problem. In the process, you've researched the field and made attempts to solve it in a few different ways. You've come across and tested some frameworks as a means of getting close to the solution without having to do it all by yourself.

You've discarded some of these frameworks and kept others. Cursed and blessed the frameworks' documentation, Google and StackOverflow all at the same time. You have made a few interesting discoveries along the way; as always, the more interesting ones came from your failures.

You've learned a lot and now are ready to transfer that knowledge to your customer, so you start preparing documentation and code so that they are in a deliverable state. Your tests become more robust and reach close to 100% coverage. Your documents grow, pulling in data from your notebook, spreadsheets and test results. Everything comes together nicely, ready to be delivered as a coherent and integrated package, something that your customer will value, appreciate and use for some time in the future (and pay for, of course).

And along the way, you've committed all your false starts to your version control system of choice. All the successes too. You've built a rather interesting history.

Of course, at this point your customer will not see any of the failures, except when you need to refer to them as supporting evidence for the approach you propose. But that's ok, because the customer is paying for your results and your time, and he's not really interested in knowing how you got them. If your customer knew how to do it in the first place, you would not be there at all.

But it is exactly at these final stages where the topic of accidental versus essential complexity raises its head. You've spent some time solving a problem, solved it, and yet you still need to wrap up the deliverables so that they are easily consumed by your customer(s). This can take many different shapes: a code library, a patch set, or a set of documents stating basic guidelines and best practices. Or all of them.

The moment you solved the problem you mastered the essential complexity, yet you have not even started mastering the accidental complexity. And that still takes some time. And depending on the project, the problem, and the people and organization(s) involved, this can take much more time and cost much more than solving the problem itself.

Which ties nicely into the "Excel" part of this post. At this point you're probably asking yourself what exactly Excel has to do with accidental complexity.

The answer is: Excel has nearly zero accidental complexity. Start Excel, write some formulas and some headers, add some formatting, perhaps write a few data export/import VBA scripts, do a few test runs and you can proudly claim you're done. Another problem solved, with close to 100% of your time devoted to the essential complexity of the problem. You did not write any tests. You did not use any kind of source control. You did not analyze code coverage. You did not document why cell D6 has a formula like =COUNTIF(B:B;"="&A4&TEXT(A6)). You did not document how many DLLs, add-ons or JDBC connections your spreadsheet needs. You did not care about someone using it in a different language or culture where dates are expressed differently. Yet all of it works. Now. Today. With your data.

That is zero accidental complexity. Yes, it has its drawbacks, but it works. These kinds of solutions are what is usually described as "shadow IT", and hardly a day passes without you coming across one of them.

What I've found empirically is that the amount of shadow IT in an organization is roughly proportional to the size of the organization. You may assume that larger organizations should be more mature, and that by virtue of that maturity they would have eliminated or reduced shadow IT. Not true. And that is because the bigger the organization, the bigger the accidental complexity.

If you look at the many layers of accidental complexity, you'll find some of them common to organizations of all sizes, mostly at the technical level: your compiler is accidental complexity. Test Driven Development is accidental complexity. Ant, make, cmake and maven are all accidental complexity. Your IDE is accidental complexity. Version control is accidental complexity.

But then there's organizational accidental complexity. Whereas in a small business you'll likely have to talk to very few individuals in order to roll out your system or change, the larger the organization, the thicker the layers of control are going to be. So you'll have to have your work reviewed by some architect. Some coding standards may apply. You may have to use some standard programming language, IDE and/or framework, perhaps particularly unsuited to the problem you are solving. Then you'll have to go through change control, and then... hell may freeze over before you overcome the accidental complexity, and that means more time and more cost.

So at some point, the cost of the accidental complexity is way higher than the cost of the essential complexity. That is when you fire up Excel/Access and start doing shadow IT.

Monday 18 August 2014

The code garage - What to do with old code?

From time to time I have to clean up my hard disk. No matter how big my partitions are, or how much bigger the hard disk gets, there always comes a point where I am dangerously close to running out of disk space.

It is in these moments that you find you forgot to delete the WAV tracks of that CD you ripped. That you don't need duplicate copies of everything you may want to use from both Windows and Linux, because you can keep these in an NTFS partition and Linux will be happy to use them without prejudice.

And it is in these moments that I realise how much code I've abandoned over the years. Mainly in exploratory endeavours, I've sometimes written what in retrospect seems to be substantial amounts of code.

Just looking at abandoned Eclipse and NetBeans folders I find unfinished projects from many years ago. Sometimes I recognise them instantly, and always wonder at how subjective the perception of time is: in my mind that code is fairly fresh, but then looking at the timestamps I realize that I wrote it seven years ago. Sometimes I wonder why I even thought the idea was worth trying at the time.

Yet here they are: a JPEG image decoder written in pure Java that is only about 20% slower than a native C implementation. A colour-space-based image search algorithm, complete with a web front end and a back end for analysis. A Python arbitrage engine that can scrape websites and alert on price differences by applying Levenshtein comparisons across item descriptions. Enhancements to a remote control Android app that is able to drive a Lego Mindstorms vehicle over Bluetooth. That amalgamation of scripts that reads EDI messages and extracts key data from them. The seven different scripts to deal with different media formats, one for each camera that I've owned. And many more assorted pieces of code.
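
For the curious, here is a minimal sketch of the kind of Levenshtein comparison that arbitrage engine applied to item descriptions. The original was Python; this C version, the names and the sample strings are purely illustrative, not code from that old project.

    /* Classic dynamic-programming Levenshtein (edit) distance:
     * the minimum number of single-character insertions, deletions
     * and substitutions needed to turn string a into string b. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static size_t levenshtein(const char *a, const char *b)
    {
        size_t la = strlen(a), lb = strlen(b);
        size_t *prev = malloc((lb + 1) * sizeof *prev);
        size_t *curr = malloc((lb + 1) * sizeof *curr);

        for (size_t j = 0; j <= lb; j++)
            prev[j] = j;                     /* distance from "" to b[0..j] */

        for (size_t i = 1; i <= la; i++) {
            curr[0] = i;                     /* distance from a[0..i] to "" */
            for (size_t j = 1; j <= lb; j++) {
                size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                size_t del  = prev[j] + 1;
                size_t ins  = curr[j - 1] + 1;
                size_t sub  = prev[j - 1] + cost;
                curr[j] = del < ins ? del : ins;
                if (sub < curr[j])
                    curr[j] = sub;
            }
            memcpy(prev, curr, (lb + 1) * sizeof *prev);
        }

        size_t d = prev[lb];
        free(prev);
        free(curr);
        return d;
    }

    int main(void)
    {
        /* Two listings of (probably) the same item from different shops. */
        const char *x = "Canon EOS 550D 18MP DSLR body";
        const char *y = "Canon EOS550D DSLR Body (18 Mpx)";
        printf("edit distance: %zu\n", levenshtein(x, y));
        return 0;
    }

The idea, as far as I recall, was simply to flag pairs of descriptions whose distance fell below some threshold relative to their length.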

The question is, what should I do with this code? I'm afraid of open sourcing it, not because of patents or lawyers but because its quality is uneven. From slightly above alpha stage to close to rock solid. Some of it has test cases, some does not. In short, I don't feel it is production quality.

And I can't escape the thought that everything one writes starts in that state: we judge the final product and tend to think that it was conceived in that pristine shape and form from the beginning. I know that's simply false: just look at the version control history of any open source project. But I want to have that smooth finish, clean formatting, impeccable documentation and fully automated build, test and deploy scripts from day one.

Yet some of this could potentially be useful to someone, even to me at some point in the future. So it is a shame to throw it away, and it always ends up surviving the disk cleanup. And I'll see it again in a few years and ask myself the same question... why not have the equivalent of a code garage? Some place where you could throw all the stuff you no longer use, or don't think you are going to use again, and leave it there so anyone passing by can take a look and pick up the pieces if he/she is interested in them?

Monday 14 April 2014

Heartbleed: the root cause

I can't resist commenting on this, because Heartbleed is the subject of countless debates in forums. In case you've been enjoying your privately owned tropical island for the past week or so, Heartbleed is the name given to a bug discovered in the OpenSSL package. OpenSSL is an Open Source package that implements the SSL protocol, and is used across many, many products and sites to encrypt communications between two endpoints across insecure channels (that is, anything connected to the internet is by definition insecure).

The so-called Heartbleed bug accidentally discloses part of the server's memory contents, and thus can leak information that was never intended to be known by anyone but the server running OpenSSL. Private keys, passwords, anything stored in a memory region close to the one involved in the bug can potentially be transmitted back to an attacker.

This is serious. Dead serious. Hundreds of millions of affected machines serious. Thousands of millions of password resets serious. Hundreds of thousands of SSL certificates renewed serious. Many, many man-years of work serious. Patching and fixing this is going to cost real money, not to mention the undisclosed and potential damage arising from the use of the leaked information.

Yet the bug can be reproduced in nine lines of code. That's all it takes to compromise a system.
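
To see how a mistake that small can leak memory, here is a minimal, self-contained C sketch of the general pattern. This is not the actual OpenSSL code: the names are hypothetical and the "server memory" is simulated with a single static buffer so the demo runs safely. The handler trusts the length claimed by the client instead of the length of the payload it actually received, and memcpy happily copies whatever sits next to that payload.

    #include <stdio.h>
    #include <string.h>

    /* The 5-byte heartbeat payload "HELLO" sits right next to data
     * that was never meant to leave the process. */
    static char server_memory[] =
        "HELLO <secret_key=0xDEADBEEF> <admin_password=hunter2>";

    /* The flawed handler: it echoes back claimed_len bytes of payload
     * without checking claimed_len against the real payload size. */
    static void heartbeat(const char *payload, size_t claimed_len,
                          char *response, size_t response_size)
    {
        if (claimed_len >= response_size)       /* only guards the output */
            claimed_len = response_size - 1;
        memcpy(response, payload, claimed_len); /* over-reads the "heap"  */
        response[claimed_len] = '\0';
    }

    int main(void)
    {
        char response[128];

        /* Honest request: claim 5 bytes, get the 5-byte payload back. */
        heartbeat(server_memory, 5, response, sizeof response);
        printf("honest reply : %s\n", response);

        /* Malicious request: claim far more than the payload length
         * (capped here only so the demo stays inside the simulated
         * buffer; the real bug read up to 64 KB of adjacent heap). */
        heartbeat(server_memory, sizeof server_memory - 1,
                  response, sizeof response);
        printf("leaked reply : %s\n", response);
        return 0;
    }

The actual fix was just as small: refuse to answer any heartbeat whose claimed payload length does not fit in the record actually received.
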
Yet with all its dire consequences, the worst part of Heartbleed for me is what we're NOT learning from it. Here are a few of the wrong lessons that interested parties are drawing from it:
  • Security "experts": this is why you need security "experts", because you can never be safe and you need their "expertise" to mitigate this, prevent such simple mistakes from surfacing, audit everything right and left, and write security and risk assessment statements.
  • Programmers: this Heartbleed bug happened because the programmer was not using memory allocator X, or framework Y, or programming language Z. Yes, any of these could have prevented this mistake, yet none of them was used, nor could they be easily retrofitted into the existing codebase.
  • Open Source opponents: this is what you get when you trust the Open Source mantra "given enough eyeballs, all bugs are shallow". Because in this case a severe bug was introduced without anyone realizing it, hence you can't trust Open Source code.
All these arguments are superficially coherent, yet they are at best well intentioned but wrong, and at worst simply lies.

In the well intentioned area we have the "Programmers" perspective. Yes, there are more secure frameworks and languages, yet no programmer in his right mind would want to rewrite something of this complexity without at least a sizeable test case baseline to verify it against. Where's that test case baseline? Who has to write it? Some programmer out there, I guess, yet no one seems to have bothered with it in the decade or so that OpenSSL has been around. So these suggestions are akin to saying that you will not be involved in a car crash if all roads are rebuilt to be safer. Not realistic.

Then we have the interested liars. Security "experts" were nowhere to be seen during the two years the bug has existed. None of them analyzed the code, assuming of course that they were qualified to even start understanding it. None of them had a clue that OpenSSL had a bug. Yet they descend like vultures on a carcass upon this and other security incidents to demonstrate how necessary they are. Which in a way is true: they were necessary much earlier, when the bug was introduced. OpenSSL being open source means anyone at any time could have "audited" the code, highlighted all the flaws (of which there could be more of this kind) and raised all the alerts. None did. Really, 99% of these "experts" are not qualified to do such a thing. All bugs are trivial once exposed, yet to expose them one needs code reading skills, test development skills and theoretical knowledge. Which is something not everyone has.

And finally, in the deep end of the lies area, we have the Open Source opponents' perspective. Look at how this Open Source thing is all about a bunch of amateurs pretending that they can create professional-level components that can be used by the industry in general. Because you know, commercial software is rigorously tested and has the backing of commercial entities whose best interest is to deliver a product that works as expected.

And that is the most dangerous lie of all. Well intentioned programmers can propose unrealistic solutions, and the "security" experts can leech off the IT industry a bit more, but that creates at best inconvenience and at worst a false sense of security. But assuming that these kinds of problems will disappear if you use commercial software puts everyone in danger.

First, because all kinds of software have security flaws. Ever heard of Patch Tuesday? Second, because when there is no source code, there is no way of auditing anything and you can only trust the vendor. And third, because the biggest OpenSSL users are precisely commercial entities.

However, as easy as it is to say this after the fact, it remains true that there are ways of preventing future Heartbleed-class disasters: more testing, more tooling and more auditing could have prevented this one. And do you know what the prerequisite for all of these is? Resources. Currently the core OpenSSL team consists of ... two individuals. Neither of whom is paid directly for the development of OpenSSL. So the real root cause of Heartbleed is lack of money, because there could be a lot more people auditing and crash-proofing OpenSSL, if only they were paid to do it.

But ironically, it seems that there is plenty of money among some OpenSSL users, whose business relies heavily on a tool that allows them to communicate securely over the Internet. Looked at from this perspective, Heartbleed could have been prevented if any of the commercial entities using OpenSSL had invested some resources in auditing or improving OpenSSL instead of just profiting from it.

So the real root cause of Heartbleed lies in these entities taking away without giving back. And when you look at the list, boy, how they could have given back to OpenSSL. A lot. Akamai, Google, Yahoo, Dropbox, Cisco and Juniper, to name a few, have been using OpenSSL for years, benefitting from the package yet not giving back to the community some of what they got. So think twice before basing part of your commercial success on unpaid volunteer effort, because you may not have to pay for it at the beginning, but later on it could bite you. A few hundred million bites. And don't think that by keeping your source code secret you're doing any better, because in fact you're doing much worse.