Tuesday 28 November 2023

Trunk based development sounds great. Here's why you should probably not do it

Mostly via Dave Farley's YouTube channel, I've lately been involved in a lot of discussion about the "Trunk Based" development practice, touted, together with continuous integration (CI) and continuous delivery (CD), as the next step in software development productivity.

As a short summary, this practice means that your branches are never older than a day, because you commit changes to main very, very frequently, often several times a day. This has a number of advantages, but the main one is that it mostly avoids merge conflicts. If your changes are small, there is a correspondingly small likelihood of conflicting with others, and if conflicts do arise, they should be easy to resolve. Plus, other team members are always up to date on your development, so their own short-lived branches are not going to conflict when they are merged.

Sounds good, in principle, but herein lies a danger that I call the "21 method parameter" danger.

Let's see. Dev 1 creates a method that takes, say, six parameters. All is well and good. Dev 2 needs a slight variation of that method and adds another parameter... ten changes later, you have a method with 15 parameters. Of course, everybody thinks that is excessive, but refactoring it means going back over changes made by Dev 2... Dev X, which means a lot of refactoring, changing tests, etc.
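The usual way out of that hole can be sketched in a few lines. This is a hypothetical example (the names and fields are invented, not from any real code base): the accreted parameters get grouped into a single parameter object, so that adding field number 16 touches one dataclass instead of every call site.

```python
from dataclasses import dataclass
from typing import Optional
import datetime

# Hypothetical parameter object replacing a long, ever-growing signature.
@dataclass
class InvoiceRequest:
    customer_id: int
    amount: float
    currency: str = "EUR"
    due_date: Optional[datetime.date] = None
    discount: float = 0.0
    tax_rate: float = 0.21
    notes: str = ""

def create_invoice(req: InvoiceRequest) -> dict:
    # A new field now goes into InvoiceRequest with a default value;
    # existing callers keep working unchanged.
    return {"customer": req.customer_id,
            "total": req.amount * (1 + req.tax_rate) * (1 - req.discount)}

print(create_invoice(InvoiceRequest(customer_id=1, amount=100.0)))
```

The catch, of course, is that introducing the object in the first place means touching every existing caller at once, which is exactly the long-lived change the post is about.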

All this runs against the "Trunk based" approach because it needs a long-lived branch that is going to have merge conflicts. So nobody changes anything and everybody keeps happily coding ahead.

So there it goes, until you have, and I've seen it in the real world, 21 parameters.

The context has not changed, and living in a "trunk based" world means this likely won't change. It will only get worse.

A few people during the debate pointed out that there are tools to prevent that kind of thing from happening. Yes, you can use linters or other static analysis tools to detect such quality issues as they crop up. But then you run into Bob adding the 7th parameter to the constructor, asking why you can't make an exception, because his commit will become much older if he stops to refactor... so you allow it and move on towards the 8th parameter.
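For what it's worth, such a ceiling is easy to configure. For instance, pylint's design checker emits R0913 ("too-many-arguments") above a configurable limit; a minimal sketch (the threshold value here is my own choice, not a recommendation):

```ini
# .pylintrc
[DESIGN]
# pylint raises R0913 (too-many-arguments) above this limit
max-args=6
```

The tool holds the line only as long as nobody grants "just this once" exceptions.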

Then there were people saying that this kind of nightmare situation cannot be prevented with any methodology, to which I'd say that, at the very least, you should not use a methodology that almost encourages it. For the reasons above, "trunk based" development is one of them, outside a very few scenarios. Let's see.

  1. Early stages of development, where in fact nothing is settled yet, but you know for sure the shape your code is going to have. Yes, both Alice and Bob know the user role is going to contain an "Accounts Payable" member and an "Accounts Receivable" member, so it is OK for each one to make a commit adding the role.
  2. Elite teams made up of senior people with the judgement to stop before adding the 22nd parameter to the method. They know something is wrong and set out to fix it, steering clear of the "trunk based" approach.
  3. Code that is so badly structured that even the simplest user story implementation needs lots of changes in lots of modules. In this case, most developers are statistically very likely to cause merge conflicts, because you can't touch the accounts receivable module without touching the user module and the database repository interface and who knows what else.

Reality is, one rarely has the chance to start something from scratch. Even if we are that lucky, that stage won't last more than a few months.

Reality is, we regular people are not part of elite teams. We're regular developers who have to meet deadlines and are willing to trade off a bit more technical debt (there are already 16 parameters, what's wrong with having 17?) in order to fulfil our commitments.

So if you're in one of those three cases, perhaps trunk based development is right for you. For the rest, trunk based development merely rewards the behaviours that end up in the third case: a sprawling code base of layers upon layers of patches made in the name of "making progress" that won't be easy to fix. These code bases are usually the ones deemed, over time, to need a "total rewrite".

And no, it's nothing to be proud of: if you can do "trunk based" development because you're in case (3), I feel sorry for you, because the brittleness of the system makes it very, very difficult to make any progress with it.

As a final note, the solution to long-lived branches creating merge conflicts sounds easy to me: it is not the age of the branch that matters, it is how often you merge the latest changes from main into it. Do it daily and your branch will never, ever, be more than a day behind main, no matter when it was created. Such a simple solution that it is hard to understand why the "trunk based" proponents keep ignoring it.
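That daily routine is a couple of commands (the branch name here is made up for illustration):

```shell
# once a day, on your long-lived branch:
git checkout feature/big-refactor
git fetch origin          # pick up whatever landed on main
git merge origin/main     # resolve today's small conflicts now, not in a month
```

Some teams prefer `git rebase origin/main` for a linear history; the conflict-shrinking effect is the same.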

Wednesday 21 June 2023

On uniqueness and MD5 hashes

Came across this today: someone needs a unique key, but thinks that Python's uuid4 is not unique enough. The proposed solution? Just take the MD5 hash of the uuid.

No, passing your non-unique identifier through the MD5 formula won't make it more unique. A hash is a deterministic function, so you get something exactly as non-unique as your initial uuid. Not even larger: an MD5 digest is 128 bits, the same size as the UUID itself.
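A quick sketch of why, using the standard library: equal inputs always hash to equal outputs, so any two colliding UUIDs would still collide after hashing, and no extra bits appear out of thin air.

```python
import hashlib
import uuid

u = uuid.uuid4()

# Deterministic: hashing the same UUID twice gives the same digest,
# so hashing cannot separate two UUIDs that were already equal.
d1 = hashlib.md5(u.bytes).hexdigest()
d2 = hashlib.md5(u.bytes).hexdigest()
assert d1 == d2

# And no room is gained either: both values are 128 bits wide.
assert len(u.bytes) * 8 == 128
assert len(bytes.fromhex(d1)) * 8 == 128
```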

Sunday 5 March 2023

The "You do not need foreing keys" crowd

Can't help but notice a current trend advocating that foreign keys on database schemas are just a burden that you can do without. The last argument I read about it was along these lines:

  • FKs are enabled on dev, testing and staging. Your automated test suites and your manual tests in those environments, with FKs in place, help you catch data integrity problems.
  • Once you fix all the problems, you can just deploy to production with all FKs dropped. After all, you've taken care of all the problems in the other environments, so why would you need FKs there? They just add overhead, don't they?
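For the record, the kind of bad write a production FK silently rejects can be sketched in a few lines with Python's sqlite3 module (table names are invented; note that SQLite, ironically, ships with FK enforcement off until you enable it per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FKs by default
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id))""")

conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO orders (id, user_id) VALUES (1, 1)")  # valid reference

try:
    # An orphan row: no user 99 exists. With the FK on, this cannot land.
    conn.execute("INSERT INTO orders (id, user_id) VALUES (2, 99)")
except sqlite3.IntegrityError as e:
    print("caught:", e)
```

Drop the constraint in production and that second insert goes straight into the table, waiting for someone to find it months later.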

OK, let's start by admitting a basic truth: it is strictly true that you can run your database without foreign keys. Yes, they add some performance overhead. Yes, they force you to do database operations in a certain order.

But beyond that, all these arguments just show lack of experience and ignorance. Let's see why.

For a start, anyone who thinks that all possible problems can be anticipated by testing in dev/staging environments is simply not experienced enough to have seen code bases with 100% test coverage and complete manual end-user testing fail due to... unexpected reasons. Anyone who does not know that they "don't know what they don't know" is simply lacking maturity and experience. Thinking that you can anticipate all possible error states, all possible system states, is just hubris.

But that is just the beginning. Anyone who has watched a code base evolve under different hands and eyes knows that anything that is left as optional will, at some point in the development cycle, be ignored. Data integrity checks included.

And anyone who has worked on anything more complex than a web shopping cart knows that parallelism is hard, concurrency is even harder, locking is even harder than that, and these topics all intersect. You just cannot easily roll your own transactional engine or your own locking mechanism. It takes a lot of work from a lot of talented people to create truly bulletproof database and transaction engines. Not saying that you cannot do it, but thinking that you will do better with your limited resources and experience in your project is really placing yourself very, very high on the developer competence scale.

It is useful to compare these arguments with a topic that, while apparently unrelated, suffers from exactly the same kind of flaws: strongly typed vs. loosely typed vs. untyped languages.

Yes, you can in theory create big and complex systems without using any data types. But the end result will be much more difficult to understand, harder to evolve and test, way more error prone and expensive in the long run than using a strongly typed language to do it.

Why is that? Because when using a strongly typed language, the compiler acts as a first test bed for the validity of your assumptions about the shape of the things that get passed around your code. Without that validation layer, you're simply more exposed to problems, introducing more chances of getting things wrong and forcing the reader of your code (including your future self) to dive into each and every function call just to see what the callee expects to receive. Time consuming and error prone, to say the least.
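In Python terms (a toy sketch, all names invented): type annotations put that contract in the signature, where both the reader and a checker such as mypy can validate a call before anything runs.

```python
from dataclasses import dataclass

@dataclass
class Order:
    total: float
    customer_id: int

def apply_discount(order: Order, rate: float) -> float:
    # The signature says exactly what the callee expects; nobody has to
    # read the body to learn the shape of `order`.
    return order.total * (1 - rate)

print(apply_discount(Order(total=100.0, customer_id=42), 0.1))
```

With a bare `def apply_discount(order, rate):` taking an untyped dict, a misspelled key or a missing field only surfaces at runtime, in whatever code path happens to hit it first.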

So, data types are like foreign keys: a device that makes your code more robust, consistent and changeable. You can do without them, but be prepared to pay a cost much higher than that of declaring your types and relationships.

In summary, "you don't need foreign keys in production" is the terraplanism of software development. It only shows how much you don't know and how little real world experience you have. Don't embarass yourself.