Sunday 5 March 2023

The "You do not need foreing keys" crowd

Can't help but notice a current trend advocating that foreing keys on database schemas are just a burden that you can do without. The last argument I read about it was along the lines of:

  • FKs are enabled on dev, testing and staging. Your automated test suites and your manual tests of your application in those environments and FKs help you catch data integrity problems.
  • Once you fix all problems, you just can deploy in production dropping all FKs, after all, you've taken care of all problems in the other environments, why you need FKs there? They just add overhead, don't they?

Ok, let's start admitting a basic truth: is strictly true that you can run your database without foreign keys. Yes, they add some performance overhead. Yes, they force you to do database operations in a certain order.

But beyond that, all these arguments are just showing lack of experience and ignorance. Let's see why.

For a start, anyone who thinks that all possible problems can be anticipated by means of testing in dev/staging environments is simply not experienced enough to have seen code bases with 100% test coverage and complete manual end user testing failing due to... unexpected reasons. Anyone that does not know that he/she "does not know what does not know" is simply lacking maturity and experience. Thinking that you can anticipate all possible error states, all possible system states is just hubris.

But that is just the beginning. Anyone who has watched a code base evolve under different hands and eyes knows that anything that is left as optional will, at some point in the development cycle, be ignored. Data integrity checks included.

And, anyone who has worked in anything more complex than a web shopping cart knows that parallelism is hard, concurrency is even harder, and locking is even harder than that, and these topics intersect between them. You just cannot easily roll your own transactional engine or your own locking mechanism. It takes a lot of work from a lot of talented people to create truly bullet proof database and transaction engines. Not saying that you cannot do it, but thinking that you will do better with your limited resources and experience in your project is really putting yourself very, very, high on the developer competence scale.

It is useful to compare these arguments with a topic that, while apparently unrelated, is an argument that has the exact same kind of flaws: strongly typed vs. loosely typed vs. untyped languages.

Yes, you can in theory create big and complex systems without using any data types. But the end result will be much more difficult to understand, harder to evolve and test, way more error prone and expensive in the long run than using a strongly typed language to do it.

Why is that? Because when using a strongly typed language, the compiler is acting a as first test bed for the validity of your assumptions about the shape of things that get passed around your code. Without that validation layer, you're simply more exposed to problems, introducing more chances of getting things wrong and forcing the reader of your code (including your future self) to deep dive into each and everything function call just to see what the callee is expecting to receive. Time consuming and error prone, to say the least.

So, data types are like foreign keys: a device that is used to make your code more robust, consistent and changeable. You can do without them, but be prepared to pay a cost much higher than having to declare types and relationships has.

In summary, "you don't need foreign keys in production" is the terraplanism of software development. It only shows how much you don't know and how little real world experience you have. Don't embarass yourself.