Database Performance Tuning: Django and Python best practices

After a few weeks reading an extensive, and quite complex, Python/Django code base, I’ve realized that a few simple practices make a significant difference in how quickly and effectively one can pick up an application. Not having been an intensive Python user until now, I expected to catch up on the code with more or less the same effort it takes me to grasp a piece of C, SQL, or Java.

But it hasn’t happened as quickly as I expected.

I’ve found myself tracing the application with a debugger, not looking for bugs, but trying to understand what it does. In my mind, this is an admission of defeat: I can’t understand the code by reading it, so I have to watch it in motion to be sure that my mental image of what the code does matches what it actually does.

Debugging is the task of finding out why your mental image of what the code should do does not match what it actually does. Not the opposite. When you don’t know what some code does, you should be able to find out by reading it.

And I’ve realized why it was taking so much time. Python is so powerful and expressive that it has its own shoot-yourself-in-the-foot factor, which in a big enough code base can be as dangerous as the C, SQL, or Java pitfalls.

So I’ve put together this short guide listing the things I’d want to see in code that I have not written myself. It is really a list of the things I want to keep an eye on when I write Python code in the future.

Since this is a case of either my code reading abilities being weak or the code not being well written, I of course prefer to blame the code. Not the people who wrote it, of course: I have real-world examples of each of these entries, but the point is not to assign blame. It is to make the code more readable, shareable, and easier for future newcomers to understand without resorting to tracing it with a debugger.

Don’t fight or otherwise reinvent Django or the standard library

Django provides built-in functionality to validate your data. To enable referential integrity. To deliver web pages. To do a lot of things. Django has been around for years, and improved over and over by a lot of people. So before implementing anything, think twice. Look around in the Django docs and check if there’s something already built in to do that.

In particular, use Django forms to validate data. Use Django validators in models. Use Django’s ForeignKey, and use cleaned_data to actually … work with clean data. Don’t reinvent the wheel if there’s a perfectly good, already debugged and reusable, wheel available.

Use the Python standard comment syntax

Document every single parameter your function accepts in its docstring. If your function has side effects, document them. If your function raises exceptions, say so in the documentation.

The only acceptable exception to this rule is for methods that override or implement an existing Django convention, where documenting would be an unnecessary restatement of what is already said. Except, of course, if you override a standard framework method and add some special contract.

It is much, much worse to have misleading documentation than no documentation at all, because it creates cognitive dissonance. If you change a method without updating the documentation, you’re just confusing future readers, who will discover sooner or later that the documentation does not match what the code does. They’ll throw away the documentation anyway and throw a few expletives at you or your family. So it is better to throw away obsolete documentation than to keep it and have someone else lose time discovering that it is outdated.
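
As a rough sketch of what such documentation could look like, here is a made-up function (apply_discount, its parameters, and its audit_log are all hypothetical, not taken from any real code base) whose docstring covers parameters, return value, side effects, and exceptions:

```python
# Hypothetical example: the function and its parameters are invented
# purely to illustrate a complete docstring.
def apply_discount(order, percent, audit_log=None):
    """Return the order total after applying a percentage discount.

    Parameters:
        order: dict with a numeric "total" entry. It is not modified.
        percent: discount to apply, between 0 and 100 inclusive.
        audit_log: optional list; see "Side effects".

    Returns:
        The discounted total as a float.

    Side effects:
        If audit_log is given, a one-line description is appended to it.

    Raises:
        ValueError: if percent is outside the 0-100 range.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100, got %r" % (percent,))
    discounted = order["total"] * (100 - percent) / 100.0
    if audit_log is not None:
        audit_log.append("applied %s%% discount" % percent)
    return discounted


log = []
total = apply_discount({"total": 200}, 25, audit_log=log)
```

Running help(apply_discount) in a console prints this docstring, so the reader gets the contract without opening the function body.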

Use Python parameter passing to… actually pass parameters

Python is famous for its readability, and its duck typing prevents a lot of mistakes. Named parameters and default values are a convenient way to plainly state what a method does. Method signatures can also be read by your IDE and used in code completion.

There are native ways to pass arguments to method calls. Don’t use JSON or HttpRequest to pass parameter values to a function that is not a URL handler. Period.

See the point on kwargs for more details.
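
One way to picture this boundary, as a sketch only: the names register_user and register_user_handler are invented for illustration, and the handler takes a raw JSON string rather than a real Django request so that the example runs on its own.

```python
import json


# Hypothetical names; a plain function whose inputs are all in its signature.
def register_user(username, email, newsletter=False):
    """Build a user record from explicit, named parameters."""
    return {"username": username, "email": email, "newsletter": newsletter}


def register_user_handler(request_body):
    """Stand-in for a URL handler: the only place that deals with JSON."""
    payload = json.loads(request_body)
    return register_user(
        payload["username"],
        payload["email"],
        newsletter=payload.get("newsletter", False),
    )


# Other Python code calls the function directly, with no JSON in between.
direct = register_user("ada", "ada@example.com", newsletter=True)

# The handler translates the request once and then delegates.
via_handler = register_user_handler(
    '{"username": "ada", "email": "ada@example.com", "newsletter": true}'
)
```

The function stays testable with plain Python values, and the request format is known in exactly one place.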

Be explicit. Be defensive

When you consider augmenting the signature of a function by adding more parameters, just add them and provide sensible defaults.

You may think that you’re just making your code future-proof by using variable parameter lists. You are wrong. You’re just making it more confusing and difficult to follow.

Consider this code:

 class base(object):
   def blah(self, param1, **kwargs):
     ...

 class derived(base):
   def blah(self, param1, **kwargs):
     ...

Don’t do it. If blah does not need more than one argument when you create it, declare it that way. Leaving **kwargs in forces the reader to go through the whole function body to check whether you’re actually using it. Don’t worry about future-proofing your code: any half-decent IDE will tell you which methods are affected by your change in much less time than it takes to think about it. So just declare this:

 class base(object):
   def blah(self, param1, param2=None):
     ...

And future readers of your code will be able to tell what your function accepts.
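
To see the difference from the tooling side, here is a small sketch using the standard inspect module, which reads signatures much as an IDE does. The class names Opaque and Explicit are invented for the comparison.

```python
import inspect


# Hypothetical classes, named only to contrast the two declaration styles.
class Opaque(object):
    def blah(self, param1, **kwargs):
        return param1


class Explicit(object):
    def blah(self, param1, param2=None):
        return param1, param2


# The signature is all a tool (or a reader) gets without opening the body.
opaque_signature = str(inspect.signature(Opaque.blah))
explicit_signature = str(inspect.signature(Explicit.blah))

print(opaque_signature)
print(explicit_signature)
```

The first signature only says that anything may be passed. The second names every accepted parameter and its default.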

MVC: the whole application is NOT a web page

According to Django’s own site:

In our interpretation of MVC, the “view” describes the data that gets presented to the user. It’s not necessarily how the data looks, but which data is presented. The view describes which data you see, not how you see it. It’s a subtle distinction.

So, in our case, a “view” is the Python callback function for a particular URL, because that callback function describes which data is presented.
Furthermore, it’s sensible to separate content from presentation – which is where templates come in. In Django, a “view” describes which data is presented, but a view normally delegates to a template, which describes how the data is presented.

Where does the “controller” fit in, then? In Django’s case, it’s probably the framework itself: the machinery that sends a request to the appropriate view, according to the Django URL configuration.

That does not mean that you have to use the same parameter passing conventions as an HTTP request. If you do that, you’re giving up on all the parameter validation and readability that Python provides.

There are native ways to pass arguments to method calls. Don’t use JSON or HttpRequest to pass parameter values to a function that is not a URL handler. Yes, I’m repeating a sentence from a previous point here, because it is very important to keep this in mind.

Avoid the temptation to create über-powerful handler() or update() methods/objects that can do everything inside a single entry point in a "util" or "lib" module (with a possibly associated evil kwargs parameter list). The fact is, this single entry point will branch to a myriad of places and will be a nightmare to follow and change in the future, becoming the dreaded single point of failure that no one wants to touch even with the end of a long stick.

Instead, move the functionality related to each data item as close to the data as possible, which means into the module where it is declared. These pieces should be much smaller and easier to test and manage than the über-monster methods.

Then, use the controller to glue together all these small pieces to build a response to your clients.
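
A minimal sketch of that layout, using plain classes instead of Django models so that it runs standalone. Customer, Order, and cancel_order_and_update_contact are hypothetical names chosen for the illustration.

```python
# Hypothetical classes; in a Django project these would be models.
class Customer(object):
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def change_email(self, new_email):
        """Small, testable rule that lives next to the data it protects."""
        if "@" not in new_email:
            raise ValueError("invalid email address: %r" % (new_email,))
        self.email = new_email


class Order(object):
    def __init__(self, customer, total):
        self.customer = customer
        self.total = total
        self.status = "open"

    def cancel(self):
        """Another small rule, owned by the object it changes."""
        if self.status == "shipped":
            raise ValueError("a shipped order cannot be cancelled")
        self.status = "cancelled"


def cancel_order_and_update_contact(order, new_email):
    """Glue code: call the small pieces and build the response."""
    order.cancel()
    order.customer.change_email(new_email)
    return {"status": order.status, "email": order.customer.email}


order = Order(Customer("Ada", "ada@example.com"), total=42)
response = cancel_order_and_update_contact(order, "ada@example.org")
```

Each rule can be tested on its own, and the glue function stays short enough to read in one pass.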

kwargs is EVIL. Deeply EVIL. Root-canal-extraction-level evil.

kwargs is a Python facility designed to provide enormous flexibility in some situations. In particular, decorators, generators, and other kinds of functions benefit greatly from being able to accept an arbitrary number of parameters. But it is not a general-purpose facility for calling methods.

The ONLY acceptable use of kwargs in normal application development, that is, outside framework code, is when the function actually can accept an arbitrary number of arguments and its actions and results are not affected by the values received in the kwargs parameter list.
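
A decorator is a typical case that fits this rule. The sketch below uses invented names (log_calls and area): the wrapper accepts any arguments, never inspects them, and forwards them untouched.

```python
import functools


# Hypothetical decorator: it records calls but never looks inside kwargs.
def log_calls(func):
    """Record each call, then forward every argument unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@log_calls
def area(width, height=1):
    # The decorated function keeps an explicit signature of its own.
    return width * height


result = area(3, height=4)
```

Here the behaviour of wrapper does not depend on what kwargs contains, which is what makes this use acceptable.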

In particular, the following code is NOT ACCEPTABLE:

 def blah(**kwargs):
   if 'destroy_world' in kwargs:
     do_something()
   if 'save_world' in kwargs:
     do_something_else()

See how many things are wrong with this function? Let’s see: first, the caller has to read either the documentation or the function body to know which keys are acceptable in the kwargs dictionary. Second, a small typo when composing the arguments for the function call can make a significant difference in what the function does, and nothing will complain. Third, anyone reading your code will have to go through the whole function body to understand what the valid kwargs arguments are.

And finally, why stop there? Why don’t you define all of your methods accepting **kwargs and be done with parameter lists? Can you imagine how completely unreadable your code will become?

Seriously, each time you use kwargs in application code, a baby unicorn dies somewhere.
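
For contrast, here is a sketch of the same function with an explicit signature. The strings appended to actions are stand-ins for do_something() and do_something_else(), whose bodies are not shown in this post.

```python
def blah(destroy_world=False, save_world=False):
    """Explicit version: both options are visible in the signature."""
    actions = []  # stand-in for the real side effects
    if destroy_world:
        actions.append("destroy")
    if save_world:
        actions.append("save")
    return actions


performed = blah(save_world=True)

# A misspelled option is rejected at the call site instead of being ignored.
try:
    blah(save_wrld=True)
except TypeError as error:
    rejected = str(error)
```

Note one more trap in the kwargs version: it tests whether the key is present, so passing destroy_world=False would still trigger the destruction.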

DRY - Don’t repeat yourself

If you’re doing it more than twice, it is worth thinking about it. Consider this code:

 def blah(self, some_dict):
   data = {}
   if 'name' in some_dict:
     data['name'] = some_dict['name']
   if 'address' in some_dict:
     data['address'] = some_dict['address']
   # ...more of the same for every other entry...
   if 'postal_code' in some_dict:
     data['postal_code'] = some_dict['postal_code']
   return data

Why not use this instead?

 def blah(self, some_dict):
   allowed_entries = ['name', 'address', 'postal_code']  # ...plus any other entries
   data = {}
   for entry in allowed_entries:
     if entry in some_dict:
       data[entry] = some_dict[entry]
   return data

Or even better and surely more pythonic and satisfying:

 def blah(self, some_dict):
   allowed_entries = ['name', 'address', 'postal_code']  # ...plus any other entries
   data = {key: some_dict[key] for key in allowed_entries if key in some_dict}
   return data

There are a lot of advantages to doing this. You can arbitrarily extend the list of things that you transfer. You can easily test this code. The code coverage report will keep giving you 100% no matter how many values you include in some_dict. The code is explicit and simple to understand.

And even better, someone reading your code will not have to go through a page or two of if statements just to see what you’re doing.

Avoid micro optimizations

You may code this thinking you’re just writing efficient code by saving a function call:

 if a in some_dict:
   result = some_dict[a]
 else:
   result = some_default_value

instead of

 result = some_dict.get(a, some_default_value)

Now, go back to your console and time these two examples over a few thousand executions. Measure the difference and think about how many ten-thousandths of a second you’re saving, if any. Then go back to your app and remember the point about using the Python standard library and the functionality Django provides.
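
If you want a starting point for that measurement, here is one possible sketch using the standard timeit module. The dictionary contents, the helper names with_if and with_get, and the iteration count are arbitrary choices; wrapping each variant in a function adds the same call overhead to both.

```python
import timeit

# Arbitrary sample data, chosen only for the measurement.
some_dict = {"present": 1}
some_default_value = 0


def with_if(a):
    """The hand-written lookup from the first example."""
    if a in some_dict:
        result = some_dict[a]
    else:
        result = some_default_value
    return result


def with_get(a):
    """The dict.get() version from the second example."""
    return some_dict.get(a, some_default_value)


for label, func in (("if/else", with_if), ("dict.get", with_get)):
    seconds = timeit.timeit(lambda: func("present"), number=100000)
    print("%-8s %.4f s per 100000 calls" % (label, seconds))
```

The numbers depend on your machine and Python version, so run it yourself before deciding that either form is worth the loss in readability.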


Monday, 22 June 2015
