Think in Sets

A short one today and just an opportunity, not to blow my own trumpet, but to bang a familiar drum that has been banged many times before by loads of people, including myself.

One of the most critical factors that will determine the performance of your database code is thinking in sets.

Yesterday my attention was drawn to a problematic and critical piece of code. Not some code that would appear on a top-N statspack report of resource-hungry SQL, but something that someone who was familiar the system knew was causing a problem in a particular area.

The fix was as easy as could be and a familiar story:

  • Take a pl/sql functional lookup that was being called hundreds of thousands of times by another piece of SQL.
  • Take the lookup out of the function, inline to the calling SQL.
  • Rewrite and optimise due to the different context (i.e. you are doing the lookup based on a set of information not on a row-by-row basis).

(In this latest example, these actions reduced running time on a 100,000 row insert from 5 hours to around 17 seconds!)

This thinking-in-set drum will require regular beating in the face of test-driven development – TDD.

I am a big fan of the database. And I am a big fan of TDD. And I think the two can, could, should, do and will co-exist nicely.

But you just have to be aware that TDD as an approach is technology agnostic, has arisen in application tiers where this set-based thinking is not either relevant or important, and can encourage you to refactor a little too much in the database at least.

So, next time you write that nice, packaged little lookup function. Take a minute to think about the overall context in which it will be called. And test it using something like runstats and with some appropriate volume (with this particular issue, the lookup was being called 100,000 times but the performance issue, which wasn’t visibly noticeably with one row lookup, was already obvious by simple clock time when called by just a ten row dataset).