September 5, 2007 6 Comments
I would normally leave my comment as just that, the comment, but as I’ve also had a couple of similar discussions very recently, I thought I might as well digress off into a bit of a rant.
Chen was pondering the age-old dilemma of natural versus artificial keys, with particular reference to an older article from Jonathan Lewis.
Jonathan was highlighting one of the disadantages of meaningless keys where in his example a query on a table4 had to go back via table3 and then table2 to get to table1 to lookup the value of what could have been one column of a natural composite key on table 4.
This led Chen to wonder whether he was right or wrong to use a customer id rather than a customer name for his primary and foreign keys and he linked to a script to update natural foreign keys when those keys change.
You can find loads of articles out there regarding the pros and cons of natural versus artifical keys. As Chen points out, the storage overhead of using natural keys that are longer than artifical keys is rarely an issue.
Not so long ago, when I was preparing to leave a former employer, a fleet of robots ;-) from a Big 5 (or however many there are these days) consultancy came in and looked at some of the data models, shook their heads and said something along the lines of “what are these natural keys here? we always use artifical keys” (throughout the models there were some artificial keys, some natural keys).
And again, recently, I’ve reviewed data model propositions proposing artifical keys where an immutable natural key was available … (actually on reflection post blog post, I think this was a tendency to use another sequence id rather than a composite key of other artifical keys which is slightly different but still relevant).
(which almost brings me on to why non-database specialists trivialise the database and yet find it hard to string two tables together in an easy efficient way …
and which nearly brings me on to why agile teams seem determined to want to use people who all know a little bit of every application tier but not necessarily enough of each).
My take has always been to take a pragmatic solution.
Never say never. Never say always.
The candidates for a primary key should be immutable. They should not change, ever, period.
It’s ok for a unique key to change but…
A child’s foreign key references either the parent’s primary key or a unique key.
A candidate foreign key should also be immutable.
If it’s likely to change then use a sequence generated key.
And when designing don’t forget about how your data is going to be accessed (if you know that you’re going to want to access a table using a component of a natural composite foreign key, think carefully about that before making the queries go via a whole bunch of other tables just to lookup the value of a bunch of artificial keys) – Designers need to develop to validate their designs.