Keys

I wrote a comment today on an article on meaningless keys on Chen Shapira’s blog, a comment that I subsequently felt was worth regurgitating in a blog article.

I would normally leave my comment as just that, the comment, but as I’ve also had a couple of similar discussions very recently, I thought I might as well digress off into a bit of a rant.

Chen was pondering the age-old dilemma of natural versus artificial keys, with particular reference to an older article from Jonathan Lewis.

Jonathan was highlighting one of the disadantages of meaningless keys where in his example a query on a table4 had to go back via table3 and then table2 to get to table1 to lookup the value of what could have been one column of a natural composite key on table 4.

This led Chen to wonder whether he was right or wrong to use a customer id rather than a customer name for his primary and foreign keys and he linked to a script to update natural foreign keys when those keys change.

You can find loads of articles out there regarding the pros and cons of natural versus artifical keys. As Chen points out, the storage overhead of using natural keys that are longer than artifical keys is rarely an issue.

Not so long ago, when I was preparing to leave a former employer, a fleet of robots 😉 from a Big 5 (or however many there are these days) consultancy came in and looked at some of the data models, shook their heads and said something along the lines of “what are these natural keys here? we always use artifical keys” (throughout the models there were some artificial keys, some natural keys).

And again, recently, I’ve reviewed data model propositions proposing artifical keys where an immutable natural key was available … (actually on reflection post blog post, I think this was a tendency to use another sequence id rather than a composite key of other artifical keys which is slightly different but still relevant).

(which almost brings me on to why non-database specialists trivialise the database and yet find it hard to string two tables together in an easy efficient way …

and which nearly brings me on to why agile teams seem determined to want to use people who all know a little bit of every application tier but not necessarily enough of each).

My take has always been to take a pragmatic solution.

Never say never. Never say always.

The candidates for a primary key should be immutable. They should not change, ever, period.

It’s ok for a unique key to change but…

A child’s foreign key references either the parent’s primary key or a unique key.

A candidate foreign key should also be immutable.

If it’s likely to change then use a sequence generated key.

And when designing don’t forget about how your data is going to be accessed (if you know that you’re going to want to access a table using a component of a natural composite foreign key, think carefully about that before making the queries go via a whole bunch of other tables just to lookup the value of a bunch of artificial keys) – Designers need to develop to validate their designs.

6 Responses to Keys

  1. Have you ever seen a non-trivial natural key that is REALLY immutable?

    Social Security number? The SSA alters Social Security numbers when people have major identity theft problems.

    Passport number? Depends on the country… US passport numbers change every time you get a new one.

    Well, Passport Number is an immutable primary key for a passport, right? Say the guy typing in my number transposes two digits and then you use it in some Foreign Keys? How do you fix it?

    What about currency codes? Those are immutable, right? Um.. well… if no one ever makes a mistake in your company then they are immutable. But what happens if the guy putting in Danish Kroner writes DKR instead of the (correct) DKK and some people enter data before anyone realizes it?

  2. dombrooks says:

    Thanks Michael.

    >Have you ever seen a non-trivial natural key that is REALLY immutable?
    Yep.

    You might prefer to always use artificial keys. Fair enough. That’s your perogative.

    I don’t. That’s my perogative.

    That’s not to say that I don’t use a hell of a lot of artificial keys – because I do.

    But I have and do use, where I believe appropriate, natural keys.

    Sure, it tends not to be in the sort of situation where there is a guy putting in Danish Kroner as DKR instead of DKK.

  3. “What about currency codes? Those are immutable, right? Um.. well… if no one ever makes a mistake in your company then they are immutable. But what happens if the guy putting in Danish Kroner writes DKR instead of the (correct) DKK and some people enter data before anyone realizes it?”

    If you allow for that you also have to allow for the user who puts in DKR with a description of Danish Kroner, gets it wrong and adds a new ‘correct’ entry. Then you end up with 2 “unique” records for the same thing.

    FWIW I do tend to prefer natural keys, but I don’t believe they are immutable – and certainly this is a matter of choice (hence the interminable arguments) rather than of theory (since primary keys don’t get mentioned there). Choosing a primary key that is volatile is certainly problematic for all available systems, but it isn’t insurmountable, and my experience is that natural keys aren’t *that* volatile, people change their names for example, but not frequently – the same goes for the examples above of passports and SSN – though my understanding is that SSN is not guaranteed unique so isn’t a good candidate anyway.

  4. prodlife says:

    Thanks for the very interesting and detailed comment!

    Regarding the almost related topic of agile teams – they are probably the biggest obstacle in adopting natural keys. They are so happy to “embrace change”, that even if the requirements say that a field is immutable, I have to be ready for the possibility that this will change next week.

  5. Dom Brooks says:

    >>Have you ever seen a non-trivial natural key that is REALLY immutable?
    >Yep.

    Um… could you give examples?

    Niall Litchfield says:

    >though my understanding is that SSN is not guaranteed unique so isn’t a good candidate anyway.

    Good point… that’s another problem with natural primary keys – people often do not understand special cases that make them non-unique.

    Real world things tend to be messy.

    For example, wouldn’t you agree that at least an employee ID generated by your own application can be assumed to be guaranteed unique and immutable?

    Ummm… no.

    There have been real life cases where employees have sued or brought union grievances over things like employee ID codes that end in 666. Courts and arbitration panels tend to consider changing an id code to be a relatively minor accommodation to an employee’s religious preferences and have ordered such changes.

    Heck, depending on where your company is, your top management may have religious sensibilities that make them sympathetic to the 666-a-phobe so they don’t even need to go to court.

    I have no objections to natural IDs if they are really immutable and unique… it’s just that I can’t think of any outside of the trivial like Y/N for a Yes / No lookup table.

  6. dombrooks says:

    >>>Have you ever seen a non-trivial natural key that is REALLY immutable?
    >>Yep.
    >Um… could you give examples?

    I didn’t /haven’t because, as you’ve done with your examples, you want to come up with things that anyone reading the article can identify with.

    And it’s much easier to come up with examples such as SSN, passport, email , name which are not suitable than vice versa.

    That very fact emphasises your point that there are few appropriate natural keys but without undermining mine that if you have one you should use it.

Leave a reply to Michael Friedman Cancel reply