Client-side database connection timeout insanity

The quote is usually attributed to Einstein, though the attribution is dubious. It goes something like “Insanity is doing the same thing over and over and expecting different results”.

Probably… hopefully… it’s a familiar story.

I can’t be the only one; I’ve seen it in so many places.

Teams whose applications connect to your database raise incidents saying their application is erroring repeatedly whilst running a query, each time receiving ORA-01013: user requested cancel of current operation.

Investigation confirms it: SQL monitoring shows a run of sequential executions of a particular query, each erroring with ORA-01013. It is a really nice feature of SQL monitoring that it captures these more often than not.

Nearly all of these errors occur around the 20 minute mark.

Do you have a 20 minute timeout?

Yes.

Why?

I’m not sure whether the mindset is inherited from other areas of the architecture. In the past, I saw this much more with applications involved in distributed transactions, with queues and so on, which bring a whole lot more complexity.

Of course it all depends on what “normal” is for any given query, but for an awful lot of these, “normal” is not a million miles away from the timeout, and it doesn’t take much to nudge them over the limit: a different data distribution, some other load on the database, etc.

But I never understand what value that timeout brings.

No, it’s not “stuck”.

No, it’s not “hanging”.

It’s executing. It’s just not executing as fast as you expected, and that expectation may well have been baseless.

Of course sometimes we get a plan “flip” and something which was relatively fast becomes relatively slow.

But even then, I think proper application-level alerting and notification, whilst allowing the execution to complete where possible, is infinitely better than the alternative: spending hours getting hold of a DBA, raising the right sort of ticket, and often telling them what you want profiled/baselined/patched, all potentially under a high priority, because the timeout won’t let the relevant process run to completion.

Madness.
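
For what it’s worth, here is a minimal sketch of the “alert but let it finish” idea, in Python purely for illustration. The function name run_with_soft_timeout, its parameters and the on_overrun callback are all hypothetical, and work stands in for whatever actually executes your query. It is not tied to any particular driver and makes no attempt to cancel anything; that is the whole point.

```python
import logging
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("soft_timeout")


def run_with_soft_timeout(
    work: Callable[[], T],
    soft_limit_seconds: float,
    on_overrun: Callable[[float], None],
) -> T:
    """Run work() to completion, calling on_overrun once if it passes the soft limit.

    Hypothetical helper: nothing is cancelled. The alert fires from a timer
    thread while work() carries on executing in the calling thread.
    """
    started = time.monotonic()

    # The timer only notifies; it has no handle on the running work.
    timer = threading.Timer(
        soft_limit_seconds,
        lambda: on_overrun(time.monotonic() - started),
    )
    timer.daemon = True
    timer.start()
    try:
        return work()
    finally:
        # If work() finished inside the limit, the alert never fires.
        timer.cancel()
        log.info("work finished after %.1fs", time.monotonic() - started)
```

You would pass something like a function that runs the query and fetches the rows as work, 20 * 60 as soft_limit_seconds, and whatever pages or emails the right people as on_overrun. The query still gets to finish, and somebody still finds out it was slow.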
