Deadlock trace file enhancements

Just browsing some deadlock trace files on 11gR2.

I normally look at sections:

  • Deadlock Graph
  • Information for the OTHER waiting session
  • Information for THIS session

and very rarely go beyond section PROCESS_STATE.

But, casually browsing, I came across these sections relating to the killed session:

  • Session Wait History
  • Sampled Session History of session

Nice additions. When did that happen?

WFG – mode 5 TM deadlock

Just a brief note to preserve some observations on a deadlock situation reported a while back on the OTN forums.

The symptoms of the problem were deadlocks on a RAC system and this example partial extract of a wait-for-graph from an LMD trace file:

Global Wait-For-Graph(WFG) at ddTS[0.5] :
BLOCKED 0x3d2438198 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[41000-0001-000004A5] inst 1 
BLOCKER 0x3d7846108 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[38000-0002-00002C68] inst 2 
BLOCKED 0x3d7846108 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[38000-0002-00002C68] inst 2 
BLOCKER 0x3d75a4fd8 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[43000-0003-000009C1] inst 3 
BLOCKED 0x3d75a4fd8 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[43000-0003-000009C1] inst 3 
BLOCKER 0x3d2438198 4 wq 2 cvtops x1 TM 0x1df87.0x0(ext 0x0,0x0)[41000-0001-000004A5] inst 1 

The details of a WFG are slightly different compared to a non-RAC/LMD deadlock graph, not least how to interpret lock mode.

Column 8 indicates a TM lock on the object in column 9 – 0x1df87 (hex that can be converted to dec and used in dba_objects)

The lock mode in column 3 is slightly misleading in that “4″ does not mean mode 4 but mode 5, thanks to the following resources:

and this:

#define KJUSERNL 0         /* no permissions */   (Null)
#define KJUSERCR 1         /* concurrent read */ (Row-S (SS))
#define KJUSERCW 2         /* concurrent write */ (Row-X (SX))
#define KJUSERPR 3         /* protected read */    (Share)
#define KJUSERPW 4         /* protected write */ (S/Row-X (SSX))
#define KJUSEREX 5         /* exclusive access */ (Exclusive)

“wq 2″ is apparently the convert queue, “wq 1″ being the grant queue.

So, in summary, deadlock issues regarding mode 5 TM locks.

In this particular case, the usual culprit was an unindexed foreign key, albeit a less common variation with a self-referencing foreign key in conjunction with updates to the primary key column – a questionable habit which in my mind then raises questions about the design.

E.g.

Session 1>create table t1
  2  (col1         number       not null
  3  ,col2         varchar2(10) not null
  4  ,related_col1 number
  5  ,constraint pk_t1 primary key (col1)
  6  ,constraint fk_t1 foreign key (related_col1) references t1 (col1));

Table created.

Session 1>insert into t1 values (1,'X',null);

1 row created.

Session 1>insert into t1 values (2,'X',null);

1 row created.

Session 1>commit;

Commit complete.

Session 1>

If we try to update the pk then session 2 blocks:

Session 1:

Session 1>update t1 set col1 = 1  where col1 = 1;

1 row updated.

Session 1>

Session 2:

Session 2>update t1 set col1 = 2 where col1 = 2;
... hangs...

Blocking on a mode 5 TM:

Session 1>select process,
  2         l.sid,
  3         type,
  4         lmode,
  5         request,
  6         do.object_name
  7  from   v$locked_object lo,
  8         dba_objects do,
  9         v$lock l
 10  where  lo.object_id   = do.object_id
 11  AND    l.sid          = lo.session_id
 12  AND    do.object_name IN ('T1');

PROCESS           SID TY      LMODE    REQUEST OBJECT_NAME
---------- ---------- -- ---------- ---------- -----------
6052:1248        1010 AE          4          0 T1
6052:1248        1010 TM          0          5 T1
496:4000         1640 AE          4          0 T1
496:4000         1640 TM          3          0 T1
496:4000         1640 TX          6          0 T1

Session 1>

It’s not hard to go and engineer a deadlock scenario but as it would be so artificial, I won’t bother.

And as expected, an index addresses the issue.

Session 1:

Session 1>rollback;

Rollback complete.

Session 1>create index i_t1 on t1 (related_col1);

Index created.

Session 1>update t1 set col1 = 1  where col1 = 1;

1 row updated.

Session 1>

Session 2:

Session 2>rollback;

Rollback complete.

Session 2>update t1 set col1 = 2 where col1 = 2;

1 row updated.

Session 2>

Deadlock detection – unindexed foreign keys

Recently I have had a shortage of things to blog about. Nevertheless I have resisted the urge to join the cacophony by just announcing that 11g is in fact available for download on Linux.

Sure enough, eventually an opportunity has presented itself. Well, two opportunities but more of the other elsewhere later, but in this instance – deadlocks, specifically regular and numerous deadlocks caused by regular and numerous jobs, jobs which should have been non-conflicting.

However, dear reader, I shall spoil the surprise by revealing that these deadlocks were caused by unindexed foreign keys – more detail further down. It’s an old problem that’s been around for a long time, but it still happens (although, granted, more often than not in the databases that I see these days, this tends not to be a problem because people haven’t bothered with any sort of constraint in their database, but that’s a rant for another day).

I’m reluctant to admit it for fear of being shot down by the BAAG police ;-) , but at the time of my proclaimation that unindexed foreign keys was the issue, it was a hunch, a gut feeling backed by a decade of experience and the briefest of brief glances at the resulting trace file from the deadlock (is it just me or is there a blurred line between a guess and a hunch / gutfeeling / shortcut that experience suggests?) . Fortunately, a longer look at the trace file backed me up and the subsequent indexing of the columns in the foreign keys was a massive success.

From an end user perspective (including both developers connecting to the database via whatever tool and also application users), the symptoms initially were of an unresponsive development database, seemingly hanging every couple of minutes for a couple of minutes.

Immediately, to the alert log, where there were deadlock entries and in the background dump dest, where there were trace files from the jobs with more detailed information on the deadlocks detected.

Deadlock detection takes some time and so whilst waiting for these to be detected, the sessions on both sides of the deadlock were hogging the limelight and in the meantime bringing the database to its knees for everyone else.

In the trace files, the regular entries were all very similar, such as this:


*** SESSION ID:(37.18266) 2007-08-09 03:22:47.678
DEADLOCK DETECTED
Current SQL statement for this session:
UPDATE TABLE_A RC SET COLUMN_A = :B2 WHERE COLUMN_B = :B1
----- PL/SQL Call Stack -----
object line object
handle number name
000006FBE7909428 375 package body PACKAGE_A
000006FBE7909428 1443 package body PACKAGE A
000006FBF4399740 1 anonymous block
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TM-00024f42-00000000 17 37 SX SSX 24 22 SX SSX
TM-00024f42-00000000 24 22 SX SSX 17 37 SX SSX
session 37: DID 0001-0011-00023C06 session 22: DID 0001-0018-000032FA
session 22: DID 0001-0018-000032FA session 37: DID 0001-0011-00023C06

The key information was firstly the statement being issued. This was always the same – there was just this one statement causing all the problem:


UPDATE TABLE_A RC SET COLUMN_A = :B2 WHERE COLUMN_B = :B1

Secondly, the sessions that were deadlocking were using different values for :B1 – this was visible from the additional bespoke logging as part of this PACKAGE_A code module.

Next, the deadlocks caused by this one statement were on a variety of objects – all child tables of TABLE_A or primary key indexes on child tables of TABLE_A.

In this particular trace file:


---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TM-00024f42-00000000 17 37 SX SSX 24 22 SX SSX
TM-00024f42-00000000 24 22 SX SSX 17 37 SX SSX

the two sessions are trying to escalate a held TM lock from SX to SSX. A TM lock, a DML Enqueue, makes sure that the structure of a table is not altered whilst, in this example, you are updating a row in a table.
From the concepts guide:

A row exclusive table lock (also called a subexclusive table lock, SX) generally indicates that the transaction holding the lock has made one or more updates to rows in the table.
...
A share row exclusive table lock (also sometimes called a share-subexclusive table lock, SSX) is more restrictive than a share table lock... Only one transaction at a time can acquire a share row exclusive table lock on a given table. A share row exclusive table lock held by a transaction allows other transactions to query or lock specific rows using SELECT with the FOR UPDATE clause, but not to update the table... A share row exclusive table lock held by a transaction prevents other transactions from obtaining row exclusive table locks and modifying the same table.

What object is this TM lock being held on?
Note the lock name – TM-00024f42-00000000 – that 00024f42 is a hex number, 151362 is its decimal converstion. Look up object_id 151362 in dba_objects and you’ll find the details of the object. In the continued interest of obscuring client code and object names, in this example it was TABLE_B, a child of TABLE_A with a foreign key enforcing the constraint but no index.

So, to the solution – to index all the foreign keys between these handful of tables. A bit of a sledgehammer to crack a nut approach, but under the circumstances and the local urgency to get this done and out, an advisable one with the caveat that there might be some impact to blindly adding indexes left, right and centre.

In terms of SQL to identify unindexed foreign keys, I was lazy and did a search on google, albeit with this caveat in mind. Interestingly I found a script on experts-exchange that I wrote to do exactly the same thing nearly a decade ago. But it was a rubbish script, and I used the one linked to from Tom Kyte’s page.

Just as a conclusion on this topic, note some points that Tom Kyte makes on how things change, the subject matter of which is exactly this problem. Also, I managed to deadlock myself in a single session last week by using an autonomous transaction to update a table, the child of which had been modified by the outer transaction – not clever.

Follow

Get every new post delivered to your Inbox.

Join 62 other followers