Modelling a “simple” ITL problem

Today I have been looking at an application problem.

Last night important client processing missed its SLA with the application reporting ORA-02049: timeout: distributed timeout waiting for lock.

This post isn’t about the distributed lock timing out.

But consider the evidence that further investigation into these distributed locks threw up.

  1. The parameter distributed_lock_timeout is set to the default of 60 (seconds) – this means that any distributed transaction that waits for an enqueue for 60 seconds will be picked off by a sniper, similar to the standard deadlock detection after 3 seconds.
  2. The 1-in-10 second ASH samples in DBA_HIST_ACTIVE_SESS_HISTORY show clear waits on “enq: TX – allocate ITL entry”
  3. These same ASH samples show one particular sql id as being involved in the ITL waits.
  4. This particular sql id is a DELETE on a single table, no triggers, no on delete cascade referential integrity.

Sounds easy right?

But…

  1. V$SEGMENT_STATISTICS and DBA_HIST_SEG_STAT show 0 ITL waits for the segment/s related to the DELETE sql id above.
  2. In the ASH data, for the event “enq: TX – allocate ITL entry”, the CURRENT_OBJ# column indicates a variety of objects that might show ITL waits under concurrent DML but, again, all unrelated to the sql id identified by ASH.

So…

Time for some theories:

  1. ASH could be misleading us about the sql ids and/or
  2. ASH could be misleading about current_obj# and/or
  3. If our enqueue times out because of the distributed lock timeout, then perhaps the ITL wait is not recorded – i.e. segment stats are misleading.

Theories about ASH being misleading under certain circumstances are incredibly convoluted to test but, as far as I am concerned, even up to the latest 11gR2 this is a known known. There are certain values that are not cleared out or not recorded properly or disappear under certain conditions.

But perhaps we can test theory 3 relatively easily?

I like recycling – I’m going to reuse Randolf Geist’s ITL test case here.

SQL> get itl_case.sql
  1  create table t1 (c1 varchar2(1336)) pctfree 10;
  2  insert into t1 values ('a');
  3  insert into t1 values ('b');
  4  insert into t1 values ('c');
  5  insert into t1 values ('d');
  6  insert into t1 values ('e');
  7  insert into t1 values ('f');
  8  commit;
  9  update t1 set c1 = rpad(c1, 1335, c1) where length(c1) = 1;
 10* commit;
 11  .
SQL> @itl_case

Table created.
1 row created.
1 row created.
1 row created.
1 row created.
1 row created.
1 row created.
Commit complete.
6 rows updated.
Commit complete.

SQL>

Now I’m going to delete some rows across three sessions.
The first two session simple uncommitted deletes. The third will do a select over a db link in order to start the distributed transaction followed by the same simple delete.

First session – delete some rows and leave uncommitted so the ITL cannot be cleared:

SQL_SESS1>select distinct sid from v$mystat;

       SID
----------
      2056

SQL_SESS1>delete from t1 where c1 like 'a%';

1 row deleted.

SQL_SESS1>

Second session – delete some rows and leave uncommitted so the ITL cannot be cleared:

SQL_SESS2>select distinct sid from v$mystat;

       SID
----------
      2126

1 row selected.

SQL_SESS2>select to_char(sysdate,'DD-MON-YYYY HH24:MI') from dual;

TO_CHAR(SYSDATE,'
-----------------
01-NOV-2012 23:32

1 row selected.

SQL_SESS2>delete from t1 where c1 like 'b%';

1 row deleted.

SQL_SESS2>

Finally – third session – try to delete some rows within a distributed transaction:

SQL_SESS3>select distinct sid from v$mystat;

       SID
----------
      1884

1 row selected.

SQL_SESS3>select 1 from dual@mydba;

         1
----------
         1

1 row selected.

SQL_SESS3>delete from t1 where c1 like 'c%';
delete from t1 where c1 like 'c%'
*
ERROR at line 1:
ORA-02049: timeout: distributed transaction waiting for lock


SQL_SESS3>

Right, so what does ASH tell us about what happened?

SQL_SESS2>l
  1  select to_char(sample_time,'DD-MON-YYYY HH24:MI:SS'), sql_id, event, blocking_session, current_obj#
  2  from   v$active_Session_history
  3  where  event = 'enq: TX - allocate ITL entry'
  4  and    sample_time >= to_date('01-NOV-2012 23:32','DD-MON-YYYY HH24:MI')
  5* order  by sample_time desc
SQL_SESS2>/

TO_CHAR(SAMPLE_TIME, SQL_ID        EVENT                          BLOCKING_SESSION CURRENT_OBJ#
-------------------- ------------- ------------------------------ ---------------- ------------
01-NOV-2012 23:35:14 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:13 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:12 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:11 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:10 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:09 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:08 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:07 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:06 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:05 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:04 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:03 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:02 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:01 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:00 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:59 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:58 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:57 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:56 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:55 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:54 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:53 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:52 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:51 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:50 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:49 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:48 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:47 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:46 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:45 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:44 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:43 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:42 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:41 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:40 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:39 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:38 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:37 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:36 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:35 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:34 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:33 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:32 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:31 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:30 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:29 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:28 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:27 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:26 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:25 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:24 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:23 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:22 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:21 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:20 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:19 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:18 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:17 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:16 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:15 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:14 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:13 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:12 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:11 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1

71 rows selected.

Elapsed: 00:00:00.04
SQL_SESS2>

So, we are reporting wait event “enq: TX – allocate ITL entry” which was the aim.

And look at column current_obj# – seems perhaps that it is not set properly for this wait event (even though my rows in my real world example were not -1)

Sql id 430xqxrrucgc5 is my delete statement so that is bang on the money.

Finally, what do segment statistics say about ITL waits for this object t1?

SQL_SESS3>l
  1  select statistic_name, value
  2  from   v$segment_statistics
  3  where  object_name = 'T1'
  4* and    statistic_name = 'ITL waits'
SQL_SESS3>/

STATISTIC_NAME                                                        VALUE
---------------------------------------------------------------- ----------
ITL waits                                                                 0

1 row selected.

SQL_SESS3>

Because the statement was sniped by the distributed transaction timeout, the segment ITL statistics are not incremented and so these (and AWRs snaps thereof) miss the significance.

A good point made by Coskan is that I did not include the blocking session information in my ASH output.

Simply this is because it made the output too wide.

One of the sessions is reported as blocker but in reality there may be many blockers – it’s as many as there are ITL slots.

Also, depending on how many sessions are waiting, you may find that your blocked statement is unlucky and gets superceded by other sessions who get the ITL just through good timing of when they tried to get the enqueue. This is what was happening in my particular situation where an unlucky session was blocked for 1 minute – up to the distributed lock timeout – by a variety of blockers who at other times in that minute were busy doing other stuff not all waiting on this enqueue.

Follow

Get every new post delivered to your Inbox.

Join 62 other followers