Blocking Prepared XA Transaction

There was an oracle-l thread last month about blocking sessions which could not be identified.

I replied about a very niche scenario which I’d seen a few times before. It has just happened again, so I thought it would be useful to post some evidence.

Here are a couple of articles posted previously on XA transactions:

First, a reminder, from the posts above, that a transaction doesn’t have to have a session.

With normal “vanilla” sessions, there’s a 1:1 relationship between session and transaction. But with an XA transaction, a session can attach and detach, although only one session can be attached to a transaction at any one time.

And a bit of context about these XA transactions.

This is a complex Java application with multiple resources participating in the XA transaction. HornetQ was involved: it was apparently timing out and was bounced, probably being taken down with a kill -9.

An XA transaction uses what is called a two-phase commit: in the first phase the transaction co-ordinator instructs the participating resources to prepare to commit, and in the second phase it instructs them to commit.

What I have seen many times on this application is that something happens to the transaction co-ordinator between the PREPARE and the COMMIT, and we are left with an orphaned transaction.

This transaction will eventually time-out – depending on the properties set on the transaction by the co-ordinator – at which point it will become an in-doubt transaction and be visible in dba_2pc_pending.
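To make the failure window concrete, here is a toy Python model of the two phases. It is not how Oracle, HornetQ or any real transaction manager is implemented, and the class and function names are hypothetical. It only shows that a co-ordinator which dies after PREPARE but before COMMIT leaves each participant PREPARED and still holding its locks, with nobody left to tell it the outcome.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "ACTIVE"
    PREPARED = "PREPARED"
    COMMITTED = "COMMITTED"


class Participant:
    """A hypothetical resource manager taking part in an XA transaction."""

    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE
        self.holds_locks = True  # row locks taken during the work phase

    def prepare(self):
        # Phase 1: guarantee the work can be committed later; locks are kept.
        self.state = State.PREPARED
        return True  # vote "yes"

    def commit(self):
        # Phase 2: make the work permanent and release the locks.
        self.state = State.COMMITTED
        self.holds_locks = False


def two_phase_commit(participants, crash_after_prepare=False):
    """Toy co-ordinator; crash_after_prepare simulates e.g. a kill -9."""
    if not all(p.prepare() for p in participants):
        return "aborted"
    if crash_after_prepare:
        return "coordinator lost"  # nobody is left to send phase 2
    for p in participants:
        p.commit()
    return "committed"


db, mq = Participant("oracle"), Participant("hornetq")
print(two_phase_commit([db, mq], crash_after_prepare=True))  # coordinator lost
print(db.state.value, db.holds_locks)  # PREPARED True
```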

Back to reality… there is nothing in DBA_2PC_PENDING:

select * from dba_2pc_pending;

no rows selected

But we can see there is a transaction in prepared state in V$TRANSACTION, a transaction which started yesterday afternoon:

select * from v$transaction where status = 'PREPARED';
ADDR                 XIDUSN    XIDSLOT     XIDSQN     UBAFIL     UBABLK     UBASQN     UBAREC STATUS           START_TIME           START_SCNB START_SCNW START_UEXT START_UBAFIL START_UBABLK START_UBASQN START_UBAREC SES_ADDR               FLAG SPACE RECURSIVE NOUNDO PTX NAME                                                                                                                                                                                                                                                             PRV_XIDUSN PRV_XIDSLT PRV_XIDSQN PTX_XIDUSN PTX_XIDSLT PTX_XIDSQN     DSCN-B     DSCN-W  USED_UBLK  USED_UREC     LOG_IO     PHY_IO     CR_GET  CR_CHANGE START_DATE            DSCN_BASE  DSCN_WRAP  START_SCN DEPENDENT_SCN XID              PRV_XID          PTX_XID        
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- -------------------- ---------- ---------- ---------- ------------ ------------ ------------ ------------ ---------------- ---------- ----- --------- ------ --- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------------- ---------- ---------- ---------- ------------- ---------------- ---------------- ----------------
0000004A833D6868        691         32     315541         22    1085802     -28624         18 PREPARED         01/19/18 15:39:45    3454176189       2913         13           22      1085802       -28624            9 0000004B82E584A0    4201987 NO    NO        NO     NO                                                                                                                                                                                                                                                                            0          0          0          0          0          0          0          0          1         12     -40287       -163  -66597824      -1385 19-JAN-2018 15:39:45          0          0    1.3E+13             0 B302200095D00400 0000000000000000 0000000000000000 
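The XID column packs XIDUSN, XIDSLOT and XIDSQN into 8 bytes. Here is a small Python sketch that decodes it. The layout (2-byte undo segment, 2-byte slot, 4-byte sequence) and little-endian byte order are assumptions that happen to match the row above; byte order is platform-dependent, so check the result against the XIDUSN/XIDSLOT/XIDSQN columns on your own system.

```python
import struct


def decode_xid(xid_hex, little_endian=True):
    """Split an 8-byte V$TRANSACTION.XID into (usn, slot, sqn).

    Assumes usn (2 bytes), slot (2 bytes), sqn (4 bytes) in platform
    byte order; little-endian is an assumption, not a given.
    """
    raw = bytes.fromhex(xid_hex)
    fmt = "<HHI" if little_endian else ">HHI"
    return struct.unpack(fmt, raw)


usn, slot, sqn = decode_xid("B302200095D00400")
print(usn, slot, sqn)  # 691 32 315541, matching XIDUSN/XIDSLOT/XIDSQN above
```

The same function can be pointed at the XIDs in the ASH output to get the usn/slot/sqn of each waiting session’s own transaction.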


And this is the script I mentioned on the oracle-l thread, which I use to see what locks transactions are holding, when each transaction started and which sessions are attached:

select s.machine
,      lo.inst_id
,      lo.object_id
,      lo.session_id
,      lo.os_user_name
,      lo.process
,      ob.owner
,      ob.object_name
,      ob.subobject_name
,      tx.addr
,      tx.start_time txn_start_time
,      tx.status
,      tx.xid
,      s.*
from   gv$locked_object lo
,      dba_objects      ob
,      gv$transaction    tx
,      gv$session        s
where  ob.object_id = lo.object_id
and    tx.xidusn    (+) = lo.xidusn
and    tx.xidslot   (+) = lo.xidslot
and    tx.xidsqn    (+) = lo.xidsqn
and    s.taddr      (+) = tx.addr
order by txn_start_time, session_id, object_name;

For privacy reasons and as this is a real-world situation and not an isolated test case, I won’t share the output of the script.

But it shows that:

  • the transaction is holding mode 6 (exclusive) TX row locks on a number of objects
  • the transaction is in PREPARED state
  • there is no session attached to the transaction (although v$locked_object does tell us what the SID was when one was attached)

Now, from the application perspective, something has apparently rolled back a message. HornetQ has since been bounced and everything is back up and running, and it seems that the message which resulted in our orphaned transaction is being reattempted and is being blocked by the exclusive locks that the orphaned transaction still holds.

From an Active Session History perspective, this is what we see from the script below. I’ve removed some columns from the output for brevity and privacy, but left them commented out in the script so you can see what I normally run:

select count(*) over (partition by h.sample_time) sess_cnt
--,      h.user_id
--,      (select username from dba_users u where u.user_id = h.user_id) u, h.service_hash
,      xid--, sample_id
, sample_time, session_state, session_id, session_serial#,--sql_id,
sql_exec_id, sql_exec_start, event, --p1,
mod(p1,16), blocking_session,blocking_session_serial#--, current_obj#
--,      (select object_name||' - '||subobject_name from dba_objects where object_id = current_obj#) obj
--,      (select sql_fulltext from v$sql s where s.sql_id = h.sql_id and rownum = 1) sqltxt
--,      (select sql_text from dba_hist_sqltext s where s.sql_id = h.sql_id and rownum = 1) sqltxt
--, h.*
from   v$active_session_history h
where event = 'enq: TX - row lock contention'
order by h.sample_id desc;
XID              SESSION_STATE SESSION_ID SESSION_SERIAL# EVENT                                                            MOD(P1,16) BLOCKING_SESSION BLOCKING_SESSION_SERIAL#
---------------- ------------- ---------- --------------- ---------------------------------------------------------------- ---------- ---------------- ------------------------
4F021700A3C00600 WAITING              232           53035 enq: TX - row lock contention                                             6
FC010B00196E0A00 WAITING              471            5205 enq: TX - row lock contention                                             6
670320004FA50300 WAITING             2652           11791 enq: TX - row lock contention                                             6
640204005BA40500 WAITING             4300           49665 enq: TX - row lock contention                                             6

So, you can see we have four sessions being blocked on exclusive mode 6 row level locks and that the blocking session is null. If I showed you the full history then you would see that these sessions have been repeatedly trying for many hours.
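For enqueue waits, P1 is the "name|mode" parameter: the two-character lock type sits in the high-order bytes and the requested mode in the low-order bits, which is why mod(p1,16) yields the mode here (bitand(p1,65535) is the more general form). A minimal Python decoder; the sample value is built from that encoding for a TX mode 6 request rather than taken from this system.

```python
def decode_enqueue_p1(p1):
    """Return (lock_type, mode) from the P1 of an 'enq: ...' wait event."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return name, mode


# 'T' = 0x54, 'X' = 0x58, mode 6  ->  0x54580006
p1 = (ord("T") << 24) | (ord("X") << 16) | 6
print(p1, decode_enqueue_p1(p1))  # 1415053318 ('TX', 6)
print(p1 % 16)  # 6, the same answer as mod(p1,16) in the ASH query
```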

BTW, this is RAC but all these sessions are intentionally on the same instance so there’s none of that jiggery-pokery involved.

I imagine at some point there was an internal conversation in Oracle about whether to report blocking session or blocking transaction.

At this point, it’s just a normal lock held by a “normal” transaction which hasn’t committed yet. The application just keeps retrying, and each attempt waits for 1 minute until it hits the default distributed transaction timeout:

ORA-02049: timeout: distributed transaction waiting for lock

which will be logged somewhere obscure in the application logs – and there’ll be a brief pause and then it starts all over again.

Anyway, at this point, what can we do?

At this point, the transaction hasn’t timed out.

In a few hours, the transaction will time out and become an in-doubt transaction.

Once it does, the application will receive a message:

ORA-01591: lock held by in-doubt distributed transaction

At which time, it can be manually resolved.

This is what we usually do:

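begin
  for txn in (select local_tran_id, state from dba_2pc_pending)
  loop
    -- only force the rollback if it has not already been forced
    if txn.state != 'forced rollback'
    then
      execute immediate 'rollback force '''||txn.local_tran_id||'''';
      commit;
    end if;
    -- then remove the resolved entry from the pending transaction tables
    dbms_transaction.purge_lost_db_entry(txn.local_tran_id);
    commit;
  end loop;
end;
/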

If we were going to intervene right now and resolve it, what could we do?

We could bounce the database. The transaction would then become in-doubt and we could resolve it as above.

But bouncing the database is quite drastic.

A lot of the advice in the manual is about what to do once it is IN-DOUBT. Some of it might work now – not sure.

What I advise is this: if we know we want to roll back or commit this transaction now, and we don’t want to wait for it to become IN-DOUBT (which often we can’t), then we can programmatically attach to the transaction using DBMS_XA and roll it back or commit it.

First of all, we need some information about the transaction.
CAVEAT DBA!!! This isn’t exactly well documented but I have found that what we tend to need is to identify the transactions in V$GLOBAL_TRANSACTION which are in COLLECTING state:

select state
,      UTL_RAW.CAST_TO_BINARY_INTEGER (globalid)
,      UTL_RAW.CAST_TO_BINARY_INTEGER (branchid)
,      t.* 
from v$global_transaction t where state = 'COLLECTING';
STATE                                  UTL_RAW.CAST_TO_BINARY_INTEGER(GLOBALID) UTL_RAW.CAST_TO_BINARY_INTEGER(BRANCHID)   FORMATID GLOBALID                                                                                                                         BRANCHID                                                                                                                           BRANCHES   REFCOUNT PREPARECOUNT STATE                                       FLAGS COUPLING      
-------------------------------------- ---------------------------------------- ---------------------------------------- ---------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ------------ -------------------------------------- ---------- ---------------
COLLECTING                                                                   49                                       45     131075 312D2D35363832376566363A393236643A35613562363664363A633738353036                                                                 2D35363832376566363A393236643A35613562363664363A633738353065                                                                              1          1            1 COLLECTING                                      1 TIGHTLY COUPLED 
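GLOBALID and BRANCHID are RAW values. Here they happen to be printable ASCII written by the Java transaction manager, and decoding them can help match the orphaned branch against the application’s own transaction logs. A Python sketch; whether a given transaction manager writes readable text into these fields is an assumption that won’t always hold, hence the fallback to None.

```python
def decode_xa_id(hex_id):
    """Decode a GLOBALID/BRANCHID RAW (as hex) to text if it is printable ASCII."""
    raw = bytes.fromhex(hex_id)
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return None  # not ASCII: keep working with the hex form
    return text if text.isprintable() else None


gtrid = "312D2D35363832376566363A393236643A35613562363664363A633738353036"
bqual = "2D35363832376566363A393236643A35613562363664363A633738353065"
print(decode_xa_id(gtrid))  # 1--56827ef6:926d:5a5b66d6:c78506
print(decode_xa_id(bqual))  # -56827ef6:926d:5a5b66d6:c7850e
```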


Then we can substitute the formatid, global id and branch id into the script below. Whether you need to pass numbers or raw values depends on the signature of DBMS_XA_XID; see the documentation.

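set serveroutput on
DECLARE
  -- formatid, GLOBALID (gtrid) and BRANCHID (bqual) from V$GLOBAL_TRANSACTION
  l_xid     DBMS_XA_XID :=
            DBMS_XA_XID(131075,
                        '312D2D35363832376566363A393236643A35613562363664363A633738353036',
                        '2D35363832376566363A393236643A35613562363664363A633738353065');
  l_return  PLS_INTEGER;
BEGIN
  -- roll back the prepared branch identified by the XID
  l_return := SYS.dbms_xa.xa_rollback(xid => l_xid);
  -- 0 (DBMS_XA.XA_OK) means success; anything else is an XA error code
  dbms_output.put_line(l_return);
END;
/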

This approach above comes with no guarantees.
But it has worked for me several times in the past.

INSERT ALL caveat

Why you might want to think twice about using INSERT ALL.

One of those things I knew and then forgot.

So, let’s say you’ve got three tables or a partitioned table or something like that.

Let’s use regional tables for simplicity.

drop table t1_r1;
drop table t1_r2;
drop table t1_r3;

create table t1_r1
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R1')));

create table t1_r2
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R2')));

create table t1_r3
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R3')));

insert into t1_r1 values ('R1',1);
insert into t1_r2 values ('R2',1);
insert into t1_r3 values ('R3',1);

commit;

And you want a routine that will insert into one of those tables depending on region.

And you’re a simple fellow, so you go with an IF statement:

create or replace procedure p1 (
  col1 in varchar2, 
  col2 in number
)
as
begin
  if col1 = 'R1'
  then
      insert into t1_r1 values(col1,col2);
  elsif col1 = 'R2'
  then
      insert into t1_r2 values(col1,col2);
  else 
      insert into t1_r3 values(col1,col2);
  end if;
end p1;
/

Procedure P1 compiled

And then, in the same session, you run this and don’t commit:

exec p1('R1',2);

PL/SQL procedure successfully completed.

And then in another session you decide to truncate table T1_R3:

truncate table t1_r3;

Table T1_R3 truncated.

No problem.
None was expected.

However…

Let’s say that we decide to tidy up that procedure and get rid of some of the repetition by using an INSERT ALL statement.
I’ll use a standalone SQL statement with a literal, rather than a procedure with a bind parameter, to demonstrate a further minor aspect: even when the value is known up front, every possible target table is locked.

insert all
when col1 = 'R1' then into t1_r1
when col1 = 'R2' then into t1_r2
when col1 = 'R3' then into t1_r3
select 'R1' col1,2 col2
from dual;

1 row inserted.

Let’s revisit the truncate:

truncate table t1_r3;

SQL Error: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
00054. 00000 -  "resource busy and acquire with NOWAIT specified or timeout expired"
*Cause:    Interested resource is busy.
*Action:   Retry if necessary or increase timeout.

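The INSERT ALL takes row exclusive (mode 3) TM locks on all three possible target tables, not just the one it inserted into, and those locks prevent the TRUNCATE, which needs an exclusive TM lock.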
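A toy Python model of why the two versions behave differently. The TM lock mode compatibility matrix is Oracle’s standard one; the two "which tables get locked" functions are simplifications of the demo above (the IF version only touches the branch it executes, while INSERT ALL locks every INTO target whatever the data). All function names are hypothetical.

```python
# Oracle TM lock modes: 2 = row share (RS), 3 = row exclusive (RX),
# 4 = share (S), 5 = share row exclusive (SRX), 6 = exclusive (X).
# COMPATIBLE[held] is the set of requested modes that can be granted.
COMPATIBLE = {
    2: {2, 3, 4, 5},
    3: {2, 3},
    4: {2, 4},
    5: {2},
    6: set(),
}


def tm_locks_if_version(col1):
    """TM locks taken by the IF-based procedure P1: only the branch executed."""
    target = {"R1": "T1_R1", "R2": "T1_R2"}.get(col1, "T1_R3")
    return {target: 3}


def tm_locks_insert_all():
    """TM locks taken by the conditional INSERT ALL: every INTO target."""
    return {"T1_R1": 3, "T1_R2": 3, "T1_R3": 3}


def can_truncate(table, held):
    """TRUNCATE needs an exclusive (mode 6) TM lock and won't wait for it."""
    held_mode = held.get(table)
    return held_mode is None or 6 in COMPATIBLE[held_mode]


print(can_truncate("T1_R3", tm_locks_if_version("R1")))  # True
print(can_truncate("T1_R3", tm_locks_insert_all()))      # False, i.e. ORA-00054
```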

So, that’s a simple (simplistic) illustration, based on a real-life problem, of why you might want to think twice about whether INSERT ALL is the best feature for your use case.

Conditional uniqueness

A quick fly through the options for conditional uniqueness.

Requirement #1: I want uniqueness on a column but only under certain conditions.

For example, I have an active flag and I want to make sure there is only one active record for a particular attribute but there can be many inactive rows.

Initial setup:

create table t1
(col1      number       not null
,col2      varchar2(24) not null
,is_active number(1)    not null
,constraint pk_t1 primary key (col1)
,constraint ck_t1_is_active check (is_active in (1,0)));

Solution #1: A unique index on an expression which evaluates to null when the condition is not met.

create unique index i_t1 on t1 (case when is_active = 1 then col2 end);

unique index I_T1 created.

insert into t1 values(1,'SHAGGY',1);

1 rows inserted.

insert into t1 values(2,'SHAGGY',1);

SQL Error: ORA-00001: unique constraint (I_T1) violated
00001. 00000 -  "unique constraint (%s.%s) violated"
*Cause:    An UPDATE or INSERT statement attempted to insert a duplicate key.
           For Trusted Oracle configured in DBMS MAC mode, you may see
           this message if a duplicate entry exists at a different level.
*Action:   Either remove the unique restriction or do not insert the key.

Only one active SHAGGY allowed.
But multiple inactives allowed:

insert into t1 values(2,'SHAGGY',0);

1 rows inserted.

insert into t1 values(3,'SHAGGY',0);

1 rows inserted.
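Solution #1 works because an Oracle B-tree index does not store entries whose key columns are all NULL, so inactive rows never appear in I_T1 at all. A minimal Python model of that behaviour, with an in-memory set standing in for the index (the class name is hypothetical):

```python
class ConditionalUniqueIndex:
    """Toy model of:
    create unique index i_t1 on t1 (case when is_active = 1 then col2 end);
    """

    def __init__(self):
        self.keys = set()

    @staticmethod
    def key(col2, is_active):
        # The CASE expression: NULL (None) unless the row is active.
        return col2 if is_active == 1 else None

    def insert(self, col2, is_active):
        k = self.key(col2, is_active)
        if k is None:
            return  # all-NULL keys are not indexed, so they never clash
        if k in self.keys:
            raise ValueError("ORA-00001: unique constraint (I_T1) violated")
        self.keys.add(k)


idx = ConditionalUniqueIndex()
idx.insert("SHAGGY", 1)
idx.insert("SHAGGY", 0)  # inactive: allowed
idx.insert("SHAGGY", 0)  # and again
try:
    idx.insert("SHAGGY", 1)  # a second active SHAGGY
except ValueError as err:
    print(err)
```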

Solution #2: A virtual column with a unique constraint

drop index i_t1;

index I_T1 dropped.

alter table t1 add (vc_col2 varchar2(24) generated always as (case when is_active = 1 then col2 end));

table T1 altered.

alter table t1 add constraint uk_t1 unique (vc_col2);

table T1 altered.

Note that, now we have a virtual column, we need to be very careful with insert statements that have no explicit column list:

insert into t1 values(4,'SCOOBY',1);

SQL Error: ORA-00947: not enough values
00947. 00000 -  "not enough values"

Unless we’re lucky enough to be on 12c and use the INVISIBLE syntax:

alter table t1 add (vc_col2 varchar2(24) invisible generated always as (case when is_active = 1 then col2 end));

But as this example is on 11.2.0.3:

insert into t1 (col1, col2, is_active) values(4,'SCOOBY',1);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(5,'SCOOBY',1);

SQL Error: ORA-00001: unique constraint (UK_T1) violated
00001. 00000 -  "unique constraint (%s.%s) violated"
*Cause:    An UPDATE or INSERT statement attempted to insert a duplicate key.
           For Trusted Oracle configured in DBMS MAC mode, you may see
           this message if a duplicate entry exists at a different level.
*Action:   Either remove the unique restriction or do not insert the key.

insert into t1 (col1, col2, is_active) values(5,'SCOOBY',0);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(6,'SCOOBY',0);

1 rows inserted.

Requirement #2: Sorry, we forgot to tell you that we insert the new row first and then update the old one to be inactive, so we need a deferred constraint (hmmm!)

In which case, the unique index won’t do: only a constraint can be declared DEFERRABLE, not an index, so the only option is the unique constraint on the virtual column.

alter table t1 drop constraint uk_t1;

table T1 altered.

alter table t1 add constraint uk_t1 unique (vc_col2) deferrable initially deferred;

table T1 altered.

insert into t1 (col1, col2, is_active) values(7,'FRED',1);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(8,'FRED',1);

1 rows inserted.

commit;

SQL Error: ORA-02091: transaction rolled back
ORA-00001: unique constraint (.UK_T1) violated
02091. 00000 -  "transaction rolled back"
*Cause:    Also see error 2092. If the transaction is aborted at a remote
           site then you will only see 2091; if aborted at host then you will
           see 2092 and 2091.
*Action:   Add rollback segment and retry the transaction.

insert into t1 (col1, col2, is_active) values(7,'FRED',1);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(8,'FRED',1);

1 rows inserted.

update t1 set is_active = 0 where col1 = 7;

1 rows updated.

commit;

committed.
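The difference between the two FRED attempts is when the check runs. A toy Python model (hypothetical names; it ignores savepoints, statement-level rollback and the rows from earlier in the post) in which the deferred constraint is only validated at commit, and a failed check rolls the whole transaction back, as ORA-02091 reports:

```python
class ToyT1:
    """Toy T1 whose unique constraint on the virtual column
    (case when is_active = 1 then col2 end) is DEFERRABLE INITIALLY DEFERRED."""

    def __init__(self):
        self.committed = {}  # col1 -> (col2, is_active)
        self.pending = {}    # uncommitted changes in the current transaction

    def insert(self, col1, col2, is_active):
        # Deferred: nothing is checked at statement time.
        self.pending[col1] = (col2, is_active)

    def update_is_active(self, col1, is_active):
        row = self.pending[col1] if col1 in self.pending else self.committed[col1]
        self.pending[col1] = (row[0], is_active)

    def commit(self):
        rows = {**self.committed, **self.pending}
        self.pending = {}  # the transaction ends either way
        active = [col2 for col2, flag in rows.values() if flag == 1]
        if len(active) != len(set(active)):
            # The whole transaction is discarded, as ORA-02091 reports.
            raise RuntimeError("ORA-02091: transaction rolled back")
        self.committed = rows


t1 = ToyT1()
t1.insert(7, "FRED", 1)
t1.insert(8, "FRED", 1)
try:
    t1.commit()
except RuntimeError as err:
    print(err)  # ORA-02091: transaction rolled back

t1.insert(7, "FRED", 1)
t1.insert(8, "FRED", 1)
t1.update_is_active(7, 0)  # the old row is made inactive before commit
t1.commit()
print(sorted(t1.committed.items()))  # [(7, ('FRED', 0)), (8, ('FRED', 1))]
```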

See my previous post on a similar approach for a conditional foreign key.

Failed Logins can cause row cache lock on dc_users

One “feature” to be aware of in old-fashioned Oracle 11g – took me a while to figure it out, so just in case anyone else gets hit by it…

I changed the password of a schema in a test database to stop anything connecting whilst that schema was being logically refreshed.

Following this change, this environment started crawling but, despite being heavily used, no-one was complaining apart from me.

I couldn’t connect (the connection attempt just seemed to hang), and for existing connections certain statements were very slow.

I could see lots of row cache locks, the p1 of which decoded to dc_users.

I spent a while trying to figure it out, and eventually asked for a couple of second opinions on the environment. They had no problems connecting, could also see the row cache locks, and eventually bounced the database.

Nothing got any better.

Another strange thing was that, when looking at the sessions via ASH, they were all SYS sessions but with a session module that could not have been SYS, i.e. application connections, etc.

So, that pointed at recursive SQL.

select count(*) over (partition by sample_time) sess_cnt
,      (select username from dba_users u where u.user_id = h.user_id) u
,      sample_time, session_state, session_id, sql_id
,      module, event, p1, blocking_session
from   dba_hist_active_sess_history h
where  event = 'row cache lock'
order by sample_time desc;
SESS_CNT U     SAMPLE_TIME                   SESSION_STATE SESSION_ID MODULE                  EVENT             P1 BLOCKING_SESSION
-------- ----- ----------------------------- ------------- ---------- ----------------------- ----------------- -- ----------------
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             1828 sqlplus@xx (TNS V1-V3)  row cache lock     7                  
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2172 JDBC Thin Client        row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             1837 JDBC Thin Client        row cache lock     7             2144 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             1848 JDBC Thin Client        row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             1880 sqlplus@xx (TNS V1-V3)  row cache lock     7             2144 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             1886 JDBC Thin Client        row cache lock     7             1909 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2012 JDBC Thin Client        row cache lock     7             2144 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2026 sqlplus@xx (TNS V1-V3)  row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2031 JDBC Thin Client        row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2041 JDBC Thin Client        row cache lock     7             2144 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2058 sqlplus@xx (TNS V1-V3)  row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2063 JDBC Thin Client        row cache lock     7             2144 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2064 SQL Developer           row cache lock     7             2144 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2073 sqlplus@xx (TNS V1-V3)  row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2084 sqlplus@xx (TNS V1-V3)  row cache lock     7                  
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2105 sqlplus@xx (TNS V1-V3)  row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2123 sqlplus@xx (TNS V1-V3)  row cache lock     7             1898 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             2144 JDBC Thin Client        row cache lock     7             2041 
      19 SYS   11-JUL-13 10.19.15.935000000  WAITING             1832 sqlplus@xx (TNS V1-V3)  row cache lock     7             2144 

 19 rows selected 
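With this many waiters it helps to summarise who is reported as blocking whom. A Python sketch over the (SESSION_ID, BLOCKING_SESSION) pairs transcribed from the sample above; in practice you would pull the same two columns straight from the ASH query. It counts waiters per blocker and lists pairs of sessions reported as blocking each other.

```python
from collections import Counter

# (session_id, blocking_session) from the 10.19.15.935 sample; None = no blocker reported
samples = [
    (1828, None), (2172, 1898), (1837, 2144), (1848, 1898), (1880, 2144),
    (1886, 1909), (2012, 2144), (2026, 1898), (2031, 1898), (2041, 2144),
    (2058, 1898), (2063, 2144), (2064, 2144), (2073, 1898), (2084, None),
    (2105, 1898), (2123, 1898), (2144, 2041), (1832, 2144),
]

waiters_per_blocker = Counter(b for _, b in samples if b is not None)
print(waiters_per_blocker.most_common())

# Pairs of sessions that are each reported as the other's blocker.
blocker_of = dict(samples)
mutual = {tuple(sorted((s, b))) for s, b in samples
          if b is not None and blocker_of.get(b) == s}
print(mutual)  # {(2041, 2144)}
```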
select trunc(sample_time), count(*), min(sample_time), max(sample_time)
from   dba_hist_active_Sess_history
where  event = 'row cache lock'
group by trunc(sample_time)
order by trunc(sample_time);
TRUNC(SAMPLE_TIME) COUNT(*) MIN(SAMPLE_TIME)              MAX(SAMPLE_TIME)            
------------------ -------- ----------------------------- -----------------------------
02-JUL-13                10 02-JUL-13 04.08.48.955000000  02-JUL-13 16.12.30.128000000  
03-JUL-13                 2 03-JUL-13 11.42.30.367000000  03-JUL-13 22.02.38.886000000  
04-JUL-13                 6 04-JUL-13 04.10.25.568000000  04-JUL-13 22.02.52.536000000  
05-JUL-13                16 05-JUL-13 04.09.00.444000000  05-JUL-13 22.01.13.067000000  
06-JUL-13                 2 06-JUL-13 01.06.31.261000000  06-JUL-13 14.07.15.208000000  
08-JUL-13                26 08-JUL-13 03.10.53.909000000  08-JUL-13 22.00.45.416000000  
09-JUL-13                19 09-JUL-13 04.06.55.191000000  09-JUL-13 20.23.16.801000000  
10-JUL-13             48293 10-JUL-13 04.00.41.732000000  10-JUL-13 19.00.04.750000000  
11-JUL-13              5412 11-JUL-13 09.20.40.833000000  11-JUL-13 10.41.49.610000000  

 9 rows selected 

Eventually the penny dropped …

See bug 7715339 – Logon failures causes “row cache lock” waits – Allow disable of logon delay [ID 7715339.8]

In 11g there is an intentional delay between allowing failed logon
attempts to retry. For some specific application types this can cause
a problem as the row cache entry is locked for the duration of the
delay . This can lead to excessive row cache lock waits for DC_USERS
for specific users / schemas.

After 3 successive failures a sleep delay is introduced starting
at 3 seconds and extending to 10 seconds max. During each delay
the user X row cache lock is held in exclusive mode preventing
any concurrent logon attempt as user X (and preventing any
other operation which would need the row cache lock for user X).

That explains a) the problem and b) why the impact was limited.

ORs, IN lists and LNNVL

I’ve previously written about manually rewriting an OR condition into a UNION ALL using LNNVL.

This describes a real-world performance issue caused by the optimizer coming up with a CONCATENATION operation over many child operations, including an INLIST ITERATOR, where the other children then have to apply LNNVL as a filter.

Really this is nothing more than an illustration of our old performance killing friend the function call from within SQL, particularly from within predicates, and the problem of context switching.

Ok. So, the real world example is a crazy query.
It contains 1091 OR clauses each stipulating a unique key lookup.
Yes, “1091” OR clauses.
I know, I know.
Probably itself a workaround to avoid the 1000 limit on an IN list, who knows?
Anyway.

I’m going to use a table just to illustrate the execution plan and then use some metrics from the execution of the real world example.

drop table t1;

create table t1
(col1 number
,col2 number
,col3 varchar2(10)
,constraint pk_t1 primary key (col1, col2));

insert into t1
select mod(rownum,10000)+1 col1
,      ceil(rownum/10000)  col2
,      rpad('X',10,'X')    col3
from   dual
connect by rownum <= 100000
order by col1, col2;

commit;

If we use a simple multi-column OR condition, we don’t get the desired LNNVLs.

explain plan for
select *
from   t1
where (col1 = 1
and    col2 = 1)
or    (col1 = 2
and    col2 = 2)
or    (col1 = 3
and    col2 = 3)
or    (col1 = 3
and    col2 = 4);

select * from table(dbms_xplan.display);

The optimizer just uses an INLIST iterator:

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |  1129 | 37257 |     5   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |  1129 | 37257 |     5   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | PK_T1 |     7 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access(("COL1"=1 AND "COL2"=1 OR "COL1"=2 AND "COL2"=2 OR "COL1"=3 AND 
              "COL2"=3 OR "COL1"=3 AND "COL2"=4))

However, if we combine an IN list with some of those ORs:

explain plan for
select *
from   t1
where (col1 = 1
and    col2 = 1)
or    (col1 = 2
and    col2 = 2)
or    (col1 = 3
and    col2 IN (3,4));

select * from table(dbms_xplan.display);

Which gives

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |  1140 | 37620 |     2   (0)| 00:00:01 |
|   1 |  CONCATENATION                |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1    |    11 |   363 |     1   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | PK_T1 |     1 |       |     2   (0)| 00:00:01 |
|   4 |   INLIST ITERATOR             |       |       |       |            |          |
|   5 |    TABLE ACCESS BY INDEX ROWID| T1    |  1129 | 37257 |     1   (0)| 00:00:01 |
|*  6 |     INDEX UNIQUE SCAN         | PK_T1 |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("COL1"=3)
       filter("COL2"=3 OR "COL2"=4)
   6 - access(("COL1"=1 AND "COL2"=1 OR "COL1"=2 AND "COL2"=2))
       filter(LNNVL("COL1"=3) OR LNNVL("COL2"=3) AND LNNVL("COL2"=4))

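We get a CONCATENATION of two children: an INDEX RANGE SCAN branch for the IN-list condition (col1 = 3 and col2 in (3,4)), and an INLIST ITERATOR branch into which the simpler OR conditions are combined. That second branch applies LNNVLs in its filter predicate so that it does not return rows already returned by the first.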
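LNNVL(cond) returns TRUE when cond is FALSE or UNKNOWN, and FALSE when cond is TRUE, which is exactly what a later CONCATENATION branch needs to exclude rows an earlier branch already returned, even when columns are NULL. A Python sketch of that three-valued logic, using None for NULL/UNKNOWN and a few hypothetical rows (not the T1 data). Oracle applies LNNVL to each simple comparison, as in the filter on plan line 6, because LNNVL does not accept compound conditions; the sketch applies it to the whole branch for brevity, which is logically equivalent.

```python
def eq(a, b):
    """SQL '=': UNKNOWN (None) if either side is NULL."""
    return None if a is None or b is None else a == b


def and_(p, q):
    if p is False or q is False:
        return False
    return None if p is None or q is None else True


def or_(p, q):
    if p is True or q is True:
        return True
    return None if p is None or q is None else False


def lnnvl(cond):
    """TRUE if cond is FALSE or UNKNOWN; FALSE if cond is TRUE."""
    return cond is not True


def branch_a(r):  # col1 = 3 and col2 in (3,4)
    return and_(eq(r[0], 3), or_(eq(r[1], 3), eq(r[1], 4)))


def branch_b(r):  # (col1 = 1 and col2 = 1) or (col1 = 2 and col2 = 2)
    return or_(and_(eq(r[0], 1), eq(r[1], 1)), and_(eq(r[0], 2), eq(r[1], 2)))


rows = [(1, 1), (2, 2), (3, 3), (3, 4), (3, None), (None, 1), (5, 5)]

# WHERE a OR b: a row is returned only if the predicate is TRUE.
or_result = [r for r in rows if or_(branch_a(r), branch_b(r)) is True]

# CONCATENATION: branch B also checks LNNVL(branch A) so that rows
# already returned by branch A are not returned a second time.
union_all = ([r for r in rows if branch_a(r) is True]
             + [r for r in rows if branch_b(r) is True and lnnvl(branch_a(r))])

print(sorted(or_result) == sorted(union_all))  # True
```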

Now imagine we have a much larger number of the simpler OR predicates combined with liberal scatterings of such IN clauses throughout our 1091 OR’d predicates.

Perhaps we’d be a bit concerned about the context switching and increased cpu usage?
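A rough back-of-the-envelope sketch of why that might hurt, assuming (as the predicate section of the real plan suggests) that each later CONCATENATION branch carries one LNNVL test per simple comparison in the earlier branches, and that every row surviving the filter evaluates all of them. The branch and row counts below are hypothetical; the point is only how quickly the work grows.

```python
def lnnvl_calls(preds_per_branch, rows_per_branch):
    """Estimate LNNVL evaluations in a CONCATENATION plan.

    preds_per_branch[i]: simple comparisons in branch i's own predicate.
    rows_per_branch[i]: rows fetched by branch i.
    Assumes each row fetched by branch i is filtered by one LNNVL per
    comparison in all earlier branches (the surviving rows pass every
    conjunct, so none of the filter is skipped).
    """
    total, earlier = 0, 0
    for preds, rows in zip(preds_per_branch, rows_per_branch):
        total += rows * earlier
        earlier += preds
    return total


# n hypothetical branches of (version = :v and id = :id), one row each
for n in (10, 100, 1000):
    print(n, lnnvl_calls([2] * n, [1] * n))  # 90, 9900, 999000: n*(n-1)
```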

Looking at my real world example right here… there’s no point me pasting in the thousands of lines.

If I run the real world behemoth:

Client elapsed time 108 seconds

DBMS_XPLAN.DISPLAY_CURSOR tells me it ran in 1.26 seconds:

plan hash value: 4262066066  
 
-------------------------------------------------------------------------------------------------------------------- 
| Id  | Operation                     | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | 
--------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                 |      1 |        |   1628 |00:00:01.26 |    3352 |    418 | 
|   1 |  CONCATENATION                |                 |      1 |        |   1628 |00:00:01.26 |    3352 |    418 |  
|   2 |   INLIST ITERATOR             |                 |      1 |        |    966 |00:00:00.94 |    2735 |    389 | 
|   3 |    TABLE ACCESS BY INDEX ROWID| XXXXXXXXXX      |    966 |    624 |    966 |00:00:00.94 |    2735 |    389 | 
|*  4 |     INDEX RANGE SCAN          | XXXXXXXXXX_UK01 |    966 |    624 |    966 |00:00:00.86 |    1915 |    380 | 
|   5 |   TABLE ACCESS BY INDEX ROWID | XXXXXXXXXX      |      1 |      1 |      2 |00:00:00.01 |       4 |      0 |  
|*  6 |    INDEX RANGE SCAN           | XXXXXXXXXX_UK01 |      1 |      1 |      2 |00:00:00.01 |       3 |      0 |  
....
| 253 |   TABLE ACCESS BY INDEX ROWID | XXXXXXXXXX      |      1 |      2 |     47 |00:00:00.01 |      14 |      1 |  
|*254 |    INDEX RANGE SCAN           | XXXXXXXXXX_UK01 |      1 |      2 |     47 |00:00:00.01 |       3 |      1 |   
--------------------------------------------------------------------------------------------------------------------  

Predicate Information (identified by operation id):
--------------------------------------------------- 
   4 - access(((("XXXXXXXXXX"."VERSION"=1 AND "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611722) OR
              ("XXXXXXXXXX"."VERSION"=1 AND "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611795) OR ("XXXXXXXXXX"."VERSION"=1 AND
              "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611863) OR ("XXXXXXXXXX"."VERSION"=1 AND
              "XXXXXXXXXX"."ID_XXXXXXXXXX"=1612023) OR ("XXXXXXXXXX"."VERSION"=1 AND .....
   6 - access("XXXXXXXXXX"."ID_XXXXXXXXXX"=2046939)
       filter(((LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611722)) AND
              (LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611795)) AND
              (LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611863)) AND
.....

Real-time SQL monitoring (DBMS_SQLTUNE.REPORT_SQL_MONITOR) tells me the query ran in 1.68 seconds:

Global Information
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  2                   
 Session ID          :  1220                
 SQL ID              :  5fvt3tfbgmqp3       
 SQL Execution ID    :  33554432            
 Plan Hash Value     :  4262066066          
 Execution Started   :  04/05/2013 14:53:42 
 First Refresh Time  :  04/05/2013 14:53:42 
 Last Refresh Time   :  04/05/2013 14:53:45 

--------------------------------------------------------------------
| Elapsed |   Cpu   |    IO    | Cluster  | Fetch | Buffer | Reads |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |  Gets  |       |
--------------------------------------------------------------------
|    1.68 |    0.50 |     1.13 |     0.05 |     4 |   3352 |   418 |
--------------------------------------------------------------------

Fortunately V$SQL tells me the real story:

select sql_id, executions, rows_processed, elapsed_time/1000/1000, cpu_time/1000/1000 
from v$sql where sql_id = '5fvt3tfbgmqp3';

Gives

SQL_ID        EXECUTIONS ROWS_PROCESSED ELAPSED_TIME/1000/1000 CPU_TIME/1000/1000
------------- ---------- -------------- ---------------------- ------------------
5fvt3tfbgmqp3          1           1628              105.81511         104.581102

Then again, if you write nasty code, nasty things can happen.

Fun with distributed transactions

Three items within:
Part 1 – Subquery materialisation not possible (fixed in 12c)
Part 2 – Materialised views
Part 3 – Lock scripts

Part 1 – Subquery materialisation not possible
Yesterday I was reminded of a familiar issue, officially listed as a bug, that can occur when combining subquery materialisation with distributed transactions.

I had previously tuned a query for a colleague. The best approach was implicit materialised subquery factoring (implicit because it wasn’t hinted with /*+ materialize */ but was referenced multiple times in the main query), and I had provided exactly that solution without a thought for the performance risk if the query were run inside an XA transaction.

As a result, and unsurprisingly with hindsight, the query did not perform as expected when run from the application inside that XA transaction.

Let me provide a refresher on the issue.

create table t1
as
select rownum col1
from   dual
connect by rownum <= 100000;

alter session set statistics_level = all;

with x as  
(select count(*) col1  
 from   t1)  
select x1.*, x2.*
from x x1, x x2;  

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

provides

PLAN_TABLE_OUTPUT 
---------------------------------------------------------------------------------------------
SQL_ID  aqsc5m0wtj7wn, child number 0
-------------------------------------                              
with x as   (select count(*) col1    from   t1)   select x1.*, x2.*
from x x1, x x2

Plan hash value: 1115620710 

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                        | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                             |      1 |        |      1 |
|   1 |  TEMP TABLE TRANSFORMATION |                             |      1 |        |      1 |
|   2 |   LOAD AS SELECT           |                             |      1 |        |      1 |
|   3 |    SORT AGGREGATE          |                             |      1 |      1 |      1 |
|   4 |     TABLE ACCESS FULL      | T1                          |      1 |    122K|    100K|
|   5 |   MERGE JOIN CARTESIAN     |                             |      1 |      1 |      1 |
|   6 |    VIEW                    |                             |      1 |      1 |      1 |
|   7 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6717_BCA95716 |      1 |      1 |      1 |
|   8 |    BUFFER SORT             |                             |      1 |      1 |      1 |
|   9 |     VIEW                   |                             |      1 |      1 |      1 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6717_BCA95716 |      1 |      1 |      1 |
---------------------------------------------------------------------------------------------

Note
-----  
   - dynamic sampling used for this statement
                  

 27 rows selected 

As you might expect, the subquery is materialised into a temp table (the TEMP TABLE TRANSFORMATION and LOAD AS SELECT steps), which is then read twice.

But if you execute the query within a distributed transaction, it will probably not use a temporary table (see the original post above for why I say probably).

I’m using a simple insert over a db link to start the distributed transaction, but in my real-world occurrence it was the JDBC XA driver.

drop table t2;

create table t2
(col1 number);

alter system flush shared_pool;

select count(*) from gv$sql where sql_text like 'with x%';

alter session set statistics_level = all;

insert into t2@mydba values(1);

with x as  
(select count(*) col1  
 from   t1)  
select x1.*, x2.*
from x x1, x x2;  

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

rollback;

Which gives:

PLAN_TABLE_OUTPUT  
---------------------------------------------------------------------------------------
SQL_ID  aqsc5m0wtj7wn, child number 0
-------------------------------------
with x as   (select count(*) col1    from   t1)   select x1.*, x2.*
from x x1, x x2

Plan hash value: 3433669518

---------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      1 |00:00:00.08 |     314 |
|   1 |  NESTED LOOPS        |      |      1 |      1 |      1 |00:00:00.08 |     314 |
|   2 |   VIEW               |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   3 |    SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   4 |     TABLE ACCESS FULL| T1   |      1 |    122K|    100K|00:00:00.01 |     157 |
|   5 |   VIEW               |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   6 |    SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   7 |     TABLE ACCESS FULL| T1   |      1 |    122K|    100K|00:00:00.01 |     157 |
--------------------------------------------------------------------------------------- 

Note
-----
   - dynamic sampling used for this statement


 24 rows selected 

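As shown, inside the distributed transaction the subquery is not materialised: it is inlined and T1 is scanned twice (two TABLE ACCESS FULL operations, 314 buffers in total against 157 for each scan). Where materialisation was key to a query’s performance, this can have a very negative effect.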
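To make the buffer arithmetic concrete, here is a toy Python model. It is an analogy, not Oracle internals: `Table` and `run_query` are hypothetical names, the 157 buffers per scan comes from the plan above, and the cost of reading the small temp table back is deliberately ignored.

```python
# Toy model, not Oracle internals: the WITH-clause subquery "x" is
# referenced twice (x1 and x2). Materialised, it is evaluated once into a
# temp table; inlined, it is evaluated once per reference.

T1_SCAN_BUFFERS = 157  # buffers per full scan of T1, from the plan above


class Table:
    """A table that counts the buffers read by full scans."""

    def __init__(self, rows, scan_buffers):
        self.rows = rows
        self.scan_buffers = scan_buffers
        self.buffers_read = 0

    def full_scan(self):
        self.buffers_read += self.scan_buffers
        return list(self.rows)


def run_query(t1, materialise):
    """with x as (select count(*) col1 from t1) select x1.*, x2.* from x x1, x x2"""

    def x():
        return len(t1.full_scan())  # count(*) over a full scan

    if materialise:
        temp = x()          # LOAD AS SELECT into SYS_TEMP_...: one scan of T1
        return temp, temp   # both references read the temp table (cost ignored)
    return x(), x()         # inlined: T1 is scanned once per reference


t1 = Table(range(100_000), T1_SCAN_BUFFERS)
print(run_query(t1, materialise=True), t1.buffers_read)   # (100000, 100000) 157

t1 = Table(range(100_000), T1_SCAN_BUFFERS)
print(run_query(t1, materialise=False), t1.buffers_read)  # (100000, 100000) 314
```

The more expensive the factored subquery and the more often it is referenced, the bigger the gap between the two.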

Part 2 – Materialized views

This trouble with distributed transactions reminds me of a particular bugbear with materialised views.

This restriction is listed here in the documentation
http://docs.oracle.com/cd/E11882_01/server.112/e10706/repmview.htm

I’m surprised that a bigger deal is not made of it, given the proliferation of XA drivers and distributed transactions in many enterprise applications.

This is easy to demonstrate, again using a db link to start the distributed transaction.

drop table t2;

create table t2
(col1 number
,constraint pk_t2 primary key(col1));

create materialized view log on t2;

create materialized view mv2
build immediate
refresh fast on commit
as
select * from t2;

insert into t2@mydba values (1);

commit;

Which, on commit, gives:

SQL Error: ORA-02050: transaction 26.8.810414 rolled back, some remote DBs may be in-doubt
ORA-02050: transaction 26.8.810414 rolled back, some remote DBs may be in-doubt
ORA-02051: another session in same transaction failed
ORA-02063: preceding 2 lines from MYDBA
02050. 00000 -  "transaction %s rolled back, some remote DBs may be in-doubt"
*Cause:    network or remote failure in 2PC.
*Action:   Notify operations; remote DBs will automatically re-sync when the
           failure is repaired.

I keep meaning to double-check that the same applies when using the XA driver from Java, but there’s no reason to think it would be any different.

Tom Kyte has mentioned this several times on AskTom, “recently” here.

I’m not quite sure I get his comment about OLTP databases and distributed transactions but, not for the first time, I’ve found myself wanting to use MVs to mitigate certain performance problems and have run into this restriction.

Part 3 – Locking scripts
There are a fair number of scripts out there on t’internet for showing / diagnosing locking.

During my special distributed transaction day yesterday, I was reminded that anything lock related that starts with V$SESSION or depends on / makes assumptions about SID/SESSION_ID can be misleading.

I initially thought something a little unusual was up because neither V$SESSION nor ASH could tell me who was blocking a bunch of TX locks requested in mode 6. At first I assumed some of the blocking diagnosis was missing because of cross-node RAC complications.

But it was actually because there were transactions holding locks with no sessions attached.

Admittedly, this was a slightly strange situation, but essentially an XA JDBC transaction that also involved ActiveMQ ended up leaving what were, in effect, orphaned transactions: locks held by transactions with no associated session.

This may be an “exotic” situation, but V$LOCKED_OBJECT, for example, exposes SESSION_ID but not SESSION_SERIAL#, so it can be very misleading if you join willy-nilly on SESSION_ID, because SIDs get reused quickly.

Similarly, DBA_BLOCKERS and DBA_WAITERS both showed zilch.

A more reliable starting point was to ignore V$LOCKED_OBJECT.SESSION_ID and pay more attention to the transaction information, joining to V$TRANSACTION and then outer joining to V$SESSION, like so:

select lo.inst_id
,      lo.object_id
,      lo.session_id
,      lo.os_user_name
,      lo.process
,      lo.locked_mode
,      ob.owner
,      ob.object_name
,      tx.addr
,      tx.start_time txn_start_time
,      tx.status
,      s.*
from   gv$locked_object lo
,      dba_objects      ob
,      gv$transaction    tx
,      gv$session        s
where  ob.object_id = lo.object_id
and    tx.xidusn    (+) = lo.xidusn
and    tx.xidslot   (+) = lo.xidslot
and    tx.xidsqn    (+) = lo.xidsqn
and    s.inst_id    (+) = tx.inst_id   -- TADDR/ADDR are memory addresses, only unique per instance
and    s.taddr      (+) = tx.addr;
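To see why going via the transaction matters, here is a self-contained model using Python’s built-in sqlite3 module rather than Oracle. The tables only loosely mirror GV$LOCKED_OBJECT, GV$TRANSACTION and GV$SESSION, and the rows are invented for illustration: one ordinary transaction, plus one orphaned PREPARED transaction whose recorded SID has since been reused by an unrelated session.

```python
import sqlite3

# SQLite model (not Oracle) of locks, transactions and sessions.
# Table/column names loosely mirror GV$LOCKED_OBJECT, GV$TRANSACTION and
# GV$SESSION; the rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table locked_object (object_id int, session_id int,
                            xidusn int, xidslot int, xidsqn int);
create table txn  (addr text, xidusn int, xidslot int, xidsqn int, status text);
create table sess (sid int, serial int, taddr text);

-- An ordinary transaction with its session attached.
insert into txn  values ('A1', 10, 1, 100, 'ACTIVE');
insert into sess values (55, 1234, 'A1');
insert into locked_object values (501, 55, 10, 1, 100);

-- An orphaned PREPARED transaction: no session points at it any more, and
-- the SID recorded against its lock (77) now belongs to another session.
insert into txn  values ('B2', 12, 7, 200, 'PREPARED');
insert into sess values (77, 9999, null);
insert into locked_object values (502, 77, 12, 7, 200);
""")

# Naive: join on SID alone, and the orphaned lock is pinned on the wrong session.
naive = conn.execute("""
    select lo.object_id, s.sid, s.serial
    from   locked_object lo join sess s on s.sid = lo.session_id
    order  by lo.object_id""").fetchall()

# Safer: go via the transaction id, then outer join to the session.
via_txn = conn.execute("""
    select lo.object_id, tx.status, s.sid
    from   locked_object lo
    left   join txn  tx on  tx.xidusn  = lo.xidusn
                        and tx.xidslot = lo.xidslot
                        and tx.xidsqn  = lo.xidsqn
    left   join sess s  on  s.taddr    = tx.addr
    order  by lo.object_id""").fetchall()

print(naive)    # [(501, 55, 1234), (502, 77, 9999)]
print(via_txn)  # [(501, 'ACTIVE', 55), (502, 'PREPARED', None)]
```

The NULL session against a PREPARED transaction is exactly the signature of the orphaned transactions described above.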

Which showed me, as mentioned, a whole bunch of locks in these orphaned, PREPARED distributed transactions dating back a few hours.

Modelling a “simple” ITL problem

Today I have been looking at an application problem.

Last night, important client processing missed its SLA, with the application reporting ORA-02049: timeout: distributed transaction waiting for lock.

This post isn’t about the distributed lock timing out.

But consider the evidence that further investigation into these distributed locks threw up.

  1. The parameter distributed_lock_timeout is set to the default of 60 (seconds) – this means that any distributed transaction that waits for an enqueue for 60 seconds will be picked off by a sniper, similar to the standard deadlock detection after 3 seconds.
  2. The 1-in-10 second ASH samples in DBA_HIST_ACTIVE_SESS_HISTORY show clear waits on “enq: TX – allocate ITL entry”
  3. These same ASH samples show one particular sql id as being involved in the ITL waits.
  4. This particular sql id is a DELETE on a single table, no triggers, no on delete cascade referential integrity.

Sounds easy, right?

But…

  1. V$SEGMENT_STATISTICS and DBA_HIST_SEG_STAT show 0 ITL waits for the segment/s related to the DELETE sql id above.
  2. In the ASH data, for the event “enq: TX – allocate ITL entry”, the CURRENT_OBJ# column indicates a variety of objects that might show ITL waits under concurrent DML but, again, all unrelated to the sql id identified by ASH.

So…

Time for some theories:

  1. ASH could be misleading us about the sql ids and/or
  2. ASH could be misleading about current_obj# and/or
  3. If our enqueue times out because of the distributed lock timeout, then perhaps the ITL wait is not recorded – i.e. segment stats are misleading.

Theories about ASH being misleading under certain circumstances are incredibly convoluted to test but, as far as I am concerned, even up to the latest 11gR2 this is a known known. There are certain values that are not cleared out or not recorded properly or disappear under certain conditions.

But perhaps we can test theory 3 relatively easily?

I like recycling – I’m going to reuse Randolf Geist’s ITL test case here.

SQL> get itl_case.sql
  1  create table t1 (c1 varchar2(1336)) pctfree 10;
  2  insert into t1 values ('a');
  3  insert into t1 values ('b');
  4  insert into t1 values ('c');
  5  insert into t1 values ('d');
  6  insert into t1 values ('e');
  7  insert into t1 values ('f');
  8  commit;
  9  update t1 set c1 = rpad(c1, 1335, c1) where length(c1) = 1;
 10* commit;
 11  .
SQL> @itl_case

Table created.
1 row created.
1 row created.
1 row created.
1 row created.
1 row created.
1 row created.
Commit complete.
6 rows updated.
Commit complete.

SQL>

Now I’m going to delete some rows across three sessions.
The first two sessions do simple deletes and leave them uncommitted. The third does a select over a db link to start a distributed transaction, followed by the same sort of simple delete.

First session – delete some rows and leave uncommitted so the ITL cannot be cleared:

SQL_SESS1>select distinct sid from v$mystat;

       SID
----------
      2056

SQL_SESS1>delete from t1 where c1 like 'a%';

1 row deleted.

SQL_SESS1>

Second session – delete some rows and leave uncommitted so the ITL cannot be cleared:

SQL_SESS2>select distinct sid from v$mystat;

       SID
----------
      2126

1 row selected.

SQL_SESS2>select to_char(sysdate,'DD-MON-YYYY HH24:MI') from dual;

TO_CHAR(SYSDATE,'
-----------------
01-NOV-2012 23:32

1 row selected.

SQL_SESS2>delete from t1 where c1 like 'b%';

1 row deleted.

SQL_SESS2>

Finally – third session – try to delete some rows within a distributed transaction:

SQL_SESS3>select distinct sid from v$mystat;

       SID
----------
      1884

1 row selected.

SQL_SESS3>select 1 from dual@mydba;

         1
----------
         1

1 row selected.

SQL_SESS3>delete from t1 where c1 like 'c%';
delete from t1 where c1 like 'c%'
*
ERROR at line 1:
ORA-02049: timeout: distributed transaction waiting for lock


SQL_SESS3>

Right, so what does ASH tell us about what happened?

SQL_SESS2>l
  1  select to_char(sample_time,'DD-MON-YYYY HH24:MI:SS'), sql_id, event, blocking_session, current_obj#
  2  from   v$active_Session_history
  3  where  event = 'enq: TX - allocate ITL entry'
  4  and    sample_time >= to_date('01-NOV-2012 23:32','DD-MON-YYYY HH24:MI')
  5* order  by sample_time desc
SQL_SESS2>/

TO_CHAR(SAMPLE_TIME, SQL_ID        EVENT                          BLOCKING_SESSION CURRENT_OBJ#
-------------------- ------------- ------------------------------ ---------------- ------------
01-NOV-2012 23:35:14 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:13 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:12 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:11 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:10 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:09 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:08 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:07 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:06 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:05 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:04 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:03 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:02 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:01 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:35:00 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:59 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:58 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:57 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:56 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:55 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:54 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:53 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:52 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:51 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:50 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:49 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:48 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:47 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:46 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:45 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:44 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:43 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:42 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:41 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:40 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:39 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:38 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:37 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:36 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:35 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:34 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:33 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:32 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:31 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:30 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:29 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:28 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:27 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:26 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:25 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:24 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:23 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:22 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:21 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:20 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:19 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:18 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:17 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:16 430xqxrrucgc5 enq: TX - allocate ITL entry               2126           -1
01-NOV-2012 23:34:15 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:14 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:13 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:12 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1
01-NOV-2012 23:34:11 430xqxrrucgc5 enq: TX - allocate ITL entry               2056           -1

71 rows selected.

Elapsed: 00:00:00.04
SQL_SESS2>

So, we are reporting wait event “enq: TX – allocate ITL entry” which was the aim.

And look at column CURRENT_OBJ#: it is -1 throughout, so perhaps it is not set properly for this wait event (although in my real-world example the values were not -1).

Sql id 430xqxrrucgc5 is my delete statement so that is bang on the money.

Finally, what do segment statistics say about ITL waits for this object t1?

SQL_SESS3>l
  1  select statistic_name, value
  2  from   v$segment_statistics
  3  where  object_name = 'T1'
  4* and    statistic_name = 'ITL waits'
SQL_SESS3>/

STATISTIC_NAME                                                        VALUE
---------------------------------------------------------------- ----------
ITL waits                                                                 0

1 row selected.

SQL_SESS3>

Because the statement was sniped by the distributed lock timeout, the segment "ITL waits" statistic was not incremented, so V$SEGMENT_STATISTICS (and the AWR snapshots of it in DBA_HIST_SEG_STAT) miss the significance of the wait.

A good point made by Coskan was that my original ASH output did not include the blocking session information, simply because it made the output too wide. The BLOCKING_SESSION column is shown in the output above.

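Only one session is reported as the blocker, but in reality there may be many: as many as there are ITL slots. In the output above, for example, the reported BLOCKING_SESSION switches from 2056 to 2126 part-way through, although both sessions held their ITL slots throughout.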

Also, depending on how many sessions are waiting, your blocked statement may be unlucky and be superseded by other sessions that get the ITL slot simply through good timing in when they requested the enqueue. This is what was happening in my real-world case: an unlucky session was blocked for a full minute (up to the distributed lock timeout) by a variety of blockers which, at other times during that minute, were busy doing other work, not all of it waiting on this enqueue.
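The behaviour above can be sketched as a toy Python model. It is not Oracle internals; it only encodes the assumptions this post arrived at: a full block with a fixed number of ITL slots, a waiter sniped after distributed_lock_timeout (60 seconds), an ASH-style sample that names just one holder as the blocker, and (matching the test above rather than any documented behaviour) an "ITL waits" counter that is bumped only when a wait completes. `Block`, `acquire` and `DistributedLockTimeout` are hypothetical names.

```python
# Toy model, NOT Oracle internals. Assumptions, all hypothetical:
#  * the block is full, so its ITL cannot grow beyond itl_slots entries;
#  * a distributed transaction's wait is sniped after DISTRIBUTED_LOCK_TIMEOUT;
#  * each 1-second ASH-style sample names just one holder as the blocker;
#  * the "ITL waits" statistic is bumped only when a wait completes
#    (mirroring the test above, where a sniped wait left it at 0).

DISTRIBUTED_LOCK_TIMEOUT = 60  # seconds, the default discussed above


class DistributedLockTimeout(Exception):
    """Stands in for ORA-02049; carries the ASH-style samples taken."""


class Block:
    def __init__(self, itl_slots):
        self.itl_slots = itl_slots
        self.holders = []        # transactions currently owning an ITL slot
        self.itl_waits_stat = 0  # stands in for the segment statistic

    def acquire(self, txn, commits_at=None, timeout=DISTRIBUTED_LOCK_TIMEOUT):
        """Wait in 1-second steps for a free ITL slot.

        commits_at maps second -> holder that commits (freeing its slot) then.
        Returns (seconds_waited, samples); samples are (second, blocker) pairs.
        """
        commits_at = commits_at or {}
        samples = []
        second = 0
        while len(self.holders) >= self.itl_slots:
            if second >= timeout:
                # Sniped: by assumption, no ITL wait is recorded.
                raise DistributedLockTimeout(samples)
            # Every holder is blocking us, but only one gets reported.
            samples.append((second, self.holders[0]))
            second += 1
            done = commits_at.get(second)
            if done in self.holders:
                self.holders.remove(done)
        if second > 0:
            self.itl_waits_stat += 1  # a completed wait is counted
        self.holders.append(txn)
        return second, samples


block = Block(itl_slots=2)
block.holders = ["sess1", "sess2"]  # two uncommitted deletes, as in the test
try:
    block.acquire("sess3")
except DistributedLockTimeout as err:
    samples = err.args[0]
    print(len(samples), {b for _, b in samples}, block.itl_waits_stat)
    # 60 {'sess1'} 0
```

With `commits_at={5: "sess1"}` the same call succeeds after 5 seconds and the counter becomes 1, which is the contrast that the segment statistics hide when the waiter is sniped instead.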

Plan_hash_value and internal temporary table names

Here’s a little thing about plan_hash_2 that’s come via a pointer from Jonathan Lewis to bug 10162430.

From querying DBA_HIST_SQLSTAT I happened to notice that over a period of 65 days a particular query had 63 different plans (PLAN_HASH_VALUE).

I found that the differences were down to the different internal temporary table names used for a materialized subquery, i.e. the plans were essentially the same apart from the names SYS_TEMP_0FD9D6786_B6EF87D2, SYS_TEMP_0FD9D6825_BE8671F, SYS_TEMP_0FD9D6684_EAD00827, etc.

Exactly what bug 10162430 addresses is unclear because it is listed as fixed in 11.2.0.3 and my different plan hashes come from an 11.2.0.3 database.

But the key information in the bug description highlighted that PLAN_HASH_2 is not affected by these differing temp table names.

Exactly what plan_hash_2 is and how it differs from plan_hash_value is undocumented but:

  • it is found in V$SQL_PLAN.OTHER_XML;
  • also in the AWR repository DBA_HIST_SQL_PLAN.OTHER_XML;
  • and is used as the enforcing plan hash in sql plan baselines – not exposed in DBA_SQL_PLAN_BASELINES but internally represented in column PLAN_ID in SYS.SQLOBJ$, SYS.SQLOBJ$DATA, SYS.SQLOBJ$AUXDATA.

So, if you regularly look at DBA_HIST_SQLSTAT to look at historic execution statistics of a SQL statement and your queries use materialized subqueries then I recommend adding PLAN_HASH_2 to the output, something like this:

select sn.snap_id
,      sn.end_interval_time
,      st.module
,      st.sql_id
,      st.plan_hash_value
,      (select extractvalue(xmltype(other_xml),'other_xml/info[@type="plan_hash_2"]')
        from   dba_hist_sql_plan hp
        where  hp.dbid            = st.dbid
        and    hp.sql_id          = st.sql_id
        and    hp.plan_hash_value = st.plan_hash_value
        and    hp.other_xml is not null) plan_hash_2
,      rows_processed_delta rws
,      executions_delta     execs
,      elapsed_time_delta   elp
,      cpu_time_delta       cpu
,      buffer_gets_delta    gets
,      iowait_delta         io
from   dba_hist_snapshot sn
,      dba_hist_sqlstat  st
where  st.snap_id            = sn.snap_id
and    st.dbid               = sn.dbid
and    st.instance_number    = sn.instance_number
and    st.sql_id             = '&sql_id'
and    st.elapsed_time_delta > 0
order by sn.snap_id desc;
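If you pull OTHER_XML out of V$SQL_PLAN or DBA_HIST_SQL_PLAN into a client tool instead, the same extraction can be done there. A minimal Python sketch using the standard library’s ElementTree; the XML is a hypothetical, heavily trimmed example, and its layout (`info` elements keyed by a `type` attribute) is assumed from the XPath used in the query above. The hash values are taken from the output below.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed OTHER_XML; real documents carry many more
# <info> (and other) elements. Values taken from the output below.
SAMPLE_OTHER_XML = """<other_xml>
  <info type="plan_hash">1987570133</info>
  <info type="plan_hash_2">3399935153</info>
</other_xml>"""


def plan_hash_2(other_xml):
    """Same idea as extractvalue(xmltype(other_xml),
    'other_xml/info[@type="plan_hash_2"]'): return the value, or None."""
    if other_xml is None:  # OTHER_XML is only populated on some plan lines
        return None
    node = ET.fromstring(other_xml).find("info[@type='plan_hash_2']")
    return int(node.text) if node is not None else None


print(plan_hash_2(SAMPLE_OTHER_XML))  # 3399935153
```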

And just to show this difference between PHV and PH2:

   SNAP_ID END_INTERVAL_TIME         PLAN_HASH_VALUE PLAN_HASH_2
---------- ------------------------- --------------- ------------
      9131 12/07/2012 00:00               1987570133 3399935153
      9083 11/07/2012 00:00               1567016512 3399935153
      9035 10/07/2012 00:00               3386529786 3399935153
      8891 07/07/2012 00:00               2197008736 3399935153
      8842 05/07/2012 23:30                991904690 3399935153
      8794 04/07/2012 23:30               1331394991 3399935153
      8746 03/07/2012 23:30               2577119670 3399935153
      8699 03/07/2012 00:00               3774955286 3399935153
      8698 02/07/2012 23:30               3774955286 3399935153
      8555 30/06/2012 00:00               3308813978 3399935153
      8507 29/06/2012 00:00                796849390 3399935153
      8459 28/06/2012 00:00                917735929 3399935153
      8410 26/06/2012 23:30                139934889 3399935153
      8363 26/06/2012 00:00               1936665492 3399935153
      8219 23/06/2012 00:00                666306815 3399935153
      8171 22/06/2012 00:00               1053584101 3399935153
      8123 21/06/2012 00:00               1353471101 3399935153
...
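Grouping those rows makes the point numerically: 17 snapshots, 16 distinct PLAN_HASH_VALUEs, a single PLAN_HASH_2. A small Python sketch over the rows shown above (the rows elided by "..." are not included; `plans_by_ph2` is a hypothetical helper name).

```python
from collections import defaultdict

# (snap_id, plan_hash_value, plan_hash_2) as shown in the output above.
ROWS = [
    (9131, 1987570133, 3399935153), (9083, 1567016512, 3399935153),
    (9035, 3386529786, 3399935153), (8891, 2197008736, 3399935153),
    (8842,  991904690, 3399935153), (8794, 1331394991, 3399935153),
    (8746, 2577119670, 3399935153), (8699, 3774955286, 3399935153),
    (8698, 3774955286, 3399935153), (8555, 3308813978, 3399935153),
    (8507,  796849390, 3399935153), (8459,  917735929, 3399935153),
    (8410,  139934889, 3399935153), (8363, 1936665492, 3399935153),
    (8219,  666306815, 3399935153), (8171, 1053584101, 3399935153),
    (8123, 1353471101, 3399935153),
]


def plans_by_ph2(rows):
    """Group the distinct PLAN_HASH_VALUEs under each PLAN_HASH_2."""
    groups = defaultdict(set)
    for _snap_id, phv, ph2 in rows:
        groups[ph2].add(phv)
    return dict(groups)


for ph2, phvs in plans_by_ph2(ROWS).items():
    print(ph2, len(phvs))  # 3399935153 16
```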

Virtual Column

Stated requirement from the forum:
To have a foreign key to a parent table only when the status in the child table is a certain value whilst allowing for rubbish and NULLs in the child’s fk column when the status is not ACTIVE.

We can do this with a virtual column.

Here’s how.

Setup:

SQL> create table t1
  2  (col1 number primary key);

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> create table t2
  2  (col1    number primary key
  3  ,status  varchar2(12) not null
  4  ,col2    number
  5  ,col2v   number generated always as (case when status = 'ACTIVE' then col2 end) VIRTUAL
  6  ,constraint t2_fk foreign key (col2v) references t1(col1)
  7  ,constraint t2_ck check (status IN ('ACTIVE','INACTIVE') and (status = 'INACTIVE' or col2 IS NOT NULL)));

Table created.

SQL> create sequence s1;

Sequence created.

SQL> 

First requirement – insert an active row into T2 with a valid FK in col2:

SQL> insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',1);

1 row created.

SQL> 

Second requirement – ensure that active rows have a valid foreign key:

SQL> insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',-99);
insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',-99)
*
ERROR at line 1:
ORA-02291: integrity constraint (RIMS.T2_FK) violated - parent key not found

SQL> 

Third requirement – allow inactive rows to have rubbish in col2 (for historic data quality reasons?):

SQL> insert into t2 (col1, status, col2) values (s1.nextval,'INACTIVE',-99);

1 row created.

SQL> 

Fourth requirement – prevent NULL foreign key values for active rows:

SQL> insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',NULL);
insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',NULL)
*
ERROR at line 1:
ORA-02290: check constraint (RIMS.T2_CK) violated

SQL> 
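The four outcomes above follow directly from the virtual column and the two constraints. Here is a plain-Python model of that logic (not how Oracle evaluates constraints; `insert_t2`, `ConstraintViolation` and the `check` parameter are hypothetical). Because STATUS is NOT NULL and the check only tests col2 with IS NOT NULL, the check never evaluates to UNKNOWN, so ordinary Python booleans are a fair stand-in. A NULL col2v is simply not checked against the parent, which is what lets non-ACTIVE rows carry rubbish.

```python
# Plain-Python model of T2's constraints; it mirrors the DDL above but is
# not how Oracle evaluates them. T1's primary keys are just a set here.

T1_KEYS = {1}


class ConstraintViolation(Exception):
    pass


def col2v(status, col2):
    # col2v generated always as (case when status = 'ACTIVE' then col2 end)
    return col2 if status == "ACTIVE" else None


def t2_ck(status, col2):
    # check (status in ('ACTIVE','INACTIVE')
    #        and (status = 'INACTIVE' or col2 is not null))
    return status in ("ACTIVE", "INACTIVE") and (
        status == "INACTIVE" or col2 is not None)


def insert_t2(status, col2, check=t2_ck, parents=T1_KEYS):
    """Return the stored (status, col2, col2v) or raise like the demo does."""
    if not check(status, col2):
        raise ConstraintViolation("ORA-02290: check constraint (T2_CK) violated")
    fk = col2v(status, col2)
    # A NULL foreign key value is not checked against the parent.
    if fk is not None and fk not in parents:
        raise ConstraintViolation("ORA-02291: integrity constraint (T2_FK) "
                                  "violated - parent key not found")
    return status, col2, fk


for status, col2 in [("ACTIVE", 1), ("ACTIVE", -99),
                     ("INACTIVE", -99), ("ACTIVE", None)]:
    try:
        print(status, col2, "->", insert_t2(status, col2))
    except ConstraintViolation as err:
        print(status, col2, "->", err)
```

The later variants map onto the `check` parameter: the relaxed constraint is `lambda s, c: not (s == "ACTIVE" and c is None)`, and dropping the check altogether is `lambda s, c: True`.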

Just some minor tweaks on the requirements which came up in this forum question…

Firstly, let’s say it’s not just statuses of ACTIVE/INACTIVE. We want to ensure that data quality for ACTIVE is protected but there’s a whole bunch of other statuses not just INACTIVE.

So, change the check constraint:

SQL>  alter table t2 drop constraint t2_ck;

Table altered.

SQL> alter table t2 add constraint t2_ck check (NOT (status = 'ACTIVE' and col2 IS NULL));

Table altered.

SQL> insert into t2 (col1, status, col2) values (s1.nextval,'WHO CARES',-75);

1 row created.

SQL> 

Or what if active rows may have either a valid foreign key or NULL?

Just lose the check constraint altogether:

SQL>  alter table t2 drop constraint t2_ck;

Table altered.

SQL> insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',NULL);

1 row created.

SQL> insert into t2 (col1, status, col2) values (s1.nextval,'WHO KNOWS',-75);

1 row created.

SQL> 
SQL> insert into t2 (col1, status, col2) values (s1.nextval,'INACTIVE',-99);

1 row created.

SQL> 
SQL> insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',-99);
insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',-99)
*
ERROR at line 1:
ORA-02291: integrity constraint (RIMS.T2_FK) violated - parent key not found


SQL> 
SQL> insert into t2 (col1, status, col2) values (s1.nextval,'ACTIVE',1);

1 row created.

SQL> 

Regexp hungry for CPU? Real time sql monitoring shows just how

Not exactly a snappy blog post title…

First up, an execution plan showing the old problem of how the costs of a scalar subquery are not properly accounted for, even in the latest 11gR2 releases.