Blocking Prepared XA Transaction

There was an oracle-l thread last month about blocking sessions which could not be identified.

I replied back about a very niche scenario which I’d seen a few times before. And it’s just happened again so I thought it would be useful to just post some evidence on this.

Here are a couple of articles posted previously on XA transactions:

First, a reminder, from the posts above, that a transaction doesn’t have to have a session.

Normal “vanilla” sessions, there’s a 1:1 relationship. But with an XA transaction, a session can attach and detach – but only one session can be attached to a transaction at any one time.

And a bit of context about these XA transactions.

This is a complex Java application with multiple resources participating in the XA transaction. There was HornetQ involved and HornetQ was apparently timing out and was bounced, the taking down being down probably with a kill -9.

And an XA transaction does what is called a 2-phase commit where the first phase involves the transaction co-ordinator instructing the participating resources to prepare to commit and subsequently the second phase is to commit.

What I have seen many times on this application is that something happens to the transaction co-ordinator in between the PREPARE and the COMMIT and we are left we an orphaned transaction.

This transaction will eventually time-out – depending on the properties set on the transaction by the co-ordinator – at which point it will become an in-doubt transaction and be visible in dba_2pc_pending.

Back to reality… there is nothing in DBA_2PC_PENDING:

select * from dba_2pc_pending;

no rows selected

But we can see there is a transaction in prepared state in V$TRANSACTION, a transaction which started yesterday afternoon:

select * from v$transaction where status = 'PREPARED';
ADDR                 XIDUSN    XIDSLOT     XIDSQN     UBAFIL     UBABLK     UBASQN     UBAREC STATUS           START_TIME           START_SCNB START_SCNW START_UEXT START_UBAFIL START_UBABLK START_UBASQN START_UBAREC SES_ADDR               FLAG SPACE RECURSIVE NOUNDO PTX NAME                                                                                                                                                                                                                                                             PRV_XIDUSN PRV_XIDSLT PRV_XIDSQN PTX_XIDUSN PTX_XIDSLT PTX_XIDSQN     DSCN-B     DSCN-W  USED_UBLK  USED_UREC     LOG_IO     PHY_IO     CR_GET  CR_CHANGE START_DATE            DSCN_BASE  DSCN_WRAP  START_SCN DEPENDENT_SCN XID              PRV_XID          PTX_XID        
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- -------------------- ---------- ---------- ---------- ------------ ------------ ------------ ------------ ---------------- ---------- ----- --------- ------ --- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------------- ---------- ---------- ---------- ------------- ---------------- ---------------- ----------------
0000004A833D6868        691         32     315541         22    1085802     -28624         18 PREPARED         01/19/18 15:39:45    3454176189       2913         13           22      1085802       -28624            9 0000004B82E584A0    4201987 NO    NO        NO     NO                                                                                                                                                                                                                                                                            0          0          0          0          0          0          0          0          1         12     -40287       -163  -66597824      -1385 19-JAN-2018 15:39:45          0          0    1.3E+13             0 B302200095D00400 0000000000000000 0000000000000000 

And this is the script which I mentioned on the oracle-l thread which is one I use to see what locks transaction are holding, when the transaction started and what sessions are attached:

select s.machine
,      lo.inst_id
,      lo.object_id
,      lo.session_id
,      lo.os_user_name
,      lo.process
,      ob.owner
,      ob.object_name
,      ob.subobject_name
,      tx.addr
,      tx.start_time txn_start_time
,      tx.status
,      tx.xid
,      s.*
from   gv$locked_object lo
,      dba_objects      ob
,      gv$transaction    tx
,      gv$session        s
where  ob.object_id = lo.object_id
and    tx.xidusn    (+) = lo.xidusn
and    tx.xidslot   (+) = lo.xidslot
and    tx.xidsqn    (+) = lo.xidsqn
and    s.taddr      (+) = tx.addr
order by txn_start_time, session_id, object_name;

For privacy reasons and as this is a real-world situation and not an isolated test case, I won’t share the output of the script.

But it shows that:

  • the transaction is holding mode 6 exclusive TX row locks on a number of objects
  • that the transaction is in PREPARED
  • and that there is no session attached to the transaction (although v$locked_object does tell us what the SID was when it was there)

Now, from the application perspective, something has apparently rolled back a message perhaps because now HornetQ has been bounced, everything is back up and running and it seems like the message that resulted in our orphaned transaction is being reattempted and is being blocked by the exclusive locks still being held.

From an ACTIVE SESSION HISTORY perspective, this is what we see from this script from which I’ve removed columns for brevity and privacy but left them in so you can see what I run normally:

select count(*) over (partition by h.sample_time) sess_cnt
--,      h.user_id
--,      (select username from dba_users u where u.user_id = h.user_id) u, h.service_hash
,      xid--, sample_id
, sample_time, session_state, session_id, session_serial#,--sql_id,
sql_exec_id, sql_exec_start, event, --p1,
mod(p1,16), blocking_session,blocking_session_serial#--, current_obj#
--,      (select object_name||' - '||subobject_name from dba_objects where object_id = current_obj#) obj
--,      (select sql_fulltext from v$sql s where s.sql_id = h.sql_id and rownum = 1) sqltxt
--,      (select sql_text from dba_hist_sqltext s where s.sql_id = h.sql_id and rownum = 1) sqltxt
--, h.*
from   v$active_session_history h
where event = 'enq: TX - row lock contention'
order by h.sample_id desc;
XID              SESSION_STATE SESSION_ID SESSION_SERIAL# EVENT                                                            MOD(P1,16) BLOCKING_SESSION BLOCKING_SESSION_SERIAL#
---------------- ------------- ---------- --------------- ---------------------------------------------------------------- ---------- ---------------- ------------------------
4F021700A3C00600 WAITING              232           53035 enq: TX - row lock contention                                             6
FC010B00196E0A00 WAITING              471            5205 enq: TX - row lock contention                                             6
670320004FA50300 WAITING             2652           11791 enq: TX - row lock contention                                             6
640204005BA40500 WAITING             4300           49665 enq: TX - row lock contention                                             6

So, you can see we have four sessions being blocked on exclusive mode 6 row level locks and that the blocking session is null. If I showed you the full history then you would see that these sessions have been repeatedly trying for many hours.

BTW, this is RAC but all these sessions are intentionally on the same instance so there’s none of that jiggery-pokery involved.

I imagine at some point there was an internal conversation in Oracle about whether to report blocking session or blocking transaction.

At this point, it’s just a normal lock held by a “normal” transaction which hasn’t committed yet and actually the application just keeps trying to run the transaction waits for 1 minute until they hit the default distributed transaction timeout:

ORA-02049: timeout: distributed transaction waiting for lock

which will be logged somewhere obscure in the application logs – and there’ll be a brief pause and then it starts all over again.

Anyway at this point what to do?

At this point, the transaction hasn’t timed out.

In a few hours, the transaction will time out and become an in-doubt transaction.

Once it does, the application will receive a message:

ORA-01591: lock held by in-doubt distributed transaction

At which time, it can be manually resolved.

Typically, this is what we usually do:

for txn in (select local_tran_id from dba_2pc_pending)
-- if txn is not already forced rollback
execute immediate 'rollback force '''||txn.local_tran_id||'''';
end loop;

If we were going to intervene right now and resolve it, what could we do?

We could bounce the database. Then the application would become in-doubt and see above.

But bouncing the database is quite drastic.

A lot of the advice in the manual is about what to do once it is IN-DOUBT. Some of it might work now – not sure.

What I advise is that if we know we want to rollback or commit this transaction now and we don’t want to wait for it to become IN-DOUBT which often we can’t wait for then programatically we can attach to the transaction using DBMS_XA and do something with it.

First of all, we need some information about the transaction.
CAVEAT DBA!!! This isn’t exactly well documented but I have found that what we tend to need is to identify the transactions in V$GLOBAL_TRANSACTION which are in COLLECTING state:

select state
,      t.* 
from v$global_transaction t where state = 'COLLECTING';
STATE                                  UTL_RAW.CAST_TO_BINARY_INTEGER(GLOBALID) UTL_RAW.CAST_TO_BINARY_INTEGER(BRANCHID)   FORMATID GLOBALID                                                                                                                         BRANCHID                                                                                                                           BRANCHES   REFCOUNT PREPARECOUNT STATE                                       FLAGS COUPLING      
-------------------------------------- ---------------------------------------- ---------------------------------------- ---------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ------------ -------------------------------------- ---------- ---------------
COLLECTING                                                                   49                                       45     131075 312D2D35363832376566363A393236643A35613562363664363A633738353036                                                                 2D35363832376566363A393236643A35613562363664363A633738353065                                                                              1          1            1 COLLECTING                                      1 TIGHTLY COUPLED 

Then we can replace the formatid, global id and branch id in the script below. Whether you require numbers or raw depends on the signature to DBMS_XA_XID – see documentation.

set serveroutput on
l_xid     DBMS_XA_XID :=
l_return  PLS_INTEGER;
l_return := SYS.dbms_xa.xa_rollback(xid =>  l_xid);

This approach above comes with no guarantees.
But it has worked for me several times in the past.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: