Blocking Prepared XA Transaction
January 20, 2018 Leave a comment
There was an oracle-l thread last month about blocking sessions which could not be identified.
I replied back about a very niche scenario which I’d seen a few times before. And it’s just happened again so I thought it would be useful to just post some evidence on this.
Here are a couple of articles posted previously on XA transactions:
First, a reminder, from the posts above, that a transaction doesn’t have to have a session.
Normal “vanilla” sessions, there’s a 1:1 relationship. But with an XA transaction, a session can attach and detach – but only one session can be attached to a transaction at any one time.
And a bit of context about these XA transactions.
This is a complex Java application with multiple resources participating in the XA transaction. There was HornetQ involved and HornetQ was apparently timing out and was bounced, the taking down being down probably with a kill -9.
And an XA transaction does what is called a 2-phase commit where the first phase involves the transaction co-ordinator instructing the participating resources to prepare to commit and subsequently the second phase is to commit.
What I have seen many times on this application is that something happens to the transaction co-ordinator in between the PREPARE and the COMMIT and we are left we an orphaned transaction.
This transaction will eventually time-out – depending on the properties set on the transaction by the co-ordinator – at which point it will become an in-doubt transaction and be visible in dba_2pc_pending.
Back to reality… there is nothing in DBA_2PC_PENDING:
select * from dba_2pc_pending; no rows selected
But we can see there is a transaction in prepared state in V$TRANSACTION, a transaction which started yesterday afternoon:
select * from v$transaction where status = 'PREPARED';
ADDR XIDUSN XIDSLOT XIDSQN UBAFIL UBABLK UBASQN UBAREC STATUS START_TIME START_SCNB START_SCNW START_UEXT START_UBAFIL START_UBABLK START_UBASQN START_UBAREC SES_ADDR FLAG SPACE RECURSIVE NOUNDO PTX NAME PRV_XIDUSN PRV_XIDSLT PRV_XIDSQN PTX_XIDUSN PTX_XIDSLT PTX_XIDSQN DSCN-B DSCN-W USED_UBLK USED_UREC LOG_IO PHY_IO CR_GET CR_CHANGE START_DATE DSCN_BASE DSCN_WRAP START_SCN DEPENDENT_SCN XID PRV_XID PTX_XID ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- -------------------- ---------- ---------- ---------- ------------ ------------ ------------ ------------ ---------------- ---------- ----- --------- ------ --- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------------- ---------- ---------- ---------- ------------- ---------------- ---------------- ---------------- 0000004A833D6868 691 32 315541 22 1085802 -28624 18 PREPARED 01/19/18 15:39:45 3454176189 2913 13 22 1085802 -28624 9 0000004B82E584A0 4201987 NO NO NO NO 0 0 0 0 0 0 0 0 1 12 -40287 -163 -66597824 -1385 19-JAN-2018 15:39:45 0 0 1.3E+13 0 B302200095D00400 0000000000000000 0000000000000000
And this is the script which I mentioned on the oracle-l thread which is one I use to see what locks transaction are holding, when the transaction started and what sessions are attached:
select s.machine , lo.inst_id , lo.object_id , lo.session_id , lo.os_user_name , lo.process , ob.owner , ob.object_name , ob.subobject_name , tx.addr , tx.start_time txn_start_time , tx.status , tx.xid , s.* from gv$locked_object lo , dba_objects ob , gv$transaction tx , gv$session s where ob.object_id = lo.object_id and tx.xidusn (+) = lo.xidusn and tx.xidslot (+) = lo.xidslot and tx.xidsqn (+) = lo.xidsqn and s.taddr (+) = tx.addr order by txn_start_time, session_id, object_name;
For privacy reasons and as this is a real-world situation and not an isolated test case, I won’t share the output of the script.
But it shows that:
- the transaction is holding mode 6 exclusive TX row locks on a number of objects
- that the transaction is in PREPARED
- and that there is no session attached to the transaction (although v$locked_object does tell us what the SID was when it was there)
Now, from the application perspective, something has apparently rolled back a message perhaps because now HornetQ has been bounced, everything is back up and running and it seems like the message that resulted in our orphaned transaction is being reattempted and is being blocked by the exclusive locks still being held.
From an ACTIVE SESSION HISTORY perspective, this is what we see from this script from which I’ve removed columns for brevity and privacy but left them in so you can see what I run normally:
select count(*) over (partition by h.sample_time) sess_cnt --, h.user_id --, (select username from dba_users u where u.user_id = h.user_id) u, h.service_hash , xid--, sample_id , sample_time, session_state, session_id, session_serial#,--sql_id, sql_exec_id, sql_exec_start, event, --p1, mod(p1,16), blocking_session,blocking_session_serial#--, current_obj# --, (select object_name||' - '||subobject_name from dba_objects where object_id = current_obj#) obj --, (select sql_fulltext from v$sql s where s.sql_id = h.sql_id and rownum = 1) sqltxt --, (select sql_text from dba_hist_sqltext s where s.sql_id = h.sql_id and rownum = 1) sqltxt --, h.* from v$active_session_history h where event = 'enq: TX - row lock contention' order by h.sample_id desc;
XID SESSION_STATE SESSION_ID SESSION_SERIAL# EVENT MOD(P1,16) BLOCKING_SESSION BLOCKING_SESSION_SERIAL# ---------------- ------------- ---------- --------------- ---------------------------------------------------------------- ---------- ---------------- ------------------------ 4F021700A3C00600 WAITING 232 53035 enq: TX - row lock contention 6 FC010B00196E0A00 WAITING 471 5205 enq: TX - row lock contention 6 670320004FA50300 WAITING 2652 11791 enq: TX - row lock contention 6 640204005BA40500 WAITING 4300 49665 enq: TX - row lock contention 6
So, you can see we have four sessions being blocked on exclusive mode 6 row level locks and that the blocking session is null. If I showed you the full history then you would see that these sessions have been repeatedly trying for many hours.
BTW, this is RAC but all these sessions are intentionally on the same instance so there’s none of that jiggery-pokery involved.
I imagine at some point there was an internal conversation in Oracle about whether to report blocking session or blocking transaction.
At this point, it’s just a normal lock held by a “normal” transaction which hasn’t committed yet and actually the application just keeps trying to run the transaction waits for 1 minute until they hit the default distributed transaction timeout:
ORA-02049: timeout: distributed transaction waiting for lock
which will be logged somewhere obscure in the application logs – and there’ll be a brief pause and then it starts all over again.
Anyway at this point what to do?
At this point, the transaction hasn’t timed out.
In a few hours, the transaction will time out and become an in-doubt transaction.
Once it does, the application will receive a message:
ORA-01591: lock held by in-doubt distributed transaction
At which time, it can be manually resolved.
Typically, this is what we usually do:
begin for txn in (select local_tran_id from dba_2pc_pending) loop -- if txn is not already forced rollback execute immediate 'rollback force '''||txn.local_tran_id||''''; commit; dbms_transaction.purge_lost_db_entry(txn.local_tran_id); commit; end loop; end; /
If we were going to intervene right now and resolve it, what could we do?
We could bounce the database. Then the application would become in-doubt and see above.
But bouncing the database is quite drastic.
A lot of the advice in the manual is about what to do once it is IN-DOUBT. Some of it might work now – not sure.
What I advise is that if we know we want to rollback or commit this transaction now and we don’t want to wait for it to become IN-DOUBT which often we can’t wait for then programatically we can attach to the transaction using DBMS_XA and do something with it.
First of all, we need some information about the transaction.
CAVEAT DBA!!! This isn’t exactly well documented but I have found that what we tend to need is to identify the transactions in V$GLOBAL_TRANSACTION which are in COLLECTING state:
select state , UTL_RAW.CAST_TO_BINARY_INTEGER (globalid) , UTL_RAW.CAST_TO_BINARY_INTEGER (branchid) , t.* from v$global_transaction t where state = 'COLLECTING';
STATE UTL_RAW.CAST_TO_BINARY_INTEGER(GLOBALID) UTL_RAW.CAST_TO_BINARY_INTEGER(BRANCHID) FORMATID GLOBALID BRANCHID BRANCHES REFCOUNT PREPARECOUNT STATE FLAGS COUPLING -------------------------------------- ---------------------------------------- ---------------------------------------- ---------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ---------- ---------- ------------ -------------------------------------- ---------- --------------- COLLECTING 49 45 131075 312D2D35363832376566363A393236643A35613562363664363A633738353036 2D35363832376566363A393236643A35613562363664363A633738353065 1 1 1 COLLECTING 1 TIGHTLY COUPLED
Then we can replace the formatid, global id and branch id in the script below. Whether you require numbers or raw depends on the signature to DBMS_XA_XID – see documentation.
set serveroutput on DECLARE l_xid DBMS_XA_XID := DBMS_XA_XID(131075, '312D2D35363832376566363A393236643A35613562363664363A633738353036', '2D35363832376566363A393236643A35613562363664363A633738353065'); l_return PLS_INTEGER; BEGIN l_return := SYS.dbms_xa.xa_rollback(xid => l_xid); dbms_output.put_line(l_return); END; /
This approach above comes with no guarantees.
But it has worked for me several times in the past.
Recent Comments