Plan Instability

There seems to me to be a relatively simple choice.

Either you except that the Oracle Optimizer has a wealth of complicated strategies and, in this complex effort to get the best executions it can, will inevitably either get it wrong sometimes (or take some extra executions to realise it’s wrong).

Or you stick your head in the sand and raise a bug for every unexpected poor execution or plan flip.

But let’s say that above else you wanted Plan Stability.

This is an widespread desire.

What would be the best strategy?
And to what lengths would you take it?

SQL Plan Management features are designed to give you some degree of stability.

You could baseline statements which have given you a problem in the past.

What if that is not deemed adequate?

So, on the one hand, you could baseline everything you could find.

Taking a long history of AWR and taking regularly snaps of V$SQL, you could put all in a SQL Plan Baseline.

But, depending on your application, you might regularly get new SQL appear unprotected by a baseline.
In one Hibernate application I know, a change in the application domain model often results in the table aliases changing, meaning that there is lots of brand new, but very familiar, SQL.
So, you then become dependant on having a reliable testing infrastructure which will generate nearly all the SQL you’re likely to get in advance of production.

In addition, you might want multiple plans in your baseline – for ACS for example – and then, once that extra bindset awareness is flushed out of memory, you then need a couple of executions to rediscover it (unless you use a SQL Patch to inject BIND_AWARE into the specific sql statements).

It’s certainly no guarantee of stability.

What is the alternative?

I like to argue that most problematic executions stem from two sources:

  1. Badly written SQL
  2. Using poor combinations of features – e.g. bind variable peeking + histograms

The other day I made a comment to someone that “if you’re heavily reliant on ACS, perhaps you should rethink whether you should be using literals rather than bind variables”.

So, you might then take the position that plan instability stems from increasing complexity in the optimizer.

In which case, maybe a viable strategy might be to turn off much of the complexity:

  • Bind Variable Peeking
  • Cardinality feedback
  • Adaptive Cursor Sharing
  • Dynamic Sampling
  • Adaptive direct path reads
  • 12c adaptive execution plans
  • Anything with the word “adaptive” in it?
  • Default Stats job and/or default method_opt histogram behaviour

This seems quite radical to many. Perhaps justifiably.

Perhaps it would be nice if there were new optimizer strategies – OPTIMIZER_MODE? – perhaps “AGGRESSIVE_PERFORMANCE” (maybe “ADAPTIVE_PERFORMANCE_WITH_THE_ODD_HICCUP” ?) vs “PERFORMANCE_STABILITY” which were an umbrella of many of the features above.

Thoughts?

To what extent are you concerned with performance stability, above all else?

How many of the optimizer features in each release are actually aligned with that widespread concern?

Conditional uniqueness

A quick fly through the options for conditional uniqueness.

Requirement #1: I want uniqueness on a column but only under certain conditions.

For example, I have an active flag and I want to make sure there is only one active record for a particular attribute but there can be many inactive rows.

Initial setup:

create table t1
(col1      number       not null
,col2      varchar2(24) not null
,is_active number(1)    not null
,constraint pk_t1 primary key (col1)
,constraint ck_t1_is_active check (is_active in (1,0)));

Solution #1: A unique index on an expression which evaluates to null when the condition is not met.

create unique index i_t1 on t1 (case when is_active = 1 then col2 end);

unique index I_T1 created.

insert into t1 values(1,'SHAGGY',1);

1 rows inserted.

insert into t1 values(2,'SHAGGY',1);

SQL Error: ORA-00001: unique constraint (I_T1) violated
00001. 00000 -  "unique constraint (%s.%s) violated"
*Cause:    An UPDATE or INSERT statement attempted to insert a duplicate key.
           For Trusted Oracle configured in DBMS MAC mode, you may see
           this message if a duplicate entry exists at a different level.
*Action:   Either remove the unique restriction or do not insert the key.

Only one active SHAGGY allowed.
But multiple inactives allowed:

insert into t1 values(2,'SHAGGY',0);

1 rows inserted.

insert into t1 values(3,'SHAGGY',0);

1 rows inserted.

Solution #2: A virtual column with a unique constraint

drop index i_t1;

index I_T1 dropped.

alter table t1 add (vc_col2 varchar2(24) generated always as (case when is_active = 1 then col2 end));

table T1 altered.

alter table t1 add constraint uk_t1 unique (vc_col2);

table T1 altered.

Note that now we have a virtual column we have to be very aware of insert statements with no explicit column list:

insert into t1 values(4,'SCOOBY',1);

SQL Error: ORA-00947: not enough values
00947. 00000 -  "not enough values"

Unless we’re lucky enough to be on 12c and use the INVISIBLE syntax:

alter table t1 add (vc_col2 varchar2(24) invisible generated always as (case when is_active = 1 then col2 end));

But as this example is on 11.2.0.3:

insert into t1 (col1, col2, is_active) values(4,'SCOOBY',1);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(5,'SCOOBY',1);

SQL Error: ORA-00001: unique constraint (UK_T1) violated
00001. 00000 -  "unique constraint (%s.%s) violated"
*Cause:    An UPDATE or INSERT statement attempted to insert a duplicate key.
           For Trusted Oracle configured in DBMS MAC mode, you may see
           this message if a duplicate entry exists at a different level.
*Action:   Either remove the unique restriction or do not insert the key.

insert into t1 (col1, col2, is_active) values(5,'SCOOBY',0);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(6,'SCOOBY',0);

1 rows inserted.

Requirement #2: Sorry we forgot to tell you that we insert the new row first and the update the old one to be inactive so we need deferred constraint (hmmm!)

In which case, you can’t have deferred uniqueness on an index so the only option is the virtual column.

alter table t1 drop constraint uk_t1;

table T1 altered.

alter table t1 add constraint uk_t1 unique (vc_col2) deferrable initially deferred;

table T1 altered.

insert into t1 (col1, col2, is_active) values(7,'FRED',1);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(8,'FRED',1);

1 rows inserted.

commit;

SQL Error: ORA-02091: transaction rolled back
ORA-00001: unique constraint (.UK_T1) violated
02091. 00000 -  "transaction rolled back"
*Cause:    Also see error 2092. If the transaction is aborted at a remote
           site then you will only see 2091; if aborted at host then you will
           see 2092 and 2091.
*Action:   Add rollback segment and retry the transaction.

insert into t1 (col1, col2, is_active) values(7,'FRED',1);

1 rows inserted.

insert into t1 (col1, col2, is_active) values(8,'FRED',1);

1 rows inserted.

update t1 set is_active = 0 where col1 = 7;

1 rows updated.

commit;

committed.

See previous post on similar approach for conditional foreign key

Audit space threat – a perfect storm

According to http://en.wikipedia.org/wiki/Perfect_storm:

A “perfect storm” is an expression that describes an event where a rare combination of circumstances will aggravate a situation drastically.

The other day we had a bit of a panic when AUD$ space grew by 45GB in 15 minutes and set off some alerts.

The audit_trail parameter was set to DB, EXTENDED.

which means:

Behaves the same as AUDIT_TRAIL=DB, but also populates the SQL bind and SQL text CLOB-type columns of the SYS.AUD$ table, when available.

So, we’re logging SQL statements.

If in DBA_STMT_AUDIT_OPTS we have this:

USER_NAME PROXY_NAME AUDIT_OPTION  SUCCESS  FAILURE  
--------- ---------- ------------- -------- ----------
                     SELECT TABLE  NOT SET   BY ACCESS  

BY ACCESS means:

Oracle Database records separately each execution of a SQL statement, the use of a privilege, and access to the audited object. Given that the values for the return code, timestamp, SQL text recorded are accurate for each execution, this can help you find how many times the action was performed.

Now, given developer read-only access to production, if we then add these sequence of events:

1.A generated sql statement with 17,000 UNION statements each accessing the same table with a different id, e.g.

select * from t1 where id = 1 union
select * from t1 where id = 2 union
...
select * from t1 where id = 17000;

*no need to comment on whether this is a sensible query to be writing :)

2. The (foolish) developer then setting this statement off in prod.

3. After a few minutes, the developer sensibly realising that running this in prod might not be a good idea and cancelling.

4. Unluckily (for the developer), the sensible cancel operation on the foolish sql statement causing a ORA-01013 sql failure which will then be audited.

5. The audit settings above meaning that 17,000 entries will be written into AUD$, one each for the 17,000 references to T1 in the SQL statement above and each of the 17,000 entries containing the 17,000 line long SQL statement.

The additional 45GB was all taken up by the AUD$.SQLTXT LOB.

To me, this was new information – I didn’t expect AUDIT to write down each and every reference to the table from a single execution – a list of DISTINCT objects in the statement would have been my expectation. And it is a concern that given just SELECT access, it’s so easy to generate significant volumes of data.

Should you wish to play around with this scenario, steps are:

1. If you need to change audit_trail param, it’s not dynamic, you’ll need to restart:

ALTER SYSTEM SET audit_trail=DB,EXTENDED scope=spfile;

2. Setup SELECT auditing BY ACCESS for failures (for testing, you can set it up for success as well of course):

AUDIT SELECT TABLE BY ACCESS WHENEVER NOT SUCCESSFUL;

3. As a non-SYS user, create a table to select from:

CREATE TABLE t1 AS SELECT * FROM dba_objects;

4. Generate a large SQL statement with lots of table references:

set serveroutput on
declare
 l_sql clob;
begin
 dbms_output.put_line('select * from t1 where object_id = 1');
 for i in 2 .. 10000
 loop
     dbms_output.put_line(' union select * from t1 where object_id = '||i);
 end loop;
end;
/

5. Kick it off and then cancel the execution.

6. Wait a significant amount of time for the background session to write the audit data to AUD$. For this smaller/simpler SQL above, it took my underpowered VM several minutes to write down 4GB of audit data.

6. Check out the growth of your AUD$ table (or rather the associated LOB):

select segment_name, sum(bytes) 
from dba_segments
where segment_name in ('AUD$')
or    segment_name in (select segment_name from dba_lobs where table_name = 'AUD$')
group by segment_name;

Unusable unique constraint

Another annoying thing about unusable indexes

I’m surprised that I can’t remember coming across this before before.

I want to archive some data from a bunch of partitions.

Unfortunately, I can’t follow the virtuous circle of data lifecycling and partition the table in such a way that I can archive all the data from a particular partition (or subpartition) via exchange partition.

Without going into too much detail and skipping a bunch of other steps, roughly what I wanted to do was:
1. Mark indexes as unusable (ignore whether they’re global/local, it doesn’t matter).
2. Delete from existing partition
3. Rebuild indexes.

But if you mark a unique index as unusable, you can’t then do DML on the table.

It is expected behaviour.
Doesn’t matter what the setting of skip_unusable_indexes is.

If an index is used to enforce a UNIQUE constraint on a table, then allowing insert and update operations on the table might violate the constraint.
Therefore, this setting does not disable error reporting for unusable indexes that are unique.

Well that seems to compromise one of the main benefits of marking a unique index as unusable.

SQL> get tmp
  1  drop table t1;
  2  create table t1
  3  (pt    varchar2(24)
  4  ,col1  number)
  5  PARTITION BY LIST (pt)
  6  (PARTITION PX VALUES('X')
  7  ,PARTITION PY VALUES('Y'));
  8  create unique index i1 on t1 (col1) global;
  9  insert into t1 values ('X',1);
 10  alter index i1 unusable;
 11* delete from t1 partition (px) where col1 = 1;
 12  .
SQL> @tmp

Table dropped.

Table created.

Index created.

1 row created.

Index altered.

delete from t1 partition (px) where col1 = 1
*
ERROR at line 1:
ORA-01502: index 'E668983_DBA.I1' or partition of such index is in unusable state

To get around this, I can use a unique constraint backed by a non-unique index but that can come with some extra baggage.

SQL> get tmp
  1  drop table t1;
  2  create table t1
  3  (pt    varchar2(24)
  4  ,col1  number)
  5  PARTITION BY LIST (pt)
  6  (PARTITION PX VALUES('X')
  7  ,PARTITION PY VALUES('Y'));
  8  create  index i1 on t1 (col1) global;
  9  alter table t1 add constraint uk_t1 unique (col1) using index i1;
 10  insert into t1 values ('X',1);
 11  alter table t1 disable constraint uk_t1;
 12  alter index i1 unusable;
 13* delete from t1 partition (px) where col1 = 1;
 14  .
SQL> @tmp

Table dropped.

Table created.

Index created.

Table altered.

1 row created.

Table altered.

Index altered.

1 row deleted.

Remote Surprise

Here is an example of surprising behaviour from a remote DB from an OTN forum thread

Setup a link to a remote DB (I’ve used an actual remote DB and not tested a loopback)

Remote DB:

create table t1
(col1  varchar2(1));

Local DB:

create or replace view v1 
as 
select count(*) c1 from t1@l1;

Then alternate variations on this sequence of events:

1. On local DB execute SELECT:

SELECT * FROM v1;

2. On remote DB execute

begin
insert into t1 values ('a');
commit;
end;
/

3. On local DB execute SELECT:

SELECT * FROM v1;

4. On local DB execute SELECT:

SELECT * FROM v1;

OR

1. On local DB execute SELECT:

SELECT * FROM v1;

2. On remote DB execute

begin
delete from t1; 
commit;
end;
/

3. On local DB execute:

SELECT * FROM v1;

4. On local DB execute:

SELECT * FROM v1;

You may have to execute several times to catch the “inconsistency” but between the remote INSERT or DELETE and the local SELECT, you should see the odd “old” result which is corrected on the subsequent execution.

For example, on local DB with nothing happening on remote DB between the two SELECTS:

SQL> select * from v1;

        C1
----------
         6

SQL> select * from v1;

        C1
----------
         7

The behaviour is documented.


no practical way exists to keep SCNs in a distributed system absolutely synchronized: a window always exists in which one node may have an SCN that is somewhat in the past with respect to the SCN of another node.

Because of the SCN gap, you can execute a query that uses a slightly old snapshot, so that the most recent changes to the remote database are not seen. In accordance with read consistency, a query can therefore retrieve consistent, but out-of-date data.

And two workarounds documented:

You can use the following techniques to ensure that the SCNs of the two systems are synchronized just before a query:

Because SCNs are synchronized at the end of a remote query, precede each remote query with a dummy remote query to the same site, for example, SELECT * FROM DUAL@REMOTE.

Because SCNs are synchronized at the start of every remote transaction, commit or roll back the current transaction before issuing the remote query.

An example of where the optimizer should ignore a hint

Here’s an example of where the optimizer really should ignore a hint.

I’m testing an implementation of deliberately unusable index partitions – i.e. some unusable index partitions, some usable.

This is 11.2.0.3 so whilst there are enhancements in 12c, they’re no use here.

SQL> create table t1
  2  (col1 number)
  3  partition by list(col1)
  4  (partition p0 values(0),
  5   partition pdef values(default));

Table created.

SQL> create index i1 on t1 (col1) local unusable;

Index created.

SQL> alter index i1 rebuild partition pdef;

Index altered.

SQL> select * from t1;

no rows selected

SQL> select * from t1 partition (p0);

no rows selected

SQL> select /*+ index(t1 i1) */ * from t1 partition (p0);

no rows selected

SQL> select * from t1 where col1 = 0;

no rows selected

SQL> select /*+ index(t1 i1) */ * from t1 where col1 = 0;
select /*+ index(t1 i1) */ * from t1 where col1 = 0
*
ERROR at line 1:
ORA-01502: index 'I1' or partition of such index is in unusable state

SQL>

AWR: Was a baselined plan used?

Sometimes a simple question turns out to be harder than expected.

“Can we see if a particular SQL execution in AWR used a baselined plan?”

Initial thoughts might be:

Q: Does DBMS_XPLAN.DISPLAY_AWR tell us this?
A: Apparently not. See below. This question could also be rephrased as two other possible questions:

Q:Isn’t there a column on DBA_HIST_SQLSTAT which tell us this?
A: No. You’d think there should be. There is a SQL_PROFILE column. There isn’t a SQL_PLAN_BASELINE column.
There also isn’t an EXACT_MATCHING_SIGNATURE although there is a FORCE_MATCHING_SIGNATURE.

Q: Is it in DBA_HIST_SQL_PLAN.OTHER_XML?
A. No although this is where DBMS_XPLAN.DISPLAY_AWR gets it’s notes about cardinality feedback and dynamic sampling from.

First of all, Let’s see that it’s not shown in DBMS_XPLAN.DISPLAY_AWR.
Let’s get a sql statement baselined, in AWR and not in memory.

The usual setup

SQL> create table t1
  2  (col1  number
  3  ,col2  varchar2(50)
  4  ,flag  varchar2(2));

Table created.

SQL> insert into t1
  2  select rownum
  3  ,      lpad('X',50,'X')
  4  ,      case when rownum = 1
  5              then 'Y1'
  6              when rownum = 2
  7              then 'Y2'
  8              when mod(rownum,2) = 0
  9              then 'N1'
 10              else 'N2'
 11         end
 12  from   dual
 13* connect by rownum <= 100000
SQL> /

100000 rows created.

SQL> commit;

Commit complete.

Get a plan and put it in a baseline:

SQL> var n varchar2(2);
SQL> exec :n := 'N1';

PL/SQL procedure successfully completed.

SQL> select count(*), max(col2) from t1 where flag = :n;
select * from table(dbms_xplan.display_cursor);
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
     49999 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

SQL> select * from table(dbms_xplan.display_cursor);


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 0
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   221 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 50254 |  1472K|   221   (1)| 00:00:03 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"=:N)

Note
-----
   - dynamic sampling used for this statement (level=2)

SQL> declare
  2   l_op pls_integer;
  3  begin
  4    l_op :=
  5    dbms_spm.load_plans_from_cursor_cache('731b98a8u0knf');
  6* end;
SQL> /

PL/SQL procedure successfully completed.

Verify that DBMS_XPLAN.DISPLAY_CURSOR reports baseline usage:

SQL> select count(*), max(col2) from t1 where flag = :n;

  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
     49999 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 0
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   221 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 50254 |  1472K|   221   (1)| 00:00:03 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"=:N)

Note
-----
   - dynamic sampling used for this statement (level=2)
   - SQL plan baseline SQL_PLAN_13w748wknkcwd616acf47 used for this statement

See that dynamic_sampling note most likely comes from OTHER_XML and Baseline note doesn’t, it comes from V$SQL:

SQL>select sql_id
  2  ,      to_char(exact_matching_signature) sig
  3  ,      plan_hash_value
  4  ,      sql_plan_baseline
  5  from   v$sql
  6* where  sql_id = '731b98a8u0knf'
SQL> /

SQL_ID        SIG                                      PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ---------------------------------------- --------------- ------------------------------
731b98a8u0knf 1292784087274697613                           3724264953 SQL_PLAN_13w748wknkcwd616acf47

SQL> set long 10000
SQL> select other_xml
  2  from   v$sql_plan
  3  where  sql_id = '731b98a8u0knf'
  4* and    other_xml is not null
SQL> /

OTHER_XML
--------------------------------------------------------------------------------
<other_xml><info type="db_version">11.2.0.3</info><info type="parse_schema"><![C
DATA["E668983_DBA"]]></info><info type="dynamic_sampling">2</info><info type="pl
an_hash">3724264953</info><info type="plan_hash_2">1634389831</info><peeked_bind
s><bind nam=":N" pos="1" dty="1" csi="178" frm="1" mxl="32">4e31</bind></peeked_
binds><outline_data><hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint><hint><!
[CDATA[OPTIMIZER_FEATURES_ENABLE('11.2.0.3')]]></hint><hint><![CDATA[DB_VERSION(
'11.2.0.3')]]></hint><hint><![CDATA[OPT_PARAM('query_rewrite_enabled' 'false')]]
></hint><hint><![CDATA[ALL_ROWS]]></hint><hint><![CDATA[OUTLINE_LEAF(@"SEL$1")]]
></hint><hint><![CDATA[FULL(@"SEL$1" "T1"@"SEL$1")]]></hint></outline_data></oth
er_xml>

Now, let’s add statement to AWR, do an AWR snap and flush the SP:

SQL> exec dbms_workload_repository.add_colored_sql('731b98a8u0knf');

PL/SQL procedure successfully completed.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> alter system flush shared_pool;

System altered.

Now if we look back at the plan in memory, there’s nothing there.

SQL> select * from v$sql where sql_id = '731b98a8u0knf';

no rows selected

SQL> select * from table(dbms_xplan.display_cursor('731b98a8u0knf'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------
SQL_ID: 731b98a8u0knf, child number: 0 cannot be found

But the SQL is in AWR and let’s see that DBMS_XPLAN.DISPLAY_AWR does not tells us that a baseline is used:

SQL> select * from table(dbms_xplan.display_awr('731b98a8u0knf'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------
SQL_ID 731b98a8u0knf
--------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   221 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 50238 |  1471K|   221   (1)| 00:00:03 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

But it will tell us if a sql profile is used because it can via DBA_HIST_SQLSTAT.SQL_PROFILE – I leave that exercise to the reader.

So, back to the original question:

“Can we see if a particular SQL execution in AWR used a baselined plan?”

I’ve never needed to do this but as an exercise we can infer this connection by the presence of a baselined plan with the same phv2 but it’s convoluted.

It may be that there are simpler ways but it seems to me that we need to:

  1. Get the exact_matching_signature from the sql text
  2. Get the phv2 out of DBA_HIST_SQL_PLAN.OTHER_XML
  3. Match that phv2 with planid which is not exposed in DBA_SQL_PLAN_BASELINES but is in the underlying SYS.SQLOBJ$ tables
SQL> with subq_mysql as
  2  (select sql_id
  3   ,      (select dbms_sqltune.sqltext_to_signature(ht.sql_text)
  4           from dual) sig
  5   from   dba_hist_sqltext       ht
  6   where  sql_id = '731b98a8u0knf')
  7  ,    subq_baselines as
  8  (select b.signature
  9   ,      b.plan_name
 10   ,      b.accepted
 11   ,      b.created
 12   ,      o.plan_id
 13   from   subq_mysql             ms
 14   ,      dba_sql_plan_baselines b
 15   ,      sys.sqlobj$            o
 16   where  b.signature   = ms.sig
 17   and    o.signature   = b.signature
 18   and    o.name        = b.plan_name)
 19  ,    subq_awr_plans as
 20  (select  sn.snap_id
 21   ,       to_char(sn.end_interval_time,'DD-MON-YYYY HH24:MI') dt
 22   ,       hs.sql_id
 23   ,       hs.plan_hash_value
 24   ,       t.phv2
 25   ,       ms.sig
 26   from    subq_mysql        ms
 27   ,       dba_hist_sqlstat  hs
 28   ,       dba_hist_snapshot sn
 29   ,       dba_hist_sql_plan hp
 30   ,       xmltable('for $i in /other_xml/info
 31                     where $i/@type eq "plan_hash_2"
 32                     return $i'
 33                    passing xmltype(hp.other_xml)
 34                    columns phv2 number path '/') t
 35   where   hs.sql_id          = ms.sql_id
 36   and     sn.snap_id         = hs.snap_id
 37   and     sn.instance_number = hs.instance_number
 38   and     hp.sql_id          = hs.sql_id
 39   and     hp.plan_hash_value = hs.plan_hash_value
 40   and     hp.other_xml      is not null)
 41  select awr.*
 42  ,       nvl((select max('Y')
 43               from   subq_baselines b
 44               where  b.signature = awr.sig
 45               and    b.accepted  = 'YES'),'N') does_baseline_exist
 46  ,      nvl2(b.plan_id,'Y','N') is_baselined_plan
 47  ,      to_char(b.created,'DD-MON-YYYY HH24:MI')  when_baseline_created
 48  from   subq_awr_plans awr
 49  ,      subq_baselines b
 50  where  b.signature (+) = awr.sig
 51  and    b.plan_id   (+) = awr.phv2
 52* order by awr.snap_id
SQL> /

   SNAP_ID DT                SQL_ID        PLAN_HASH_VALUE       PHV2        SIG D I WHEN_BASELINE_CRE
---------- ----------------- ------------- --------------- ---------- ---------- - - -----------------
      8703 05-FEB-2014 15:18 731b98a8u0knf      3724264953 1634389831 1.2928E+18 Y Y 05-FEB-2014 15:08

SQL>

Surprising.
Unless, of course, I’ve overlooked something bleeding obvious.

SQL utils using XML

You may have previously seen a short post I did on a SQL statement to identify which statements are using dynamic sampling.

If not, quick recap:

SELECT p.sql_id, t.val
FROM   v$sql_plan p
,      xmltable('for $i in /other_xml/info
                 where $i/@type eq "dynamic_sampling"
                 return $i'
                passing xmltype(p.other_xml)
                columns attr varchar2(50) path '@type',
                        val  varchar2(50) path '/') t
WHERE  p.other_xml IS NOT NULL;

This uses the incredibly powerful XMLTABLE functionality, there’s so much that can be done with it.

Here are a couple of other utilities I used recently which also highlight the powerful convenience of SQL and XML.

First up, I don’t know if this is useful to anyone but I had a crappy refresh script which should have been creating table partitions with SEGMENT CREATION DEFERRED but wasn’t.

So there was a reasonable amount of space wastage caused by empty segments.

How to identify? See below.

Could be combined with DBMS_SPACE_ADMIN.DROP_EMPTY_SEGMENTS to clean up?

WITH subq_pos_empty AS
(SELECT t.table_owner
 ,      t.table_name
 ,      t.partition_name
 ,      x.cnt
 FROM   dba_segments         s
 ,      dba_tab_partitions   t
 ,      xmltable('for $i in /ROWSET/ROW/CNT
                  return $i'
                 passing xmltype(
                          dbms_xmlgen.getxml
                          ('select count(*) cnt '
                         ||'from '||t.table_owner||'.'||t.table_name||' PARTITION ('||t.partition_name||') '
                         --||'SAMPLE(.01)' -- If you want to sample to speed up unexpected large seg counts                  
                           ))
                 columns cnt number path '/') x
 WHERE  s.segment_type       = 'TABLE PARTITION'
 --AND    t.table_owner      LIKE 'XYZ%'
 AND    t.table_owner        = s.owner
 AND    t.table_name         = s.segment_name
 AND    t.partition_name     = s.partition_name
 AND    t.num_rows           = 0
 AND    t.partition_position > 1)
SELECT *
FROM   subq_pos_empty
WHERE  cnt = 0
ORDER BY 
       table_owner
,      table_name
,      partition_name;

For example:

SQL> create table t1
  2  (col1 date
  3  ,col2 number)
  4  partition by range(col1) interval (numtodsinterval(1,'DAY'))
  5  (PARTITION p0 values less than (to_Date(20130101,'YYYYMMDD')) segment creation immediate
  6  ,PARTITION p1 values less than (to_Date(20130102,'YYYYMMDD'))  segment creation immediate)
  7  ;

Table created.

SQL> exec dbms_stats.gather_table_stats(USER,'T1');

PL/SQL procedure successfully completed.

SQL> WITH subq_pos_empty AS
  2  (SELECT t.table_owner
  3   ,      t.table_name
  4   ,      t.partition_name
  5   ,      x.cnt
  6   FROM   dba_segments         s
  7   ,      dba_tab_partitions   t
  8   ,      xmltable('for $i in /ROWSET/ROW/CNT
  9                    return $i'
 10                   passing xmltype(
 11                            dbms_xmlgen.getxml
 12                            ('select count(*) cnt '
 13                           ||'from '||t.table_owner||'.'||t.table_name||' PARTITION ('||t.partition_name||') '
 14                           --||'SAMPLE(.01)' -- If you want to sample to speed up unexpected large seg counts                                      
 15                             ))
 16                   columns cnt number path '/') x
 17   WHERE  s.segment_type       = 'TABLE PARTITION'
 18   --AND    t.table_owner      LIKE 'XYZ%'
 19   AND    t.table_name         = 'T1' -- Comment out
 20   AND    t.table_owner        = s.owner
 21   AND    t.table_name         = s.segment_name
 22   AND    t.partition_name     = s.partition_name
 23   AND    t.num_rows           = 0
 24   AND    t.partition_position > 1)
 25  SELECT *
 26  FROM   subq_pos_empty
 27  WHERE  cnt = 0
 28  ORDER BY
 29         table_owner
 30  ,      table_name
 31  ,      partition_name;

TABLE_OWNER                    TABLE_NAME                     PARTITION_NAME                        CNT
------------------------------ ------------------------------ ------------------------------ ----------
PGPS_UAT1                      T1                             P1                                      0

Secondly, a helper for partitions and that nasty LONG column which can be used for partition maintenance to roll off oldest partitions:

SELECT table_name
,      partition_name
,      hi
FROM   (SELECT t.table_name
        ,      t.partition_name
        ,      t.partition_position
        ,      x.hi
        FROM   user_tab_partitions t
        ,      xmltable('for $i in /ROWSET/ROW/HI
                         return $i'
                        passing xmltype(
                                dbms_xmlgen.getxml
                                ('select high_value hi from user_tab_partitions x'
                               ||' where x.table_name     = '''||t.table_name||''''
                               ||' and   x.partition_name = '''|| t.partition_name|| ''''))
                        columns hi number path '/') x
        --WHERE  partition_position > 1
        --AND    table_name  = i_table_name
        )
--WHERE hi       <= i_date_yyyymmdd
;   

This works an awful lot more easily if you have range/interval partitioning on a number – which most people probably don’t have.

For the more normal DATE range partitioning, it’s only slightly more fiddly.

I haven’t spent too long thinking about it so there may be a better way, but I tried to avoid the deprecated EXTRACTVALUE approach:

SELECT t.table_name
,      t.partition_name
,      t.partition_position
,      to_date(x2.dt,'YYYYMMDDHH24MISS') hi
FROM   user_tab_partitions t
,      xmltable('for $i in /ROWSET/ROW/HI
                 return $i'
                passing xmltype(
                          dbms_xmlgen.getxml
                          ('select high_value hi from user_tab_partitions x'
                         ||' where x.table_name     = '''||t.table_name||''''
                         ||' and   x.partition_name = '''|| t.partition_name|| ''''))
                columns dt varchar2(4000) path '/') x
,      xmltable('for $i in /ROWSET/ROW/DT
                 return $i'
                passing xmltype(dbms_xmlgen.getxml(q'[select to_char(]'||x.dt||q'[,'YYYYMMDDHH24MISS') dt from dual]'))
                columns dt varchar2(16) path '/') x2
;

For example:

SQL> alter session set nls_date_format = 'DD-MON-YYYY HH24:MI';

SQL> SELECT t.table_name
  2  ,      t.partition_name
  3  ,      t.partition_position
  4  ,      to_date(x2.dt,'YYYYMMDDHH24MISS') hi
  5  FROM   user_tab_partitions t
  6  ,      xmltable('for $i in /ROWSET/ROW/HI
  7                   return $i'
  8                  passing xmltype(
  9                            dbms_xmlgen.getxml
 10                            ('select high_value hi from user_tab_partitions x'
 11                           ||' where x.table_name     = '''||t.table_name||''''
 12                           ||' and   x.partition_name = '''|| t.partition_name|| ''''))
 13                  columns dt varchar2(4000) path '/') x
,      xmltable('for $i in /ROWSET/ROW/DT
 14   15                   return $i'
 16                  passing xmltype(dbms_xmlgen.getxml(q'[select to_char(]'||x.dt||q'[,'YYYYMMDDHH24MISS') dt from dual]'))
 17                  columns dt varchar2(16) path '/') x2
 18  WHERE  t.table_name = 'T1';

TABLE_NAME                     PARTITION_NAME                 PARTITION_POSITION HI
------------------------------ ------------------------------ ------------------ -----------------
T1                             P0                                              1 01-JAN-2013 00:00
T1                             P1                                              2 02-JAN-2013 00:00

SQL>

Warning about the XMLTABLE approach – if running on versions less than 11.2, you may occasionally run into some ORA-00600 bugs.

ORs, IN lists and LNNVL

I’ve previously written about manually rewriting an OR condition into a UNION ALL using LNNVL.

This is a description of a performance issue observed in the real world from the optimizer coming up with a CONCATENATION operation against many child operations including an INLIST operator and other children which then have to use LNNVL as a filter.

Really this is nothing more than an illustration of our old performance killing friend the function call from within SQL, particularly from within predicates, and the problem of context switching.

Ok. So, the real world example is a crazy query.
It contains 1091 OR clauses each stipulating a unique key lookup.
Yes, “1091” OR clauses.
I know, I know.
Probably itself a workaround to avoid the 1000 limit on an IN list, who knows?
Anyway.

I’m going to use a table just to illustrate the execution plan and then use some metrics from the execution of the real world example.

drop table t1;

create table t1
(col1 number
,col2 number
,col3 varchar2(10)
,constraint pk_t1 primary key (col1, col2));

insert into t1
select mod(rownum,10000)+1 col1
,      ceil(rownum/10000)  col2
,      rpad('X',10,'X')    col3
from   dual
connect by rownum <= 100000
order by col1, col2;

commit;

If we use a simple multi-column OR condition, we don’t get the desired LNNVLs.

explain plan for
select *
from   t1
where (col1 = 1
and    col2 = 1)
or    (col1 = 2
and    col2 = 2)
or    (col1 = 3
and    col2 = 3)
or    (col1 = 3
and    col2 = 4);

select * from table(dbms_xplan.display);

The optimizer just uses an INLIST iterator:

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |  1129 | 37257 |     5   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |  1129 | 37257 |     5   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | PK_T1 |     7 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access(("COL1"=1 AND "COL2"=1 OR "COL1"=2 AND "COL2"=2 OR "COL1"=3 AND 
              "COL2"=3 OR "COL1"=3 AND "COL2"=4))

However, if we combine an IN list with some of those ORs:

explain plan for
select *
from   t1
where (col1 = 1
and    col2 = 1)
or    (col1 = 2
and    col2 = 2)
or    (col1 = 3
and    col2 IN (3,4));

select * from table(dbms_xplan.display);

Which gives

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |  1140 | 37620 |     2   (0)| 00:00:01 |
|   1 |  CONCATENATION                |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1    |    11 |   363 |     1   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | PK_T1 |     1 |       |     2   (0)| 00:00:01 |
|   4 |   INLIST ITERATOR             |       |       |       |            |          |
|   5 |    TABLE ACCESS BY INDEX ROWID| T1    |  1129 | 37257 |     1   (0)| 00:00:01 |
|*  6 |     INDEX UNIQUE SCAN         | PK_T1 |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("COL1"=3)
       filter("COL2"=3 OR "COL2"=4)
   6 - access(("COL1"=1 AND "COL2"=1 OR "COL1"=2 AND "COL2"=2))
       filter(LNNVL("COL1"=3) OR LNNVL("COL2"=3) AND LNNVL("COL2"=4))

We get the CONCATENATION of the simpler OR conditions which are combined into a single INLIST ITERATOR operations with another child operation which applies LNNVLs in the filter predicate.

Now imagine we have a much larger number of the simpler OR predicates combined with liberal scatterings of such IN clauses throughout our 1091 OR’d predicates.

Perhaps we’d be a bit concerned about the context switching and increased cpu usage?

Looking at my real world example right here… there’s no point me pasting in the thousands of lines.

If I run the real world behemoth:

Client elapsed time 108 seconds

DBMS_XPLAN.DISPLAY_CURSOR tells me it ran in 1.26 seconds:

plan hash value: 4262066066  
 
-------------------------------------------------------------------------------------------------------------------- 
| Id  | Operation                     | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | 
--------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                 |      1 |        |   1628 |00:00:01.26 |    3352 |    418 | 
|   1 |  CONCATENATION                |                 |      1 |        |   1628 |00:00:01.26 |    3352 |    418 |  
|   2 |   INLIST ITERATOR             |                 |      1 |        |    966 |00:00:00.94 |    2735 |    389 | 
|   3 |    TABLE ACCESS BY INDEX ROWID| XXXXXXXXXX      |    966 |    624 |    966 |00:00:00.94 |    2735 |    389 | 
|*  4 |     INDEX RANGE SCAN          | XXXXXXXXXX_UK01 |    966 |    624 |    966 |00:00:00.86 |    1915 |    380 | 
|   5 |   TABLE ACCESS BY INDEX ROWID | XXXXXXXXXX      |      1 |      1 |      2 |00:00:00.01 |       4 |      0 |  
|*  6 |    INDEX RANGE SCAN           | XXXXXXXXXX_UK01 |      1 |      1 |      2 |00:00:00.01 |       3 |      0 |  
....
| 253 |   TABLE ACCESS BY INDEX ROWID | XXXXXXXXXX      |      1 |      2 |     47 |00:00:00.01 |      14 |      1 |  
|*254 |    INDEX RANGE SCAN           | XXXXXXXXXX_UK01 |      1 |      2 |     47 |00:00:00.01 |       3 |      1 |   
--------------------------------------------------------------------------------------------------------------------  

Predicate Information (identified by operation id):
--------------------------------------------------- 
   4 - access(((("XXXXXXXXXX"."VERSION"=1 AND "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611722) OR             
              ("XXXXXXXXXX"."VERSION"=1 AND "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611795) OR ("XXXXXXXXXX"."VERSION"=1 AND                                       
              "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611863) OR ("XXXXXXXXXX"."VERSION"=1 AND
              "XXXXXXXXXX"."ID_XXXXXXXXXX"=1612023) OR ("XXXXXXXXXX"."VERSION"=1 AND 			  .....
  6 - access "XXXXXXXXXX"."ID_XXXXXXXXXX"=2046939) 
       filter(((LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611722)) AND 
           (LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611795)) AND 
              (LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611863)) AND 
.....

DBMS_MONITOR tells me the query ran in 1.68 seconds:

Global Information
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  2                   
 Session ID          :  1220                
 SQL ID              :  5fvt3tfbgmqp3       
 SQL Execution ID    :  33554432            
 Plan Hash Value     :  4262066066          
 Execution Started   :  04/05/2013 14:53:42 
 First Refresh Time  :  04/05/2013 14:53:42 
 Last Refresh Time   :  04/05/2013 14:53:45 

--------------------------------------------------------------------
| Elapsed |   Cpu   |    IO    | Cluster  | Fetch | Buffer | Reads |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |  Gets  |       |
--------------------------------------------------------------------
|    1.68 |    0.50 |     1.13 |     0.05 |     4 |   3352 |   418 |
--------------------------------------------------------------------

Fortunately V$SQL tells me the real story:

select sql_id, executions, rows_processed, elapsed_time/1000/1000, cpu_time/1000/1000 
from v$sql where sql_id = '5fvt3tfbgmqp3';

Gives

SQL_ID        EXECUTIONS ROWS_PROCESSED ELAPSED_TIME/1000/1000 CPU_TIME/1000/1000
------------- ---------- -------------- ---------------------- ------------------
5fvt3tfbgmqp3          1           1628              105.81511         104.581102

Then again, if you write nasty code, nasty things can happen.

Fun with distributed transactions

Three items within:
Part 1 – Subquery materialisation not possible
Part 2 – Materialised views
Part 3 – Lock scripts

Part 1 – Subquery materialisation not possible
Yesterday I was reminder of a familiar issue, officially listed as a bug, that can occur when combining subquery materialisation with distributed transactions.

I had previously tuned a query for a colleague, the best approach for which was using implicit materialised subquery factoring (implicit because it wasn’t hinted as /*+ materalize */ but was referenced multiple times in the main query) and I had provided such a solution, without even a thought to the performance threat that the query might be running in an XA transaction.

As a result, unsurprisingly with the benefit of hindsight, the query was not performing as expected from the application in said XA transaction.

Let me provide a refresher on the issue.

create table t1
as
select rownum col1
from   dual
connect by rownum <= 100000;

alter session set statistics_level = all;

with x as  
(select count(*) col1  
 from   t1)  
select x1.*, x2.*
from x x1, x x2;  

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

provides

PLAN_TABLE_OUTPUT 
---------------------------------------------------------------------------------------------
SQL_ID  aqsc5m0wtj7wn, child number 0
-------------------------------------                              
with x as   (select count(*) col1    from   t1)   select x1.*, x2.*
from x x1, x x2

Plan hash value: 1115620710 

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                        | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                             |      1 |        |      1 |
|   1 |  TEMP TABLE TRANSFORMATION |                             |      1 |        |      1 |
|   2 |   LOAD AS SELECT           |                             |      1 |        |      1 |
|   3 |    SORT AGGREGATE          |                             |      1 |      1 |      1 |
|   4 |     TABLE ACCESS FULL      | T1                          |      1 |    122K|    100K|
|   5 |   MERGE JOIN CARTESIAN     |                             |      1 |      1 |      1 |
|   6 |    VIEW                    |                             |      1 |      1 |      1 |
|   7 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6717_BCA95716 |      1 |      1 |      1 |
|   8 |    BUFFER SORT             |                             |      1 |      1 |      1 |
|   9 |     VIEW                   |                             |      1 |      1 |      1 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6717_BCA95716 |      1 |      1 |      1 |
---------------------------------------------------------------------------------------------

Note
-----  
   - dynamic sampling used for this statement
                  

 27 rows selected 

As you might expect, subquery materialisation using a temp table.

But if you execute the query within a distributed transaction, it will probably not use a temporary table (see original post above about why I say probably).

I’m using a simple insert script over a db link to start the distributed transaction but in my real world occurrence it was the JDBC XA driver.

drop table t2;

create table t2
(col1 number);

alter system flush shared_pool;

select count(*) from gv$sql where sql_text like 'with x%';

alter session set statistics_level = all;

insert into t2@mydba values(1);

with x as  
(select count(*) col1  
 from   t1)  
select x1.*, x2.*
from x x1, x x2;  

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

rollback;

Which gives:

PLAN_TABLE_OUTPUT  
---------------------------------------------------------------------------------------
SQL_ID  aqsc5m0wtj7wn, child number 0
-------------------------------------
with x as   (select count(*) col1    from   t1)   select x1.*, x2.*
from x x1, x x2

Plan hash value: 3433669518

---------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      1 |00:00:00.08 |     314 |
|   1 |  NESTED LOOPS        |      |      1 |      1 |      1 |00:00:00.08 |     314 |
|   2 |   VIEW               |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   3 |    SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   4 |     TABLE ACCESS FULL| T1   |      1 |    122K|    100K|00:00:00.01 |     157 |
|   5 |   VIEW               |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   6 |    SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.04 |     157 |
|   7 |     TABLE ACCESS FULL| T1   |      1 |    122K|    100K|00:00:00.01 |     157 |
--------------------------------------------------------------------------------------- 

Note
-----
   - dynamic sampling used for this statement


 24 rows selected 

As mentioned, the distributed transaction bypasses materialisation which could have a very negative effect on the performance of a particular query where that materialisation was key.

Part II – Materialized views.

This trouble with distributed transactions reminds me of a particular bugbear with materialised views.

This restriction is listed here in the documentation

http://docs.oracle.com/cd/E11882_01/server.112/e10706/repmview.htm

I’m surprised that a bigger deal is not made of it, given the proliferation of XA drivers and distributed transactions in many enterprise applications.

This is demonstrated quite easily, again using a db link to do the distributed transaction.

drop table t2;

create table t2
(col1 number
,constraint pk_t2 primary key(col1));

create materialized view log on t2;

create materialized view mv2
build immediate
refresh fast on commit
as
select * from t2;

insert into t2@mydba values (1);

commit;

Which, on commit, gives:

SQL Error: ORA-02050: transaction 26.8.810414 rolled back, some remote DBs may be in-doubt
ORA-02050: transaction 26.8.810414 rolled back, some remote DBs may be in-doubt
ORA-02051: another session in same transaction failed
ORA-02063: preceding 2 lines from MYDBA
02050. 00000 -  "transaction %s rolled back, some remote DBs may be in-doubt"
*Cause:    network or remote failure in 2PC.
*Action:   Notify operations; remote DBs will automatically re-sync when the
           failure is repaired.

I keep meaning to double check that the same applies when using XA driver and Java but there’s no reason to think it wouldn’t be the same.

Tom Kyte has mentioned this several times on AskTom, “recently” here.

I’m not quite sure I get his comment about OLTP databases and distributed transactions but, not for the first time, I’ve found myself wanting to use MVs to mitigate certain performance problems and have run into this restriction

Part III – Locking scripts
There are a fair number of scripts out there on t’internet for showing / diagnosing locking.

During my special distributed transaction day yesterday, I was reminded that anything lock related that starts with V$SESSION or depends on / makes assumptions about SID/SESSION_ID can be misleading.

I initially thought something a little unusual was up because neither V$SESSION or ASH could tell me who was blocking a bunch of requested mode 6 TX locks. But I thought some of the blocking diagnosis was missing because of cross-node RAC complications.

But it actually was because there were transactions holding locks but no sessions.

Admittedly, this was a slightly strange situation, but essentially what happened was that an XA JDBC transaction that was also mixing with ActiveMQ ended up causing what I suppose would essentially be orphaned transactions, and locks held by transactions that had no associated sessions.

This may be an “exotic” situation but V$LOCKED_OBJECT, for example, exposes SESSION_ID but no SESSION_SERIAL# so itself can be very misleading if you go off joining willy nilly by SESSION_ID because SIDs get reused quickly.

Similarly, DBA_BLOCKERS and DBA_WAITERS both showed zilch.

A more reliable starting point was to ignore V$LOCKED_OBJECT.SESSION_ID and pay more attention to the transction information, joining to V$TRANSACTION and then outer joining to V$SESSION, like so:

select lo.inst_id
,      lo.object_id
,      lo.session_id
,      lo.os_user_name
,      lo.process
,      lo.locked_mode
,      ob.owner
,      ob.object_name
,      tx.addr
,      tx.start_time txn_start_time
,      tx.status
,      s.*
from   gv$locked_object lo
,      dba_objects      ob
,      gv$transaction    tx
,      gv$session        s
where  ob.object_id = lo.object_id
and    tx.xidusn    (+) = lo.xidusn
and    tx.xidslot   (+) = lo.xidslot
and    tx.xidsqn    (+) = lo.xidsqn
and    s.taddr      (+) = tx.addr; 

Which showed me, as mentioned, a whole bunch of locks in these orphaned, PREPARED distributed transactions dating back a few hours.

Follow

Get every new post delivered to your Inbox.

Join 68 other followers