Outer Join with OR and Lateral View Decorrelation

Use of ANSI SQL is a personal thing.

Historically I have not been a fan apart from where it makes things easier/possible.

This reticence was mainly due to optimizer bugs and limitations in the earlier days.

Recently I have been using it much more because I find that the developers I interact with prefer it / understand it better.

You might/should be aware that Oracle will rewrite ANSI SQL to an Oracle syntax representation, this transformation being listed in the optimizer trace file.

You might/should also be aware that Oracle outer join syntax does not allow OR or IN:

drop table t1;
drop table t2;

create table t1
as
select floor((rownum+1)/2) col1
,      case when mod(rownum,2) = 0 then 1 else 2 end col2
,      10 col3
from   dual
connect by rownum <= 20;

create table t2
as
select rownum col1
,      case when mod(rownum,2) = 0 then 2 else 1 end col3
from   dual
connect by rownum <= 10;

select *
from   t1
,      t2
where  t1.col1 = t2.col1 (+) 
and  ((t1.col2 = 1
and    t2.col3 (+) > t1.col3)
or    (t1.col2 = 2
and    t2.col3 (+) < t1.col3));

ORA-01719: outer join operator (+) not allowed in operand of OR or IN

ANSI SQL remedies this:

alter session tracefile_identifier = 'domlg1';
alter session set events 'trace[rdbms.SQL_Optimizer.*]';
select *
from   t1
left join t2
on    t1.col1 = t2.col1
and ((t1.col2 = 1
and   t2.col3 > t1.col3)
or   (t1.col2 = 2
and   t2.col3 < t1.col3));

alter session set events 'trace off';

But it comes at a price.

Note the execution plan:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    20 |  1300 |    42   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER |      |    20 |  1300 |    42   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T1   |    20 |   780 |     2   (0)| 00:00:01 |
|   3 |   VIEW              |      |     1 |    26 |     2   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("T1"."COL1"="T2"."COL1" AND ("T1"."COL2"=1 AND
              "T2"."COL3">"T1"."COL3" OR "T1"."COL2"=2 AND "T2"."COL3"<"T1"."COL3"))   

Now, maybe you will have better luck than me but no matter what I try I cannot change the NESTED LOOPS OUTER operation (I could rewrite it to do two joins rather than one and remove the OR).

So, if that lateral view involves some full table scans or other significant operations, they might be very expense on the outer operation of a nested loop.

The reason is in the optimizer trace.

Query after View Removal
******* UNPARSED QUERY IS ********
SELECT "T1."COL1" "COL1", "T1."COL2" "COL2", "T1."COL3" "COL3", "VW_LAT_AE9E49E8"."ITEM_1_0" "COL1", "VW_LAT_AE9E49E8"."ITEM_2_1" "COL3" FROM "DOM"."T1" "T1", LATERAL( (SELECT "T2"."COL1" "ITEM_1_0", "T2"."COL3" "ITEM_2_1" FROM "DOM"."T2" "T2" WHERE "T1"."COL1"="T2"."COL1" AND ("T1"."COL2"=1 AND "T2"."COL3">"T1"."COL3" OR "T1"."COL2"=2 AND "T2"."COL3" < "T1"."COL3"))) (+) "VW_LAT_AE9E49E8"
DCL:Checking validity of lateral view decorrelation SEL$BCD4421C (#1)
DCL: Bypassed: view has non-well-formed predicate
DCL: Failed decorrelation validity for lateral view block SEL$BCD4421C (#1)

The OR prevents the decorrelation which seems to mean that we’re stuck with a NESTED LOOP for now.

Further Reading on ANSI:
Oracle Optimizer Blog
Jonathan Lewis on ANSI Outer
Jonathan Lewis on ANSI

Did it really fix it 1: OFE & nasty FIRST_ROWS plan on 11.2.0.4

Plan degradations on upgrade are normal.

This one’s no different.

On further investigation, turned out application was setting optimizer_mode = first_rows somewhere.

First point about this is that first_rows really shouldn’t be used either as a hint or as an optimizer_mode.

What does FIRST_ROWS mean?
From 11g doco:

The optimizer uses a mix of costs and heuristics to find a best plan for fast delivery of the first few rows.

If any sort of FIRST_ROWS type intervention is required, then it might be better to be using a more modern, fully-costed FIRST_ROWS_n (where n = 1 | 10 | 100 | 1000).
If it’s even required.

The 12c documentation is clearer about this:

FIRST_ROWS is available for backward compatibility and plan stability; use FIRST_ROWS_n instead.

But I don’t think we should attach too much credence to that statement not being in the 11g documentation.

Here’s the problem plan.

Pre upgrade, snippet of the plan looked as per below.
Please note position of T1.

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   1 |  SORT ORDER BY                              |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   2 |   NESTED LOOPS OUTER                        |                             |    34 | 13668 |  3002   (1)| 00:00:22 |       |       |
|   3 |    NESTED LOOPS OUTER                       |                             |    34 | 12648 |  2968   (1)| 00:00:22 |       |       |
|   4 |     NESTED LOOPS OUTER                      |                             |    34 | 12274 |  2935   (1)| 00:00:22 |       |       |
|   5 |      NESTED LOOPS OUTER                     |                             |    34 | 11900 |  2867   (1)| 00:00:21 |       |       |
|   6 |       NESTED LOOPS OUTER                    |                             |    34 | 11628 |  2833   (1)| 00:00:21 |       |       |
|   7 |        NESTED LOOPS OUTER                   |                             |    34 | 10404 |  2799   (1)| 00:00:21 |       |       |
|   8 |         NESTED LOOPS OUTER                  |                             |    34 |  8228 |  2765   (1)| 00:00:20 |       |       |
|   9 |          NESTED LOOPS                       |                             |    34 |  6732 |  2731   (1)| 00:00:20 |       |       |
|  10 |           NESTED LOOPS                      |                             |    69 | 12282 |  2524   (1)| 00:00:19 |       |       |
|  11 |            NESTED LOOPS                     |                             |    69 |  9591 |  2386   (1)| 00:00:18 |       |       |
|* 12 |             TABLE ACCESS FULL               | T1                          |  1658 |   213K|   725   (3)| 00:00:06 |       |       |
|* 13 |             TABLE ACCESS BY INDEX ROWID     | T2                          |     1 |     7 |     1   (0)| 00:00:01 |       |       |
|* 14 |              INDEX UNIQUE SCAN              | T2_U1                       |     1 |       |     0   (0)| 00:00:01 |       |       |
|  15 |            TABLE ACCESS BY INDEX ROWID      | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 16 |             INDEX UNIQUE SCAN               | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|  17 |           TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |     1 |    20 |     3   (0)| 00:00:01 | ROWID | ROWID |
|* 18 |            INDEX RANGE SCAN                 | T4_N1                       |     1 |       |     2   (0)| 00:00:01 |       |       |

Post upgrade, plan has degraded to:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS OUTER                          |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS OUTER                         |                             |    69 | 26979 |   508M  (3)|999:59:59 |       |       |
|   3 |    NESTED LOOPS OUTER                        |                             |    69 | 26220 |   508M  (3)|999:59:59 |       |       |
|   4 |     NESTED LOOPS OUTER                       |                             |    69 | 23184 |   508M  (3)|999:59:59 |       |       |
|   5 |      NESTED LOOPS OUTER                      |                             |    69 | 22632 |   508M  (3)|999:59:59 |       |       |
|   6 |       NESTED LOOPS OUTER                     |                             |    69 | 20562 |   508M  (3)|999:59:59 |       |       |
|   7 |        NESTED LOOPS OUTER                    |                             |    69 | 16146 |   508M  (3)|999:59:59 |       |       |
|   8 |         NESTED LOOPS                         |                             |    69 | 13662 |   508M  (3)|999:59:59 |       |       |
|   9 |          NESTED LOOPS                        |                             |  1675 |   312K|   508M  (3)|999:59:59 |       |       |
|  10 |           NESTED LOOPS                       |                             |   703K|    39M|  3476K  (1)| 06:54:55 |       |       |
|* 11 |            TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |   703K|    13M|  2068K  (1)| 04:06:54 | ROWID | ROWID |
|  12 |             INDEX FULL SCAN                  | T4_N2                       |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|  13 |            TABLE ACCESS BY INDEX ROWID       | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 14 |             INDEX UNIQUE SCAN                | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|* 15 |           TABLE ACCESS FULL                  | T1                          |     1 |   132 |   718   (3)| 00:00:06 |       |       |

In the post-upgrade plan, the most pertinent part was step 15.

A FULL TABLE SCAN as the inner rowsource of a NESTED LOOP, in other words for every row in the driving rowsource do a full table scan.”

Standard reaction to such a degradation is to test the SQL with OPTIMIZER_FEATURES_ENABLE (OFE).

With OFE set either via OPT_PARAM hint or via ALTER SESSION then the SQL reverted to pre-upgrade plan.

This is good!

But… how much confidence can we have that this actually fixed the problem?

It would be great if we could find a bug which was definitively responsible for this change in execution plan.

OFE is an umbrella but it doesn’t turn off ALL fixes/optimizer features in a release.

Many of those fixes covered by the OFE umbrella are represented in V$SYSTEM_FIX_CONTROL

So, I went looking for anything which sounded related and went back to default 11.2.0.4 OFE and starting turning off the BUGNOs one-by-one – no difference, still a bad plan

Then still at OFE 11.2.0.4, I turned off all the BUGNOs…. no difference, still a bad plan.

At this point, my confidence in OFE as a fix for this problem was starting to waiver.

So, I tried to reduce the problem SQL to it’s smallest form representing the problem.

alter session set optimizer_mode = first_rows;
alter session set optimizer_features_enable = '11.2.0.4';

 explain plan for
 SELECT *
 FROM   T1
 ,      T3
 ,      T4
 WHERE  T1.COL1             = T3.COL1
 AND    T3.COL2             = T4.COL2
 AND    T1.DATE_COL        '+OUTLINE'));

This distilled statement reproduced the issue at the heart of the original statement:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS                        |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS                       |         |  2237K|   795M|  6551K  (1)| 13:01:53 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID| T4      |  2240K|   188M|  2068K  (1)| 04:06:53 | ROWID | ROWID |
|   4 |     INDEX FULL SCAN                  | T4_N2   |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID       | T3      |     1 |   285 |     2   (0)| 00:00:01 |       |       |
|*  6 |     INDEX UNIQUE SCAN                | T3_U1   |     1 |       |     1   (0)| 00:00:01 |       |       |
|*  7 |   TABLE ACCESS FULL                  | T1      |     1 |   397 |   718   (3)| 00:00:06 |       |       |
----------------------------------------------------------------------------------------------------------------

Is this test case a 100% indication that OFE wasn’t applicable to the original problem?
Not sure.
But at this point I had littleconfidence that OFE provided any sort of reliable fix to the original SQL statement.

I don’t know what the internal heuristics are/were for FIRST_ROWS.

But that is a stupid place for a full table scan regardless of the number of rows expected to return.

If we remove one of the T1 predicates above and the various row estimates go up then the FTS moves to a driving position.

I could/should get a 10053 trace but FIRST_ROWS shouldn’t be in use in the first place.

Solution – hunt down where the app code is doing the FIRST_ROWS thing and exterminate.

All the other possible OPTIMIZER_MODE modes gave a plan for the original SQL which started like this:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   1 |  SORT ORDER BY                              |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   2 |   NESTED LOOPS OUTER                        |      |    35 | 14105 |  1385   (2)| 00:00:10 |       |       |
|*  3 |    HASH JOIN OUTER                          |      |    35 | 12565 |  1350   (2)| 00:00:10 |       |       |
|*  4 |     HASH JOIN OUTER                         |      |    35 | 11305 |  1315   (2)| 00:00:10 |       |       |
|   5 |      NESTED LOOPS OUTER                     |      |    35 |  9065 |  1280   (2)| 00:00:10 |       |       |
|   6 |       NESTED LOOPS OUTER                    |      |    35 |  8015 |  1245   (2)| 00:00:09 |       |       |
|   7 |        NESTED LOOPS OUTER                   |      |    35 |  7735 |  1210   (2)| 00:00:09 |       |       |
|   8 |         NESTED LOOPS OUTER                  |      |    35 |  7350 |  1176   (2)| 00:00:09 |       |       |
|   9 |          NESTED LOOPS                       |      |    35 |  6965 |  1106   (2)| 00:00:08 |       |       |
|  10 |           NESTED LOOPS                      |      |    69 | 12351 |   899   (3)| 00:00:07 |       |       |
|* 11 |            HASH JOIN                        |      |    69 |  9660 |   761   (3)| 00:00:06 |       |       |
|* 12 |             TABLE ACCESS FULL               | T2   |   237 |  1659 |    36   (6)| 00:00:01 |       |       |
|* 13 |             TABLE ACCESS FULL               | T1   |  1671 |   217K|   725   (3)| 00:00:06 |       |       |

So…. when optimizer upgrade problems strike, you should always try OFE.

But it’s also worth further investigation to confirm that OFE really is a reliable fix for the problem.

COUNT STOPKEY – NVL

Yesterday I mentioned issues with a recursive delete operation on statistics history. This is a quick illustration of the last points I made on that post regarding the lack of a COUNT STOPKEY optimisation because of the use of NVL. COUNT STOPKEY is an optimisation which allows processing to stop once the target number of rows has been reached. For example:

create table t1 as select * from dba_objects;

alter session set statistics_level = all;

var rn number
exec :rn := 10;

select *
from   t1
where  rownum <= :rn
and    mod(object_id,5) = 0;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

Plan hash value: 3836375644

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     10 |00:00:00.01 |       4 | 
|*  1 |  COUNT STOPKEY     |      |      1 |        |     10 |00:00:00.01 |       4 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |  26148 |     10 |00:00:00.01 |       4 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=:RN)
   2 - filter(MOD("OBJECT_ID",5)=0) 

COUNT STOPKEY kicks in after we have fetched the relevant rows and stops any unnecessary further execution – note in particular A-Rows & Buffers for STEP 2 However, if we use NVL around our ROWNUM limitation, then this optimisation is prevented.

var rn number
exec :rn := 10;

select *
from   t1
where  rownum <= nvl(:rn,rownum)
and    mod(object_id,5) = 0;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

Plan hash value: 624922415

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |     10 |00:00:00.09 |    2310 |
|   1 |  COUNT              |      |      1 |        |     10 |00:00:00.09 |    2310 |
|*  2 |   FILTER            |      |      1 |        |     10 |00:00:00.09 |    2310 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |  26148 |  29630 |00:00:00.08 |    2310 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ROWNUM<=NVL(:RN,ROWNUM))
   3 - filter(MOD("OBJECT_ID",5)=0)

Understanding vs Resolution – Statistics History Cleanup

Today I helped resolve a problem quickly but to the detriment of my understanding of exactly what was going on and why.

And as part of the quicker resolution, I had to destroy the evidence which would have helped understand better.

So… now need to go back and figure it out if I have time to model it properly, etc.

Here’s what little I know so far.

What happened was that there were a number of application sessions experiencing slowness and contention when executing a recursive stats history cleanup statement.

Verified via ASH that this recursive delete was somehow being called by app ETL code (TOP_LEVEL_SQL_ID, PLSQL_ENTRY_OBJECT_ID & USER_ID columns), four of them each running slow statement and also because of that nature of the statement below and being blocked by mode 6 TX locks from the leading execution.

Version is 11.2.0.3

Statement was sql id 9v9n97qj8z1dg:

delete /*+ dynamic_sampling(4) */ 
from sys.wri$_optstat_histhead_history 
where savtime < :1 
and rownum <= NVL(:2, rownum);

First up, resolution was quick and easy according to, by a quirk of coincidence, my last post:
https://orastory.wordpress.com/2015/02/16/dbms_stats-purge_stats/

We just got rid of all the statistics history using the magic PURGE_ALL truncate flag.
The history is of limited usefulness day-to-day anyway.

The slowness was holding up ETL jobs which were just trying to calculate stats for their own partition.

I was brought into the situation towards the end of the situation but here’s an illustration of the slowness:

SQL Text
------------------------------
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histhead_history where savtime < :1 and rownum <= NVL(:2, rownum)

Global Information
------------------------------
 Status              :  EXECUTING               
 Instance ID         :  1                       
 SQL ID              :  9v9n97qj8z1dg           
 SQL Execution ID    :  16777249                
 Execution Started   :  02/24/2015 19:11:25     
 First Refresh Time  :  02/24/2015 19:13:06     
 Last Refresh Time   :  02/25/2015 09:10:35     
 Duration            :  50351s                  
 Module/Action       :  JDBC Thin Client/-      
 Program             :  JDBC Thin Client        

Binds
========================================================================================================================
| Name | Position |  Type  |                                           Value                                           |
========================================================================================================================
| :2   |        2 | NUMBER | 10000                                                                                     |
========================================================================================================================

Global Stats
===============================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes |
===============================================================================================
|   50359 |   35199 |     0.16 |       14669 |         254 |      237 |     1G |   63 | 504KB |
===============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2348801730)
=======================================================================================================================================================================================================
| Id   |            Operation             |             Name              |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |            Activity Detail            |
|      |                                  |                               | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |              (# samples)              |
=======================================================================================================================================================================================================
|    0 | DELETE STATEMENT                 |                               |         |       |     50262 |   +101 |  1423 |        1 |      |       |          |                                       |
|    1 |   DELETE                         | WRI$_OPTSTAT_HISTHEAD_HISTORY |         |       |     50361 |     +2 |  1423 |        1 |      |       |    29.74 | enq: TX - row lock contention (14630) |
|      |                                  |                               |         |       |           |        |       |          |      |       |          | buffer busy waits (231)               |
|      |                                  |                               |         |       |           |        |       |          |      |       |          | Cpu (82)                              |
|      |                                  |                               |         |       |           |        |       |          |      |       |          | resmgr:cpu quantum (5)                |
|    2 |    COUNT                         |                               |         |       |     50262 |   +101 |  1423 |      14M |      |       |     0.00 | Cpu (1)                               |
|    3 |     FILTER                       |                               |         |       |     50266 |    +97 |  1423 |      14M |      |       |    30.14 | Cpu (15146)                           |
|    4 |      TABLE ACCESS BY INDEX ROWID | WRI$_OPTSTAT_HISTHEAD_HISTORY |      6M | 23218 |     50253 |   +100 |  1423 |       4G |   22 | 176KB |    23.17 | buffer busy waits (18)                |
|      |                                  |                               |         |       |           |        |       |          |      |       |          | Cpu (11627)                           |
| -> 5 |       INDEX RANGE SCAN           | I_WRI$_OPTSTAT_HH_ST          |    990K |  5827 |     50264 |   +101 |  1423 |       4G |   41 | 328KB |    16.94 | buffer busy waits (3)                 |
|      |                                  |                               |         |       |           |        |       |          |      |       |          | Cpu (8332)                            |
|      |                                  |                               |         |       |           |        |       |          |      |       |          | resmgr:cpu quantum (180)              |
=======================================================================================================================================================================================================
   3 - filter(ROWNUM<=NVL(:2,ROWNUM))
   5 - access("WRI$_OPTSTAT_HISTHEAD_HISTORY"."SYS_NC00024$"<SYS_EXTRACT_UTC(:1))

So, first thought was that:

1. I thought that MMON was responsible for purging old stats… clearly not in this case as there were multiple concurrent application connections purging old data as part of their ETL process.

2. The DELETE is deleting any old history older than a parameterised date, the first 10000 rows thereof. There is no connection to the object on whose statistics the application ETL jobs were working on.

3. I would expect a COUNT STOPKEY operation but then the NVL predicate avoids this optimisation.

4. UPDATE: The real time sql monitoring output is also reporting the metrics for 1423 executions of this statement. The clue is in the 14M rows reported in the COUNT + FILTER operations. 1423 * 10000 rows = 14M. But I requested the output for what I thought was a single execution id – 16777249 – strange… bulk/batched delete ?

More questions than answers…

ORs, IN lists and LNNVL

I’ve previously written about manually rewriting an OR condition into a UNION ALL using LNNVL.

This is a description of a performance issue observed in the real world from the optimizer coming up with a CONCATENATION operation against many child operations including an INLIST operator and other children which then have to use LNNVL as a filter.

Really this is nothing more than an illustration of our old performance killing friend the function call from within SQL, particularly from within predicates, and the problem of context switching.

Ok. So, the real world example is a crazy query.
It contains 1091 OR clauses each stipulating a unique key lookup.
Yes, “1091” OR clauses.
I know, I know.
Probably itself a workaround to avoid the 1000 limit on an IN list, who knows?
Anyway.

I’m going to use a table just to illustrate the execution plan and then use some metrics from the execution of the real world example.

drop table t1;

create table t1
(col1 number
,col2 number
,col3 varchar2(10)
,constraint pk_t1 primary key (col1, col2));

insert into t1
select mod(rownum,10000)+1 col1
,      ceil(rownum/10000)  col2
,      rpad('X',10,'X')    col3
from   dual
connect by rownum <= 100000
order by col1, col2;

commit;

If we use a simple multi-column OR condition, we don’t get the desired LNNVLs.

explain plan for
select *
from   t1
where (col1 = 1
and    col2 = 1)
or    (col1 = 2
and    col2 = 2)
or    (col1 = 3
and    col2 = 3)
or    (col1 = 3
and    col2 = 4);

select * from table(dbms_xplan.display);

The optimizer just uses an INLIST iterator:

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |  1129 | 37257 |     5   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |  1129 | 37257 |     5   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | PK_T1 |     7 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access(("COL1"=1 AND "COL2"=1 OR "COL1"=2 AND "COL2"=2 OR "COL1"=3 AND 
              "COL2"=3 OR "COL1"=3 AND "COL2"=4))

However, if we combine an IN list with some of those ORs:

explain plan for
select *
from   t1
where (col1 = 1
and    col2 = 1)
or    (col1 = 2
and    col2 = 2)
or    (col1 = 3
and    col2 IN (3,4));

select * from table(dbms_xplan.display);

Which gives

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |  1140 | 37620 |     2   (0)| 00:00:01 |
|   1 |  CONCATENATION                |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1    |    11 |   363 |     1   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | PK_T1 |     1 |       |     2   (0)| 00:00:01 |
|   4 |   INLIST ITERATOR             |       |       |       |            |          |
|   5 |    TABLE ACCESS BY INDEX ROWID| T1    |  1129 | 37257 |     1   (0)| 00:00:01 |
|*  6 |     INDEX UNIQUE SCAN         | PK_T1 |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("COL1"=3)
       filter("COL2"=3 OR "COL2"=4)
   6 - access(("COL1"=1 AND "COL2"=1 OR "COL1"=2 AND "COL2"=2))
       filter(LNNVL("COL1"=3) OR LNNVL("COL2"=3) AND LNNVL("COL2"=4))

We get the CONCATENATION of the simpler OR conditions which are combined into a single INLIST ITERATOR operations with another child operation which applies LNNVLs in the filter predicate.

Now imagine we have a much larger number of the simpler OR predicates combined with liberal scatterings of such IN clauses throughout our 1091 OR’d predicates.

Perhaps we’d be a bit concerned about the context switching and increased cpu usage?

Looking at my real world example right here… there’s no point me pasting in the thousands of lines.

If I run the real world behemoth:

Client elapsed time 108 seconds

DBMS_XPLAN.DISPLAY_CURSOR tells me it ran in 1.26 seconds:

plan hash value: 4262066066  
 
-------------------------------------------------------------------------------------------------------------------- 
| Id  | Operation                     | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | 
--------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                 |      1 |        |   1628 |00:00:01.26 |    3352 |    418 | 
|   1 |  CONCATENATION                |                 |      1 |        |   1628 |00:00:01.26 |    3352 |    418 |  
|   2 |   INLIST ITERATOR             |                 |      1 |        |    966 |00:00:00.94 |    2735 |    389 | 
|   3 |    TABLE ACCESS BY INDEX ROWID| XXXXXXXXXX      |    966 |    624 |    966 |00:00:00.94 |    2735 |    389 | 
|*  4 |     INDEX RANGE SCAN          | XXXXXXXXXX_UK01 |    966 |    624 |    966 |00:00:00.86 |    1915 |    380 | 
|   5 |   TABLE ACCESS BY INDEX ROWID | XXXXXXXXXX      |      1 |      1 |      2 |00:00:00.01 |       4 |      0 |  
|*  6 |    INDEX RANGE SCAN           | XXXXXXXXXX_UK01 |      1 |      1 |      2 |00:00:00.01 |       3 |      0 |  
....
| 253 |   TABLE ACCESS BY INDEX ROWID | XXXXXXXXXX      |      1 |      2 |     47 |00:00:00.01 |      14 |      1 |  
|*254 |    INDEX RANGE SCAN           | XXXXXXXXXX_UK01 |      1 |      2 |     47 |00:00:00.01 |       3 |      1 |   
--------------------------------------------------------------------------------------------------------------------  

Predicate Information (identified by operation id):
--------------------------------------------------- 
   4 - access(((("XXXXXXXXXX"."VERSION"=1 AND "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611722) OR             
              ("XXXXXXXXXX"."VERSION"=1 AND "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611795) OR ("XXXXXXXXXX"."VERSION"=1 AND                                       
              "XXXXXXXXXX"."ID_XXXXXXXXXX"=1611863) OR ("XXXXXXXXXX"."VERSION"=1 AND
              "XXXXXXXXXX"."ID_XXXXXXXXXX"=1612023) OR ("XXXXXXXXXX"."VERSION"=1 AND 			  .....
  6 - access "XXXXXXXXXX"."ID_XXXXXXXXXX"=2046939) 
       filter(((LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611722)) AND 
           (LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611795)) AND 
              (LNNVL("XXXXXXXXXX"."VERSION"=1) OR LNNVL("XXXXXXXXXX"."ID_XXXXXXXXXX"=1611863)) AND 
.....

DBMS_MONITOR tells me the query ran in 1.68 seconds:

Global Information
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  2                   
 Session ID          :  1220                
 SQL ID              :  5fvt3tfbgmqp3       
 SQL Execution ID    :  33554432            
 Plan Hash Value     :  4262066066          
 Execution Started   :  04/05/2013 14:53:42 
 First Refresh Time  :  04/05/2013 14:53:42 
 Last Refresh Time   :  04/05/2013 14:53:45 

--------------------------------------------------------------------
| Elapsed |   Cpu   |    IO    | Cluster  | Fetch | Buffer | Reads |
| Time(s) | Time(s) | Waits(s) | Waits(s) | Calls |  Gets  |       |
--------------------------------------------------------------------
|    1.68 |    0.50 |     1.13 |     0.05 |     4 |   3352 |   418 |
--------------------------------------------------------------------

Fortunately V$SQL tells me the real story:

select sql_id, executions, rows_processed, elapsed_time/1000/1000, cpu_time/1000/1000 
from v$sql where sql_id = '5fvt3tfbgmqp3';

Gives

SQL_ID        EXECUTIONS ROWS_PROCESSED ELAPSED_TIME/1000/1000 CPU_TIME/1000/1000
------------- ---------- -------------- ---------------------- ------------------
5fvt3tfbgmqp3          1           1628              105.81511         104.581102

Then again, if you write nasty code, nasty things can happen.

Poll: SQL Plan Management

I’m interested in what SQL Plan Management features people are actively using.
Read more of this post

ORA-00600 [kkqtSetOp.1] – Join Factorization

Just a quick note about an ORA-00600 that recently occurred following an upgrade to 11.2.0.3.

I’ve not been able to distill an isolated test case from the specific production code generating this error so I’ll skip the application-specific example.

The error message reported was:
Read more of this post

Follow

Get every new post delivered to your Inbox.

Join 317 other followers