Did it really fix it 2: Plan flip, unprofileable, baseline OFE round 2 and “stability”

Snappy title, huh?

Aka: Why a sql plan baseline may be no guarantee of stability.

The other day, a problematic plan flip…

Tangent 1:
Cue much discussion about plan flips etc.
My thoughts on stability are that the priority for most application owners is stability and predictability but that does not tally with the defaul CBO behaviour and potentially you have to turn off a helluva lot to even get close.

Tangent 2:
I have commented before about the common sensitive of many databases to concurrent direct path reads (or rather the sensitivity is on the underlying SAN).

Here is one such illustration caused by this particular plan flip.

See that our single block read times go out from a “normal” range of 5 – 20ms to an average of 50+ ms. At this point anything doing physical IO starts to notice.
(Needless to say, I’m not a fan of these “normal” IO times – I think they’re awful but the SLA of the tier 2 (non flash) SAN storage is 20ms so we don’t alert until we breach that.)
sbr

Source:

select snap_id, begin_time, end_time, round(average,2)
from   dba_hist_sysmetric_summary
where  metric_name     = 'Average Synchronous Single-Block Read Latency'
order by snap_id;

Back to reality
Anyway, standard immediate-term fix for a plan flip is to hunt down a better plan from AWR and fix it with a profile or a baseline.

Not a problem to find one for this SQL as it nearly always executed with the same decent plan.
(AWR history is much longer than this but it gives the picture)

select to_char(trunc(sn.end_interval_time),'DD-MON-YYYY') dt
,      st.sql_id
,      st.plan_hash_value
,      sum(rows_processed_delta) rws
,      sum(executions_delta)     execs
,      round(sum(elapsed_time_delta)/1000/1000)   elp
,      round(sum(elapsed_time_delta)/1000/1000/nvl(nullif(sum(executions_delta),0),1),2)   elpe
,      round(sum(iowait_delta)/1000/1000)         io
,      round(sum(cpu_time_delta)/1000/1000)       cpu
,      sum(buffer_gets_delta)    gets
,      round(sum(disk_reads_delta))         disk_reads
from   dba_hist_snapshot sn
,      dba_hist_sqlstat  st
where  st.snap_id            = sn.snap_id
and    sn.instance_number = st.instance_number
and    st.sql_id             IN ('6xrx006fmqbq2')
group by trunc(sn.end_interval_time), st.sql_id, st.plan_hash_value
order by trunc(sn.end_interval_time) desc, elp desc; 
DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

Problem plan is obvious, right?

Started during the day on the 9th, continued overnight until eventually noticed.

PHV 3871506300 does a FTS and gets direct path reads which causes our IO stress.

So, as mentioned, no problem, hint it with a sql profile via COE_XFR_SQL_PROFILE.sql (see Metalink note 215187.1)

But it didn’t work.

Not properly.

See top line.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

We didn’t get “good” plan PHV 3803735407, we avoided “nasty” plan PHV 3871506300 but we got not-quite-so-nasty-but-still-not-nice PHV 3353364157.

The SQL Profile was not able to reproduce the desired PHV.

But at least PHV was not doing the problematic direct path reads!

I took the hints from PHV 3803735407 via

select * from table(dbms_xplan.display_awr('6xrx006fmqbq2',3803735407,format=>'+OUTLINE'));

I double checked they matched the hints in the SQL Profile – they did.

I tried using the hints manually in a SQL statement and the result was consistent with the SQL Profile – PHV 3353364157.

At this point, the DBA suggested using a SQL Plan Baseline.

I resisted but eventually gave in.

Why would it make any difference?

When a plan cannot be reproduced then there are good reasons why a baseline is much, much safer than a profile.

A baseline must reproduce the desired PHV or it will be rejected completely.

A profile has no knowledge of PHV so it is possible that under certain circumstances the hints in a profile may be only partially applied.

Profile was dropped and baseline was created.

This was the immediate aftermath (top line)… Success!

And V$SQL was showing that the baseline was being used.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

I maintained that this didn’t make sense but the ticket was closed so I couldn’t get an SPM trace.

At this point a copy of the statement without profile and without baseline and without the manual fullset of hints was returning the original PHV 3803735407.

I went away that night and on the train I had a thought: “what if it wasn’t round 1 of the baseline which worked but round 2”.

Reminder of the baseline reproduction system

Round 2 is OFE only.

Doesn’t matter whether it is round 1 or round 2, if it’s reproduced it’s reproduced.

But if it was under OFE only then that was no guarantee of stability.

Then the following night, late night call, the baseline was no longer reproducing PHV 3803735407 but was back to the big problem PHV 3871506300!

Although I didn’t have an SPM trace, I’m adamant it verified my theory about it being round 2 only.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
11-NOV-2015 6xrx006fmqbq2      3871506300          0        471      80513     170.94      77936       2543  193096137  188992511 
11-NOV-2015 6xrx006fmqbq2      3803735407        212       9933        690        .07         14        586   97253038        154 
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

Note that at times of change, no database structures were being changed.
Stats will have been gathered, sure – that was the agent of change no doubt.
But no index maintenance was happening, nothing was going unusable, no partition maintenance was happening.
Nothing was changing which should mean that a particular plan was impossible.

This means that the set of hints for this particular plan was not deterministic or rather the hintset was surely incomplete in some way which mean that the optimizer was free to do certain transformations which meant that the hintset was then invalid? Not 100% sure, still to be looked at in more detail…

At this point we dropped the baseline and went with a cut-down sql profile which applied only two hints – to politely request index scans for the two problematic FTS in each of PHV 3871506300 and 3353364157.

And this worked and has been ok since (so far) but it will not be the longer term solution.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
16-NOV-2015 6xrx006fmqbq2      3803735407         80       6887        348        .05         11        310   44574494         86 
13-NOV-2015 6xrx006fmqbq2      3803735407        114      13054        842        .06         19        713  117260543        482 
12-NOV-2015 6xrx006fmqbq2      3803735407        585      28074       1854        .07         12       1823  321890748       1983 
11-NOV-2015 6xrx006fmqbq2      3871506300          0        471      80513     170.94      77936       2543  193096137  188992511 
11-NOV-2015 6xrx006fmqbq2      3803735407        212       9933        690        .07         14        586   97253038        154 
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216 
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465

Never before seen an execution plan which had quite this problem interacting with SPM.

A test case to get to the root of the problem is still on the TODO list.

Something to do with a statement involving a WITH subquery and NO_MERGE
e.g.

WITH x AS (SELECT /*+ no_merge...)
SELECT 
FROM ...
WHERE EXISTS (SELECT
              FROM   x
              ...
              WHERE...);

Did it really fix it 1: OFE & nasty FIRST_ROWS plan on 11.2.0.4

Plan degradations on upgrade are normal.

This one’s no different.

On further investigation, turned out application was setting optimizer_mode = first_rows somewhere.

First point about this is that first_rows really shouldn’t be used either as a hint or as an optimizer_mode.

What does FIRST_ROWS mean?
From 11g doco:

The optimizer uses a mix of costs and heuristics to find a best plan for fast delivery of the first few rows.

If any sort of FIRST_ROWS type intervention is required, then it might be better to be using a more modern, fully-costed FIRST_ROWS_n (where n = 1 | 10 | 100 | 1000).
If it’s even required.

The 12c documentation is clearer about this:

FIRST_ROWS is available for backward compatibility and plan stability; use FIRST_ROWS_n instead.

But I don’t think we should attach too much credence to that statement not being in the 11g documentation.

Here’s the problem plan.

Pre upgrade, snippet of the plan looked as per below.
Please note position of T1.

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   1 |  SORT ORDER BY                              |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   2 |   NESTED LOOPS OUTER                        |                             |    34 | 13668 |  3002   (1)| 00:00:22 |       |       |
|   3 |    NESTED LOOPS OUTER                       |                             |    34 | 12648 |  2968   (1)| 00:00:22 |       |       |
|   4 |     NESTED LOOPS OUTER                      |                             |    34 | 12274 |  2935   (1)| 00:00:22 |       |       |
|   5 |      NESTED LOOPS OUTER                     |                             |    34 | 11900 |  2867   (1)| 00:00:21 |       |       |
|   6 |       NESTED LOOPS OUTER                    |                             |    34 | 11628 |  2833   (1)| 00:00:21 |       |       |
|   7 |        NESTED LOOPS OUTER                   |                             |    34 | 10404 |  2799   (1)| 00:00:21 |       |       |
|   8 |         NESTED LOOPS OUTER                  |                             |    34 |  8228 |  2765   (1)| 00:00:20 |       |       |
|   9 |          NESTED LOOPS                       |                             |    34 |  6732 |  2731   (1)| 00:00:20 |       |       |
|  10 |           NESTED LOOPS                      |                             |    69 | 12282 |  2524   (1)| 00:00:19 |       |       |
|  11 |            NESTED LOOPS                     |                             |    69 |  9591 |  2386   (1)| 00:00:18 |       |       |
|* 12 |             TABLE ACCESS FULL               | T1                          |  1658 |   213K|   725   (3)| 00:00:06 |       |       |
|* 13 |             TABLE ACCESS BY INDEX ROWID     | T2                          |     1 |     7 |     1   (0)| 00:00:01 |       |       |
|* 14 |              INDEX UNIQUE SCAN              | T2_U1                       |     1 |       |     0   (0)| 00:00:01 |       |       |
|  15 |            TABLE ACCESS BY INDEX ROWID      | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 16 |             INDEX UNIQUE SCAN               | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|  17 |           TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |     1 |    20 |     3   (0)| 00:00:01 | ROWID | ROWID |
|* 18 |            INDEX RANGE SCAN                 | T4_N1                       |     1 |       |     2   (0)| 00:00:01 |       |       |

Post upgrade, plan has degraded to:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS OUTER                          |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS OUTER                         |                             |    69 | 26979 |   508M  (3)|999:59:59 |       |       |
|   3 |    NESTED LOOPS OUTER                        |                             |    69 | 26220 |   508M  (3)|999:59:59 |       |       |
|   4 |     NESTED LOOPS OUTER                       |                             |    69 | 23184 |   508M  (3)|999:59:59 |       |       |
|   5 |      NESTED LOOPS OUTER                      |                             |    69 | 22632 |   508M  (3)|999:59:59 |       |       |
|   6 |       NESTED LOOPS OUTER                     |                             |    69 | 20562 |   508M  (3)|999:59:59 |       |       |
|   7 |        NESTED LOOPS OUTER                    |                             |    69 | 16146 |   508M  (3)|999:59:59 |       |       |
|   8 |         NESTED LOOPS                         |                             |    69 | 13662 |   508M  (3)|999:59:59 |       |       |
|   9 |          NESTED LOOPS                        |                             |  1675 |   312K|   508M  (3)|999:59:59 |       |       |
|  10 |           NESTED LOOPS                       |                             |   703K|    39M|  3476K  (1)| 06:54:55 |       |       |
|* 11 |            TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |   703K|    13M|  2068K  (1)| 04:06:54 | ROWID | ROWID |
|  12 |             INDEX FULL SCAN                  | T4_N2                       |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|  13 |            TABLE ACCESS BY INDEX ROWID       | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 14 |             INDEX UNIQUE SCAN                | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|* 15 |           TABLE ACCESS FULL                  | T1                          |     1 |   132 |   718   (3)| 00:00:06 |       |       |

In the post-upgrade plan, the most pertinent part was step 15.

A FULL TABLE SCAN as the inner rowsource of a NESTED LOOP, in other words for every row in the driving rowsource do a full table scan.”

Standard reaction to such a degradation is to test the SQL with OPTIMIZER_FEATURES_ENABLE (OFE).

With OFE set either via OPT_PARAM hint or via ALTER SESSION then the SQL reverted to pre-upgrade plan.

This is good!

But… how much confidence can we have that this actually fixed the problem?

It would be great if we could find a bug which was definitively responsible for this change in execution plan.

OFE is an umbrella but it doesn’t turn off ALL fixes/optimizer features in a release.

Many of those fixes covered by the OFE umbrella are represented in V$SYSTEM_FIX_CONTROL

So, I went looking for anything which sounded related and went back to default 11.2.0.4 OFE and starting turning off the BUGNOs one-by-one – no difference, still a bad plan

Then still at OFE 11.2.0.4, I turned off all the BUGNOs…. no difference, still a bad plan.

At this point, my confidence in OFE as a fix for this problem was starting to waiver.

So, I tried to reduce the problem SQL to it’s smallest form representing the problem.

alter session set optimizer_mode = first_rows;
alter session set optimizer_features_enable = '11.2.0.4';

 explain plan for
 SELECT *
 FROM   T1
 ,      T3
 ,      T4
 WHERE  T1.COL1             = T3.COL1
 AND    T3.COL2             = T4.COL2
 AND    T1.DATE_COL        '+OUTLINE'));

This distilled statement reproduced the issue at the heart of the original statement:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS                        |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS                       |         |  2237K|   795M|  6551K  (1)| 13:01:53 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID| T4      |  2240K|   188M|  2068K  (1)| 04:06:53 | ROWID | ROWID |
|   4 |     INDEX FULL SCAN                  | T4_N2   |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID       | T3      |     1 |   285 |     2   (0)| 00:00:01 |       |       |
|*  6 |     INDEX UNIQUE SCAN                | T3_U1   |     1 |       |     1   (0)| 00:00:01 |       |       |
|*  7 |   TABLE ACCESS FULL                  | T1      |     1 |   397 |   718   (3)| 00:00:06 |       |       |
----------------------------------------------------------------------------------------------------------------

Is this test case a 100% indication that OFE wasn’t applicable to the original problem?
Not sure.
But at this point I had littleconfidence that OFE provided any sort of reliable fix to the original SQL statement.

I don’t know what the internal heuristics are/were for FIRST_ROWS.

But that is a stupid place for a full table scan regardless of the number of rows expected to return.

If we remove one of the T1 predicates above and the various row estimates go up then the FTS moves to a driving position.

I could/should get a 10053 trace but FIRST_ROWS shouldn’t be in use in the first place.

Solution – hunt down where the app code is doing the FIRST_ROWS thing and exterminate.

All the other possible OPTIMIZER_MODE modes gave a plan for the original SQL which started like this:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   1 |  SORT ORDER BY                              |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   2 |   NESTED LOOPS OUTER                        |      |    35 | 14105 |  1385   (2)| 00:00:10 |       |       |
|*  3 |    HASH JOIN OUTER                          |      |    35 | 12565 |  1350   (2)| 00:00:10 |       |       |
|*  4 |     HASH JOIN OUTER                         |      |    35 | 11305 |  1315   (2)| 00:00:10 |       |       |
|   5 |      NESTED LOOPS OUTER                     |      |    35 |  9065 |  1280   (2)| 00:00:10 |       |       |
|   6 |       NESTED LOOPS OUTER                    |      |    35 |  8015 |  1245   (2)| 00:00:09 |       |       |
|   7 |        NESTED LOOPS OUTER                   |      |    35 |  7735 |  1210   (2)| 00:00:09 |       |       |
|   8 |         NESTED LOOPS OUTER                  |      |    35 |  7350 |  1176   (2)| 00:00:09 |       |       |
|   9 |          NESTED LOOPS                       |      |    35 |  6965 |  1106   (2)| 00:00:08 |       |       |
|  10 |           NESTED LOOPS                      |      |    69 | 12351 |   899   (3)| 00:00:07 |       |       |
|* 11 |            HASH JOIN                        |      |    69 |  9660 |   761   (3)| 00:00:06 |       |       |
|* 12 |             TABLE ACCESS FULL               | T2   |   237 |  1659 |    36   (6)| 00:00:01 |       |       |
|* 13 |             TABLE ACCESS FULL               | T1   |  1671 |   217K|   725   (3)| 00:00:06 |       |       |

So…. when optimizer upgrade problems strike, you should always try OFE.

But it’s also worth further investigation to confirm that OFE really is a reliable fix for the problem.

SQL Tuning: Thinking in Sets / How and When to be Bushy

Below is a SQL statement from a performance problem I was looking at the other day.

This is a real-world bit of SQL which has slightly simplified and sanitised but, I hope, without losing the real-worldliness of it and the points driving this article.

You don’t really need to be familiar with the data or table structures (I wasn’t) as this is a commentary on SQL structure and why sometimes a rewrite is the best option.

SELECT bd.trade_id
,      bdp.portfolio_id
,      bd.deal_id 
,      bd.book_id
,      pd.deal_status   prev_deal_status
FROM   deals            bd
,      portfolios       bdp
,      deals            pd
,      portfolios       pdp
-- today's data
WHERE  bd.business_date         = :t_date
AND    bd.src_bus_date          < :t_date
AND    bd.type                  = 'Trade'
AND    bdp.ref_portfolio_id     = bd.book_id
-- yesterday's data
AND    pd.business_date         = :y_date
AND    pd.type                  = 'Trade'
AND    pdp.ref_portfolio_id     = pd.book_id
-- some join columns
AND    bd.trade_id              = pd.trade_id
AND    bdp.portfolio_id         = pdp.portfolio_id;

There is no particular problem with how the SQL statement is written per se.

It is written in what seems to be a standard developer way.

Call it the “lay everything on the table” approach.

This is a common developer attitude:

“Let’s just write a flat SQL structure and let Oracle figure the best way out.”

Hmmm… Let’s look at why this can be a problem.

First, what is the essential requirement of the SQL?

Compare information (deal status) that we had yesterday for a subset of deals/trades

Something like that anyway…

So … What is the problem?

The Optimizer tends to rewrite and transform any SQL we give it anyway and tries to flatten it out.

The SQL above is already flat so isn’t that a good thing? Shouldn’t there be less work for the optimizer to do?

No, not necessarily. Flat SQL immediately restricts our permutations.

The problem comes with how Oracle can take this flat SQL and join the relevant row sources to efficiently get to the relevant data.

Driving Rowsource

Let’s assume that we should drive from today’s deal statuses (where we actually drive from will depend on what the optimizer estimates / costs).

SELECT ...
FROM   deals            bd
,      portfolio        bdp
...
-- today's data
WHERE  bd.business_date         = :t_date
AND    bd.src_bus_date          < :t_date
AND    bd.type                  = 'Trade'
AND    bdp.ref_portfolio_id     = bd.book_id
....

Where do we go from here?

We want to join from today’s deals to yesterdays deals.

But the data for the two sets of deals data is established via the two table join (DEALS & PORTFOLIOS).

thinking_in_Sets_venn

We want to join on TRADE_ID which comes from the two DEALS tables and PORTFOLIO_ID which comes from the two PORTFOLIOS tables.

SELECT ...
FROM   ...
,      deals            pd
,      portfolios       pdp
WHERE  ...
-- yesterday's data
AND    pd.business_date         = :y_date
AND    pd.type                  = 'Trade'
AND    pdp.ref_portfolio_id     = pd.book_id
...

And joined to via:

AND    bd.trade_id              = pd.trade_id
AND    bdp.portfolio_id         = pdp.portfolio_id

So from our starting point of today’s business deals, we can either go to PD or to PDP, but not to both at the same time.

Hang on? What do you mean not to both at the same time?

For any multi-table join involving more than two tables, the Optimizer evaluates the different join tree permutations.

Left-Deep Tree

Oracle has a tendency to choose what is called a left-deep tree.

If you think about a join between two rowsources (left and right), a left-deep is one where the second child (the right input) is always a table.

NESTED LOOPS are always left-deep.

HASH JOINS can be left-deep or right-deep (normally left-deep as already mentioned)

Zigzags are also possible, a mixture of left-deep and right-deep.

Below is an image of a left-based tree based on the four table join above.

left_tree

Here is an execution plan which that left-deep tree might represent:

---------------------------------------------------------------
| Id  | Operation                               | Name        |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |
|   1 |  NESTED LOOPS                         |               |
|   2 |   NESTED LOOPS                        |               |
|   3 |    NESTED LOOPS                       |               |
|   4 |     NESTED LOOPS                      |               |
|*  5 |      TABLE ACCESS BY ROWID            | DEALS         |
|*  6 |       INDEX RANGE SCAN                | DEALS_IDX01   |
|   7 |      TABLE ACCESS BY INDEX ROWID      | PORTFOLIOS    |
|*  8 |       INDEX UNIQUE SCAN               | PK_PORTFOLIOS |
|*  9 |     TABLE ACCESS BY INDEX ROWID       | DEALS         |
|* 10 |      INDEX RANGE SCAN                 | DEALS_IDX01   |
|* 11 |    INDEX UNIQUE SCAN                  | PK_PORTFOLIOS |
|* 12 |   TABLE ACCESS BY INDEX ROWID         | PORTFOLIOS    |
---------------------------------------------------------------

Predicate Information (identified by operation id): 
--------------------------------------------------- 
 
   5 - filter("BD"."TYPE"='Trade' AND "BD"."SRC_BUS_DATE"<:t_date) 
   6 - access("BD"."BUSINESS_DATE"=:t_date) 
   8 - access("BD"."BOOK_ID"="BDP"."REF_PORTFOLIO_ID") 
   9 - filter(("BD"."TYPE"='Trade' AND "BD"."TRADE_ID"="PD"."TRADE_ID")) 
  10 - access("PD"."BUSINESS_DATE"=:y_date) 
  11 - access("PD"."BOOK_ID"="PDP"."REF_PORTFOLIO_ID") 
  12 - filter("BDP"."PORTFOLIO_ID"="PDP"."PORTFOLIO_ID")

Right-Deep Tree

A right-deep tree is one where the first child, the left input, is a table.

Illustration not specific to the SQL above:

right_tree

Bushy Tree

For this particular SQL, this is more what we are looking for:

bushy_tree

The essence of the problem is that we cannot get what is called bushy join, not with the original flat SQL.

The Optimizer cannot do this by default. And this isn’t an approach that we can get at by hinting (nor would we want to if we could, of course!).

Rewrite Required

To get this bushy plan, we need to rewrite our SQL to be more explicit around the set-based approach required.

WITH subq_curr_deal AS
     (SELECT /*+ no_merge */
             bd.trade_id
      ,      bd.deal_id
      ,      bd.book_id
      ,      bdp.portfolio_id
      FROM   deals      bd
      ,      portfolios bdp
      WHERE  bd.business_date         = :t_date
      AND    bd.src_bus_date          < :t_date
      AND    bd.type                  = 'Trade'
      AND    bdp.ref_portfolio_id     = bd.book_id)
,    subq_prev_deal AS
     (SELECT /*+ no_merge */
             pd.trade_id 
      ,      pd.deal_status
      ,      pdp.portfolio_id
      FROM   deals      pd
      ,      portfolios pdp
      WHERE  pd.business_date         = :y_date
      AND    pd.type                  = 'Trade'
      AND    pdp.ref_portfolio_id     = pd.book_id)
SELECT cd.trade_id
,      cd.portfolio_id
,      cd.deal_id
,      cd.book_id 
,      pd.deal_status prev_deal_status
FROM   subq_curr_deal cd
,      subq_prev_deal pd
WHERE  cd.trade_id             = pd.trade_id
AND    cd.portfolio_id         = pd.portfolio_id;

How exactly does the rewrite help?

By writing the SQL deliberately with this structure, by using WITH to create subqueries in conjunction with no_merge, we are deliberately forcing the bushy join.

This is an example execution plan that this bushy tree might represent.

----------------------------------------------------------
| Id  | Operation                        | Name          |
----------------------------------------------------------
|   0 | SELECT STATEMENT                 |               |
|*  1 |  HASH JOIN                       |               |
|   2 |   VIEW                           |               |
|   3 |    NESTED LOOPS                  |               |
|   4 |     NESTED LOOPS                 |               |
|*  5 |      TABLE ACCESS BY INDEX ROWID | DEALS         |
|*  6 |       INDEX RANGE SCAN           | DEALS_IDX01   |
|*  7 |      INDEX UNIQUE SCAN           | PK_PORTFOLIOS |
|   8 |     TABLE ACCESS BY INDEX ROWID  | PORTFOLIOS    |
|   9 |   VIEW                           |               |
|  10 |    NESTED LOOPS                  |               |
|  11 |     NESTED LOOPS                 |               |
|* 12 |      TABLE ACCESS BY INDEX ROWID | DEALS         |
|* 13 |       INDEX RANGE SCAN           | DEALS_IDX01   |
|* 14 |      INDEX UNIQUE SCAN           | PK_PORTFOLIOS |
|  15 |     TABLE ACCESS BY INDEX ROWID  | PORTFOLIOS    |
----------------------------------------------------------
 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
                                 
   1 - access("CD"."TRADE_ID"="PD"."TRADE_ID" AND "CD"."PORTFOLIO_ID"="PD"."PORTFOLIO_ID")
   5 - filter(("BD"."TYPE"='Trade' AND "BD"."SRC_BUS_DATE"<:t_date)) 
   6 - access("BD"."BUSINESS_DATE"=:t_date ) 
   7 - access("BD"."BOOK_ID"="BDP"."REF_PORTFOLIO_ID") 
  12 - filter("PD"."TYPE"='Trade') 
  13 - access("PD"."BUSINESS_DATE"=:y_date) 
  14 - access("PDP"."REF_PORTFOLIO_ID"="PD"."BOOK_ID")

Is this a recommendation to go use WITH everywhere?

No.

What about the no_merge hint?

No.
The no_merge hint is a tricky one. This is not necessarily a recommendation but its usage here prevents the Optimizer from flattening. I often find it goes hand-in-hand with this sort of deliberately structured SQL for that reason, and similar goes for push_pred.

Do developers need to know about left deep, right deep and bushy?

No, not at all.

Takeaways?

It helps to think in sets and about what sets of data you are joining and recognise when SQL should be deliberately structured.

Further Reading

http://www.oaktable.net/content/right-deep-left-deep-and-bushy-joins
https://tonyhasler.wordpress.com/2008/12/27/bushy-joins/
https://www.toadworld.com/platforms/oracle/b/weblog/archive/2014/06/16/hitchhikers-guide-to-explain-plan-8
https://jonathanlewis.wordpress.com/2007/01/24/left-deep-trees/
https://antognini.ch/top/

Strategies for Minimising SQL Execution Plan Instability

Execution Plan Instability – What is the problem?

The Oracle Optimizer is a complex piece of software and with every release it becomes more complex.

RBO

In the beginning, the Optimizer was rule-based.

The Optimizer had a ranked list of heuristics used to optimize a query, picking the lowest ranked rule available.

This rule-based mode, whilst still in use with some internal Oracle dictionary queries, has been unsupported since version 10.1.

This means that no code changes have been officially made to the RBO and no bug fixes are provided. There are many features that the RBO is unaware of.

Applications should not still be using the rule-based optimizer.

CBO

The Cost-Based Optimizer is designed to evaluate a number of execution plans and pick the one estimated to be the fastest.

Many of the CBO features are designed to combat common optimization problems and many of these problems occur where development teams are not necessarily aware of the full implications of the features that they are using.

These built-in default behaviours roughly conform to the infamous 80%:20% rule in that most of the time they do a good job but they are not infallible.

Bind Variables, Literals and Histograms

Most of the features which are deliberately designed such that plan instability is difficult to avoid stem from the decision to use bind variables or literals (NOT that they should be mutually exclusive) and the interaction of the former with histograms.

In article 1 of his Philosophy series, Jonathan Lewis neatly sums this up:

Histograms and bind variables exist for diametrically opposed reasons – they won’t work well together without help

Fundamentally, bind variables exist to provide shareable plans.

Bind variables should be used where we are not interested in getting specific optimizations for differing parameters.

Literals should be used where we want the Optimizer to pay particular attention to data distribution and skew for the specific parameters supplied.

A SQL statement will/should often have some parameters which should be literals and some which should be binds.

From this point onwards, there has been a whole raft of features designed to treat the misuse of one or the other.

In Oracle 8.1.67.3, histograms were introduced.

Histograms exist to provide specific data distribution information, particularly relevant to specific parameters.

Also, in that version, we got CURSOR_SHARING – targeted at applications using literals instead of binds such that SQL which was identical part from the use of binds was rewritten to use sytem-generated bind variables.

Then in 9.2, we got bind variable peeking.

This feature was introduced so that the optimizer could peek at the values supplied at parse time and use data distribution information specific to these parse-time values to generate an execution plan which suited those values.

In addition and at the same time, through these various versions to present day, we have had the default behaviour of DBMS_STATS statistic gathering to let the database decide which columns it will create histograms on, based on the SQL which has been running.

This means that new histograms can suddenly spring up – or existing histograms unexpectedly disappear – on all sorts of columns. This can be problematic on columns with large numbers of distinct values AND particularly so on join cardinalities where there may be a mismatch of histograms on both sides of the join.

Ever since this point, we have had a conflict of interest in feature usage and an ever increasing number of additional functionality to battle against this conflict – adaptive cursor sharing, cardinality feedback, etc, etc

Finally, the education message got blurred or lost somewhere along the line to the extent that a lot of systems blindly overuse bind variables because of the perceived performance impact of using literals.

This situation is not helped by the fact PL/SQL is designed to encourage bind variables.

Using supplied parameters as literals means using some construct of dynamic SQL, not difficult but nevertheless an added complexity and also another feature which is often blindly discouraged.

SQL Execution Plan Instability – Is SPM a viable approach?

SPM Overview

In Oracle 11g, Oracle introduced SQL Plan Baselines as a SQL Plan Management feature.

The core attraction of this functionality is that you can “lock in” a particular plan or plans for a SQL statement. This stores a set of outline hints and a specific plan hash value in relation to a specific SQL signature.

The Optimizer then uses that set of hints to try to reproduce the desired plan. If it can it will, if it can’t it will reject the hintset.

Additionally, the Optimizer completes its best-cost execution plan optimization anyway so that it can provide the ability to do a controlled evolution of baselined plans in the event that the lower-cost plan that it would have used is different and better performing than the baselined plan.

To use this the database parameters (session or system) just need to be configured to capture plans into a baseline and then use them.

There is flexibility to this capture/use. You can capture everything as it is generated; you could capture everything from memory now and/or at regular intervals and import into a baseline; you could capture everything from AWR into a SQL Tuning Set and then import into a baseline; or you could capture everything from another environment and export/import it into another.

And at any point, you can turn off capture and continue to use those which you currently have – this usage continues to capture any lower cost plans that the optimizer would have generated for any existing baselined plans

For a more detailed look at SPM and a comparison with SQL Profiles, see documentation.

Sounds good – why isn’t it the silver bullet for system-wide stability?

This approach might be good enough, but it is not a cast-iron guarantee of system-wide stability.

There are a number of reasons why not.

New & Changed SQL

Firstly, you need to have captured all your plans already. If you get brand new SQL, then there will be nothing in SPM.

Depending on your application, this may be a significant concern.

For example, consider an application making heavy usage of Hibernate which generates SQL statements.

A minor change in the domain model can mean a change of system-generated table alias in many statements.

As a result, you may suddenly get a lot of brand new SQL and significant numbers of baselined statements which you will now never see again.

What are the baselined plans based on? Are they the best plans? The only plans ever used?

If you suddenly baseline the plans for a large number of SQL statements, you are dictating which plan is to be used.

The plans will be based on the parsed plans in memory or in AWR at the time.
Are these the best plans?

Does/should this SQL statement always use this plan?

Are there normally multiple plans for different bindsets?

What if you normally get benefit from adapative cursor sharing?

ACS and baselines

What if your application benefits from adapative cursor sharing?

Sure, you can baseline multiple plans but these plans have no ACS information.

As soon as that ACS information is no longer in memory (as happens), there is no shortcut in a baseline to regain that, you still have to have the multiple executions required for the optimizer to recognize that which plans to use for which bindsets.

Parsing overhead

Widespread usage of baselines might, depending on your system performance profile, have a significant impact on the parsing resources.

This is because it always generates a best-cost plan anyway.

Then if that is not the baselined plan, it will use the baselined hintset to try to generate the specific plan hash.

In addition, it that is not possible, it will use just the optimizer_features_enable hint to try to generate the required plan.

So, you might in a heavily-baselined system to be doing 2x the parse work of a non-baselined system.

This might well be easily accommodated but there are systems where this would cause a problem.

Integration with development and testing processes

A SQL Plan Baseline is tied to a SQL statement based on the statement’s EXACT_MATCHING_SIGNATURE – a hash of the SQL statement which has been case and space normalized.

If a SQL statement materially changes, the baseline no longer applies.

How aware are developers of the presence of baselines?

And how to integrate with the development process?

How will our release process deal with baselines?

And if baselining large numbers of plans is being considered, then we have to think about where these will be generated.

The natural implication (of widespread baseline usage) is that new functionality being promoted to Prod would have a set of tested, baselined plans accompanying it and these would presumably have to be generated in a Prod-like environment which included Prod-like volumes.

SQL Execution Plan Instability – Decision Time

There is a decision to be made.

(And/or perhaps there is often a conceptual over-simplification by senior management to combat? Or at least a lack of deep understanding of the beast that we’re dealing with here?)

Do you want the Optimizer to try to get a better execution plan sometimes?

If the answer is yes, then you have to accept that it will get it wrong from time to time.

In particular, the various feedback and adaptive mechanisms are designed to recognize that they have got it wrong.

BUT they need that problematic execution in the the first place – sometimes more than one – to recognize that fact.

That one problematic execution could be your next Priority 1 incident.

In addition, the feedback mechanism is not perfect and it still can make subsequent executions worse in some circumstances.

SQL Execution Plan Instability – Turn it off?

IF your primary goal is plan stability – and I think many teams would claim this is this their goal but they do not embrace the implications of this – then perhaps a better decision is to turn off the various features which cause or combine to cause most of the problems of instability.

Appropriate usage of binds, appropriate usage of literals

Choose whether to use a bind variable or a literal as is appropriate for the value / column / SQL statement.

A SQL statement might have a mix of both.

DBMS_STATS defaults

A METHOD_OPT of FOR ALL INDEXED COLUMNS SIZE AUTO is an immediate red flag. This is never a good setting.

FOR ALL COLUMNS SIZE AUTO without table-level preferences (SET_TABLE_PREFS) is another red flag.

As an interim step, consider use FOR ALL COLUMNS SIZE REPEAT to lock in the current histogram usage.

The end goal should be to have table level preferences set for all tables.

This relies on knowing your data, your data distribution, your code, and knowing which histograms make sense (i.e. for skewed columns) – it will be far fewer than gathered by default.

For columns with significant numbers of distinct skew, it may be necessary to manually craft the histograms.

Volatile tables

Volatile tables should have stats set to an appropriate setting to generate appropriate plans for all situations and then those stats should be locked.

Stats which are gathered at any point during the volatility cycle may be good or may be problematic.

Similarly dynamic sampling can only see the data at the time of hard parse – you might be lucky and this is better than stats which say 0 rows but it can be a time bomb.

Turn off optimizer features

Turning off optimizer features might be best done via a LOGON trigger and turning such off for a subset of application users. These features include:

  • Bind Variable Peeking – off via _optim_peek_user_binds = false
  • Cardinality Feedback and ACS – should be disabled by turning off bind variable peeking but off via _optimizer_use_feedback = false, _optimizer_adaptive_cursor_sharing = false, _optimizer_extended_cursor_sharing_rel = “none”
  • Dynamic Sampling – optimizer_dynamic_sampling to 0
  • 12c additional adaptive features – e.g. adaptive execution plans

Additionally it probably makes sense to turn off the adaptive direct path read behaviour or anything with the word adaptive or dynamic in it or associated to it

This functionality decides on whether to do full segment scans via the buffer cache or not and the behaviour is a runtime decision depending on the size of the object relative to the buffer cache AND depending on how much of the segment is currently in the cache.

  • Adaptive direct path reads – _serial_direct_read = always

All too often I’ve seen a concurrently executed SQL statement switch to a “bad” plan involving a full table scan delivered via direct path reads stress out the IO subsystem because of the number of concurrent executions of that query which then affects performance across the DB.

Caveat

The actions above are still not sufficient to guarantee plan stability but, for this goal above all else, this is likely to be the most appropriate action.

However, to further guarantee stability it is still likely that some degree of hinting – whether via manual hints, sql profiles or baselines – might be necessary for small numbers of SQL statements where the intial cost-based plan is not appropriate e.g. GTTs and other situations but it should be small number of statements.

SQL Execution Plan Instability – Summary & Opinion

The actions discussed above are made on the basis that we want to minimise the chance of execution plan instability at all costs.

By making this decision, we are prioritizing stability over all the features within Oracle designed to generate better plans for specific situations, sets of binds, etc.

Personally, I always recommend going with the default behaviour until such time as it causes a significant problem.

I also always recommend matching the scope of a solution to the scope of a problem.

For example, if we have a problem with one or two SQL statements, the potential solutions should be limited to those SQL statements.

We should never be making changes with a potential system-wide impact for the sake of a couple of SQL statements.

And even parameter changes can be injected to specific SQL statements either via a SQL Patch or via manual hints.

In my opinion, having multiple plans for a SQL statement is expected.

But what is under-appreciated is the extent to which this is normal.

These are normally only noticed when they cause a problem and the significant number of plans which regularly change plans without impact tend to go unnoticed.

It is also my opinion that SQL execution issues occur mostly when the SQL is badly written or when incompatible features are combined – e.g. peeked binds together with histograms – and I prefer to tackle that root cause rather than a generic concept of plan instability being inherently problematic.

A future path?

Might it be a good idea for there to be a built-in choice of optimizer “umbrella” modes – maximum performance vs maximum stability which would turn on/off some of the features above and/or adjust their sensitivity/aggressiveness?

Regexp hungry for CPU? Real time sql monitoring shows just how

Not exactly a snappy blog post title…

First up, an execution plan showing the old problem of how the costs of a scalar subquery are not properly accounted for, even in the latest 11gR2 releases.
Read more of this post

Poll: SQL Plan Management

I’m interested in what SQL Plan Management features people are actively using.
Read more of this post

Plan Problem with Partition Top N

Yesterday I was having issues with a poor choice of plan by the optimizer when trying to do a top N of a partitioned table.

SQL> CREATE TABLE t1
  2  (col1  varchar2(1)  not null
  3  ,col2  number       not null
  4  ,col3  varchar2(50) not null)
  5  PARTITION BY LIST (col1)
  6  (  
  7   PARTITION P1 VALUES ('A'),
  8   PARTITION P2 VALUES ('B'),
  9   PARTITION P3 VALUES ('C'),
 10   PARTITION P4 VALUES ('D')
 11  );

Table created.

SQL> INSERT 
  2  INTO   t1
  3  SELECT DECODE(MOD(ROWNUM,4),1,'A',2,'B',3,'C',0,'D') 
  4  ,      100000000+ROWNUM
  5  ,      LPAD('X',50,'X')
  6  FROM   DUAL
  7  CONNECT BY ROWNUM <= 1000000;

1000000 rows created.

SQL> CREATE UNIQUE INDEX i1 ON T1 (col2,col1) LOCAL;

Index created.

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'T1');

PL/SQL procedure successfully completed.

Now COL2 is unique but to be a unique index on a partitioned table, I have to include the partition key.

I want the top N of COL2 in descending order.

SQL> select /*+ gather_plan_statistics */
  2        *
  3  from (select col2 from t1 t order by col2 desc) 
  4  where rownum <=10;

      COL2
----------
 101000000
 100999999
 100999998
 100999997
 100999996
 100999995
 100999994
 100999993
 100999992
 100999991

10 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
SQL_ID  cwt7s3cmmw3vb, child number 0
-------------------------------------
select /*+ gather_plan_statistics */       * from (select col2 from t1
t order by col2 desc) where rownum <=10

Plan hash value: 738905059

-------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |      1 |        |     10 |00:00:00.49 |    2698 |
|*  1 |  COUNT STOPKEY           |      |      1 |        |     10 |00:00:00.49 |    2698 |
|   2 |   VIEW                   |      |      1 |   1000K|     10 |00:00:00.49 |    2698 |
|*  3 |    SORT ORDER BY STOPKEY |      |      1 |   1000K|     10 |00:00:00.49 |    2698 |
|   4 |     PARTITION LIST ALL   |      |      1 |   1000K|   1000K|00:00:00.65 |    2698 |
|   5 |      INDEX FAST FULL SCAN| I1   |      4 |   1000K|   1000K|00:00:00.27 |    2698 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)

Seems a strange option to me to do a FAST FULL SCAN and in the process visiting 2698 buffers.

Wouldn’t we much rather this did a descending index scan?

SQL> select /*+ gather_plan_statistics */
  2        *
  3  from (select /*+ index_desc (t i1) */ col2 from t1 t order by col2 desc) 
  4  where rownum <=10;

      COL2
----------
 101000000
 100999999
 100999998
 100999997
 100999996
 100999995
 100999994
 100999993
 100999992
 100999991

10 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
SQL_ID  f1ytxtp8bdgx6, child number 0
-------------------------------------
select /*+ gather_plan_statistics */       * from (select /*+
index_desc (t i1) */ col2 from t1 t order by col2 desc) where rownum
<=10

Plan hash value: 2521435439
-------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |      |      1 |        |     10 |00:00:00.01 |      12 |
|*  1 |  COUNT STOPKEY                  |      |      1 |        |     10 |00:00:00.01 |      12 |
|   2 |   VIEW                          |      |      1 |   1000K|     10 |00:00:00.01 |      12 |
|*  3 |    SORT ORDER BY STOPKEY        |      |      1 |   1000K|     10 |00:00:00.01 |      12 |
|   4 |     PARTITION LIST ALL          |      |      1 |   1000K|     40 |00:00:00.01 |      12 |
|*  5 |      COUNT STOPKEY              |      |      4 |        |     40 |00:00:00.01 |      12 |
|   6 |       INDEX FULL SCAN DESCENDING| I1   |      4 |   1000K|     40 |00:00:00.01 |      12 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - filter(ROWNUM<=10)

I’m glad that actually kicked in there as I wanted because on the real-world example that this relates to, it was being distinctly stubborn and at one point I thought I might have to resort to something distinctly ugly like:

SQL> select /*+ gather_plan_statistics */
  2         *
  3  from (select col2
  4        from   (select col2
  5                from   t1 partition (p1) t
  6                order by col2 desc) 
  7        where rownum <=10
  8        union all
  9        select col2
 10        from   (select col2
 11                from   t1 partition (p2) t
 12                order by col2 desc) 
 13        where rownum <=10
 14        union all
 15        select col2
 16        from   (select col2
 17                from   t1 partition (p3) t
 18                order by col2 desc) 
 19        where rownum <=10
 20        union all
 21        select col2
 22        from   (select col2
 23                from   t1 partition (p4) t
 24                order by col2 desc) 
 25        where rownum <=10
 26        order by col2 desc)
 27  where rownum <=10;

      COL2
----------
 101000000
 100999999
 100999998
 100999997
 100999996
 100999995
 100999994
 100999993
 100999992
 100999991

10 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  8y7d958mpah49, child number 0
-------------------------------------
select /*+ gather_plan_statistics */        * from (select col2
from   (select col2               from   t1 partition (p1) t
   order by col2 desc)       where rownum <=10       union all
select col2       from   (select col2               from   t1 partition
(p2) t               order by col2 desc)       where rownum <=10
union all       select col2       from   (select col2
from   t1 partition (p3) t               order by col2 desc)
where rownum <=10       union all       select col2       from
(select col2               from   t1 partition (p4) t
order by col2 desc)       where rownum <=10       order by col2 desc)
where rownum <=10

Plan hash value: 1726521473
---------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |      |      1 |        |     10 |00:00:00.01 |      12 |
|*  1 |  COUNT STOPKEY                    |      |      1 |        |     10 |00:00:00.01 |      12 |
|   2 |   VIEW                            |      |      1 |     40 |     10 |00:00:00.01 |      12 |
|*  3 |    SORT ORDER BY STOPKEY          |      |      1 |     40 |     10 |00:00:00.01 |      12 |
|   4 |     UNION-ALL                     |      |      1 |        |     40 |00:00:00.01 |      12 |
|*  5 |      COUNT STOPKEY                |      |      1 |        |     10 |00:00:00.01 |       3 |
|   6 |       VIEW                        |      |      1 |    250K|     10 |00:00:00.01 |       3 |
|   7 |        PARTITION LIST SINGLE      |      |      1 |    250K|     10 |00:00:00.01 |       3 |
|   8 |         INDEX FULL SCAN DESCENDING| I1   |      1 |    250K|     10 |00:00:00.01 |       3 |
|*  9 |      COUNT STOPKEY                |      |      1 |        |     10 |00:00:00.01 |       3 |
|  10 |       VIEW                        |      |      1 |    250K|     10 |00:00:00.01 |       3 |
|  11 |        PARTITION LIST SINGLE      |      |      1 |    250K|     10 |00:00:00.01 |       3 |
|  12 |         INDEX FULL SCAN DESCENDING| I1   |      1 |    250K|     10 |00:00:00.01 |       3 |
|* 13 |      COUNT STOPKEY                |      |      1 |        |     10 |00:00:00.01 |       3 |
|  14 |       VIEW                        |      |      1 |    250K|     10 |00:00:00.01 |       3 |
|  15 |        PARTITION LIST SINGLE      |      |      1 |    250K|     10 |00:00:00.01 |       3 |
|  16 |         INDEX FULL SCAN DESCENDING| I1   |      1 |    250K|     10 |00:00:00.01 |       3 |
|* 17 |      COUNT STOPKEY                |      |      1 |        |     10 |00:00:00.01 |       3 |
|  18 |       VIEW                        |      |      1 |     10 |     10 |00:00:00.01 |       3 |
|  19 |        PARTITION LIST SINGLE      |      |      1 |    250K|     10 |00:00:00.01 |       3 |
|  20 |         INDEX FULL SCAN DESCENDING| I1   |      1 |    250K|     10 |00:00:00.01 |       3 |
---------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - filter(ROWNUM<=10)
   9 - filter(ROWNUM<=10)
  13 - filter(ROWNUM<=10)
  17 - filter(ROWNUM<=10)

But then the stubbornness disappeared and I couldn’t reproduce.
Conclusion – I must have been doing something stupid.

On a related note, whilst I was messing about, I noticed this.
Spot the trivial difference (not that it matters):

1
SQL> select *
  2  from (select col2 from t1 t order by col2 desc)
  3  where rownum <= 10;

........

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID  8uy80z5da45ct, child number 0
-------------------------------------
select * from (select col2 from t1 t order by col2 desc) where rownum
<= 10

Plan hash value: 3155368986

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |       |       |     3 (100)|          |
|*  1 |  COUNT STOPKEY               |      |       |       |            |          |
|   2 |   VIEW                       |      |    10 |   130 |     3   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN DESCENDING| I1   |  1000K|  6835K|     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)


21 rows selected.

SQL> select *
  2  from (select /*+ index_desc(t i1) */ col2 from t1 t order by col2 desc)
  3  where rownum <= 10;

........

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
SQL_ID  3asrnmxg6bqsj, child number 0
-------------------------------------
select * from (select /*+ index_desc(t i1) */ col2 from t1 t order by
col2 desc) where rownum <= 10

Plan hash value: 3155368986

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |       |       |     3 (100)|          |
|*  1 |  COUNT STOPKEY               |      |       |       |            |          |
|   2 |   VIEW                       |      |    10 |   130 |     3   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN DESCENDING| I1   |    10 |    70 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=10)


21 rows selected.

SQL> 
Follow

Get every new post delivered to your Inbox.

Join 266 other followers