Did it really fix it 1: OFE & nasty FIRST_ROWS plan on 11.2.0.4

Plan degradations on upgrade are normal.

This one’s no different.

On further investigation, turned out application was setting optimizer_mode = first_rows somewhere.

First point about this is that first_rows really shouldn’t be used either as a hint or as an optimizer_mode.

What does FIRST_ROWS mean?
From 11g doco:

The optimizer uses a mix of costs and heuristics to find a best plan for fast delivery of the first few rows.

If any sort of FIRST_ROWS type intervention is required, then it might be better to be using a more modern, fully-costed FIRST_ROWS_n (where n = 1 | 10 | 100 | 1000).
If it’s even required.

The 12c documentation is clearer about this:

FIRST_ROWS is available for backward compatibility and plan stability; use FIRST_ROWS_n instead.

But I don’t think we should attach too much credence to that statement not being in the 11g documentation.

Here’s the problem plan.

Pre upgrade, snippet of the plan looked as per below.
Please note position of T1.

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   1 |  SORT ORDER BY                              |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   2 |   NESTED LOOPS OUTER                        |                             |    34 | 13668 |  3002   (1)| 00:00:22 |       |       |
|   3 |    NESTED LOOPS OUTER                       |                             |    34 | 12648 |  2968   (1)| 00:00:22 |       |       |
|   4 |     NESTED LOOPS OUTER                      |                             |    34 | 12274 |  2935   (1)| 00:00:22 |       |       |
|   5 |      NESTED LOOPS OUTER                     |                             |    34 | 11900 |  2867   (1)| 00:00:21 |       |       |
|   6 |       NESTED LOOPS OUTER                    |                             |    34 | 11628 |  2833   (1)| 00:00:21 |       |       |
|   7 |        NESTED LOOPS OUTER                   |                             |    34 | 10404 |  2799   (1)| 00:00:21 |       |       |
|   8 |         NESTED LOOPS OUTER                  |                             |    34 |  8228 |  2765   (1)| 00:00:20 |       |       |
|   9 |          NESTED LOOPS                       |                             |    34 |  6732 |  2731   (1)| 00:00:20 |       |       |
|  10 |           NESTED LOOPS                      |                             |    69 | 12282 |  2524   (1)| 00:00:19 |       |       |
|  11 |            NESTED LOOPS                     |                             |    69 |  9591 |  2386   (1)| 00:00:18 |       |       |
|* 12 |             TABLE ACCESS FULL               | T1                          |  1658 |   213K|   725   (3)| 00:00:06 |       |       |
|* 13 |             TABLE ACCESS BY INDEX ROWID     | T2                          |     1 |     7 |     1   (0)| 00:00:01 |       |       |
|* 14 |              INDEX UNIQUE SCAN              | T2_U1                       |     1 |       |     0   (0)| 00:00:01 |       |       |
|  15 |            TABLE ACCESS BY INDEX ROWID      | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 16 |             INDEX UNIQUE SCAN               | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|  17 |           TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |     1 |    20 |     3   (0)| 00:00:01 | ROWID | ROWID |
|* 18 |            INDEX RANGE SCAN                 | T4_N1                       |     1 |       |     2   (0)| 00:00:01 |       |       |

Post upgrade, plan has degraded to:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS OUTER                          |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS OUTER                         |                             |    69 | 26979 |   508M  (3)|999:59:59 |       |       |
|   3 |    NESTED LOOPS OUTER                        |                             |    69 | 26220 |   508M  (3)|999:59:59 |       |       |
|   4 |     NESTED LOOPS OUTER                       |                             |    69 | 23184 |   508M  (3)|999:59:59 |       |       |
|   5 |      NESTED LOOPS OUTER                      |                             |    69 | 22632 |   508M  (3)|999:59:59 |       |       |
|   6 |       NESTED LOOPS OUTER                     |                             |    69 | 20562 |   508M  (3)|999:59:59 |       |       |
|   7 |        NESTED LOOPS OUTER                    |                             |    69 | 16146 |   508M  (3)|999:59:59 |       |       |
|   8 |         NESTED LOOPS                         |                             |    69 | 13662 |   508M  (3)|999:59:59 |       |       |
|   9 |          NESTED LOOPS                        |                             |  1675 |   312K|   508M  (3)|999:59:59 |       |       |
|  10 |           NESTED LOOPS                       |                             |   703K|    39M|  3476K  (1)| 06:54:55 |       |       |
|* 11 |            TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |   703K|    13M|  2068K  (1)| 04:06:54 | ROWID | ROWID |
|  12 |             INDEX FULL SCAN                  | T4_N2                       |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|  13 |            TABLE ACCESS BY INDEX ROWID       | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 14 |             INDEX UNIQUE SCAN                | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|* 15 |           TABLE ACCESS FULL                  | T1                          |     1 |   132 |   718   (3)| 00:00:06 |       |       |

In the post-upgrade plan, the most pertinent part was step 15.

A FULL TABLE SCAN as the inner rowsource of a NESTED LOOP, in other words for every row in the driving rowsource do a full table scan.”

Standard reaction to such a degradation is to test the SQL with OPTIMIZER_FEATURES_ENABLE (OFE).

With OFE set either via OPT_PARAM hint or via ALTER SESSION then the SQL reverted to pre-upgrade plan.

This is good!

But… how much confidence can we have that this actually fixed the problem?

It would be great if we could find a bug which was definitively responsible for this change in execution plan.

OFE is an umbrella but it doesn’t turn off ALL fixes/optimizer features in a release.

Many of those fixes covered by the OFE umbrella are represented in V$SYSTEM_FIX_CONTROL

So, I went looking for anything which sounded related and went back to default 11.2.0.4 OFE and starting turning off the BUGNOs one-by-one – no difference, still a bad plan

Then still at OFE 11.2.0.4, I turned off all the BUGNOs…. no difference, still a bad plan.

At this point, my confidence in OFE as a fix for this problem was starting to waiver.

So, I tried to reduce the problem SQL to it’s smallest form representing the problem.

alter session set optimizer_mode = first_rows;
alter session set optimizer_features_enable = '11.2.0.4';

 explain plan for
 SELECT *
 FROM   T1
 ,      T3
 ,      T4
 WHERE  T1.COL1             = T3.COL1
 AND    T3.COL2             = T4.COL2
 AND    T1.DATE_COL        '+OUTLINE'));

This distilled statement reproduced the issue at the heart of the original statement:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS                        |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS                       |         |  2237K|   795M|  6551K  (1)| 13:01:53 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID| T4      |  2240K|   188M|  2068K  (1)| 04:06:53 | ROWID | ROWID |
|   4 |     INDEX FULL SCAN                  | T4_N2   |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID       | T3      |     1 |   285 |     2   (0)| 00:00:01 |       |       |
|*  6 |     INDEX UNIQUE SCAN                | T3_U1   |     1 |       |     1   (0)| 00:00:01 |       |       |
|*  7 |   TABLE ACCESS FULL                  | T1      |     1 |   397 |   718   (3)| 00:00:06 |       |       |
----------------------------------------------------------------------------------------------------------------

Is this test case a 100% indication that OFE wasn’t applicable to the original problem?
Not sure.
But at this point I had littleconfidence that OFE provided any sort of reliable fix to the original SQL statement.

I don’t know what the internal heuristics are/were for FIRST_ROWS.

But that is a stupid place for a full table scan regardless of the number of rows expected to return.

If we remove one of the T1 predicates above and the various row estimates go up then the FTS moves to a driving position.

I could/should get a 10053 trace but FIRST_ROWS shouldn’t be in use in the first place.

Solution – hunt down where the app code is doing the FIRST_ROWS thing and exterminate.

All the other possible OPTIMIZER_MODE modes gave a plan for the original SQL which started like this:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   1 |  SORT ORDER BY                              |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   2 |   NESTED LOOPS OUTER                        |      |    35 | 14105 |  1385   (2)| 00:00:10 |       |       |
|*  3 |    HASH JOIN OUTER                          |      |    35 | 12565 |  1350   (2)| 00:00:10 |       |       |
|*  4 |     HASH JOIN OUTER                         |      |    35 | 11305 |  1315   (2)| 00:00:10 |       |       |
|   5 |      NESTED LOOPS OUTER                     |      |    35 |  9065 |  1280   (2)| 00:00:10 |       |       |
|   6 |       NESTED LOOPS OUTER                    |      |    35 |  8015 |  1245   (2)| 00:00:09 |       |       |
|   7 |        NESTED LOOPS OUTER                   |      |    35 |  7735 |  1210   (2)| 00:00:09 |       |       |
|   8 |         NESTED LOOPS OUTER                  |      |    35 |  7350 |  1176   (2)| 00:00:09 |       |       |
|   9 |          NESTED LOOPS                       |      |    35 |  6965 |  1106   (2)| 00:00:08 |       |       |
|  10 |           NESTED LOOPS                      |      |    69 | 12351 |   899   (3)| 00:00:07 |       |       |
|* 11 |            HASH JOIN                        |      |    69 |  9660 |   761   (3)| 00:00:06 |       |       |
|* 12 |             TABLE ACCESS FULL               | T2   |   237 |  1659 |    36   (6)| 00:00:01 |       |       |
|* 13 |             TABLE ACCESS FULL               | T1   |  1671 |   217K|   725   (3)| 00:00:06 |       |       |

So…. when optimizer upgrade problems strike, you should always try OFE.

But it’s also worth further investigation to confirm that OFE really is a reliable fix for the problem.

Scalar Subselect Costing

This issue is an oldie but deserving of a quick post to stop me going off on a tangent in another post.

It is an oddity of scalar subselects/subqueries that their cost is not taken into account in the top level cost of a query.

In older versions of Oracle, it used to be the case that you didn’t even see the scalar subquery in the execution plan.

However, even in the latest versions, the cost still isn’t accounted for.

Always something to keep in mind.

For example:

SQL> create table t1
  2  (col1 number not null);

Table created.

SQL> 
SQL> insert into t1
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> create table t2
  2  (col1 number not null primary key);

Table created.

SQL> 
SQL> 
SQL> insert into t2
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 

Let’s do a scalar subselect to do an index lookup on t2 for every row in t1:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2339000913

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| SYS_C0078310 |     1 |    13 |     1   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1           | 10000 |   126K|     8   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T2"."COL1"=:B1)

You can see that the cost of the scalar subquery is 1 per execution and it’s not accounted for at the top level.

Let’s force a full table scan of the row-by-row lookup:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select /*+ full(t2) */ t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 637946564

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T2"."COL1"=:B1)

Obviously a much more expensive operation but, again, not properly accounted for in the overall costing.

Wouldn’t it be preferable that as the optimizer has estimated the number of rows in the top level select:

|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |

and it has estimated the cost per execution of the scalar subselect:

|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |

that the top level cost include to some degree the cost of scalar subselect per execution * estimated executions?

For example, if we code a join roughly equivalent to the scalar subselect then:

SQL> explain plan for
  2  select /*+ 
  3           full(t2) 
  4           use_nl(t2)
  5           */
  6         t1.col1
  7  ,      t2.col1
  8  from   t1
  9  ,      t2
 10  where t2.col1 (+) = t1.col1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 2453408398

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   1 |  NESTED LOOPS OUTER|      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    13 |     7  (15)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."COL1"(+)="T1"."COL1")

Also see:
http://jonathanlewis.wordpress.com/2007/10/12/scalar-subqueries/

http://oracle-randolf.blogspot.com/2010/01/when-your-projection-is-not-cost-free.html

http://blog.sydoracle.com/2005/09/explain-plans-and-scalar-subqueries.html

CPU costing is off

Just a quick note that if you see in your executions plans via DBMS_XPLAN that “CPU costing is off”, then it might not be, you might just have old versions of PLAN_TABLE hanging around.

I’ve just spent a little while trying to figure out why CPU costing might be off, after which a quick drop of a local PLAN_TABLE fixed everything.

So, not so much that CPU costing was off, more that some of the CPU costing information was not available in the PLAN_TABLE.

Interestingly, I did NOT have a message saying that “‘PLAN_TABLE’ is old version”.

I did spend a few minutes trying to recreate the situation with old versions of PLAN_TABLE locally, but all I could get was a message that PLAN_TABLE was old version, nothing about cpu costing being off. Bit weird but probably not worth spending any more time over.

Problem plans

I was looking at a performance problem a week or so before Christmas which I touched on in previous posts recently.

These are big plans and I had trouble uploading them in a decent format for this blog.

The format isn’t great but I’ve finally put them up on on one of these pastebin things for now:

As to how long they remain available, we’ll see.

These are probably of limited interest to anyone.

But they do illustrate a couple of things.

1. There are a lot of distractions in a big plan.

These are big plans at around 2800 lines.

You can see a lot of the deadly duo – MERGE JOIN CARTESIAN and BUFFER SORT operations- but these themselves are not the cause of the performance degradation (see point 2 about relative goodness).

It’s not until you get to lines six to seven hundred that you start to see any symptom of the performance difference.

The main performance problems stem from the full table scans of POSITIONS (e.g. See Id 717 in Bad Plan) and the HASH JOIN (Id 605) rather than the INDEX ACCESS (Id 719 in Good plan) and NESTED LOOP (605/6) operations, operations which are repeated throughout the rest of the plan

2. Good can be relative

I don’t think anyone would look at the “good” plan and be comfortable with it.

Then again could you could ever look at that sort of length of plan and think “Yes!” ?

The point is that the business were happy with the report performance that this plan provided and in the current business environment there’s rarely appetite to change things just because they’re not good technically.

3. It’s difficult to hint adequately

There are a handful of hints in the source code which are presumably intended to offer some sort of plan stability.

However, with a plan of this size, a couple of USE_NL hints are not going to be adequate.

As Jonathan Lewis points out, it’s very difficult to hint well/completely.

There are other ways to achieve plan stability if that’s what you want.

In this case, those handful of hints have probably contributed as much to plan instability as anything else.

4. Reusability is not always a good thing

These plans come from a reporting system which is doing too much calculation dynamically.

There is a core of Views on Views on Pipelined Table Functions on top of more Views which aims to provide a reusable interface to all the information being reported on and the various reports on top.

It’s not unusual to get performance problems from that sort of setup and probably one of the reasons that a handful of hints were used in the first place.

I would nearly always favour performance over reusability.

P.S. These plans also show the wrapping issue mentioned previously. At the time of writing, I still have an open Oracle SR for that. They had asked for full dumps of tables, statistics, etc. However I reproduced it with a standalone query joining dba_objects 50 times.

Follow

Get every new post delivered to your Inbox.

Join 313 other followers