optimizer mode parameter, hint and missing statistics

On older versions at least, using an all_rows hint with the rule-based optimizer is not necessarily the same as setting optimizer_mode to all_rows.

Edit:
Note that I’m deliberately avoiding dynamic sampling and newer features like cardinality feedback, just showing how a little oddity in code paths for slightly different all_rows scenarios might lead to different estimates in older versions.

This resulted from a conversation with a colleague that started something like this:

Colleague: Is the default cardinality for a temporary table 2000?
Me: No, it’s related to block size so you should see 8168.
Colleague: No, I see 2000.

This was observed on 9.2.0.8 using a global temporary table but similar observations are possible with collections, external tables and even normal heap tables.

Yes, 9.2.0.8 is an old version and, yes, the rule-based optimizer is now defunct, but this will remain relevant for the many systems that will continue to run old versions for years to come.

It is reasonably well-known that the default cardinality for collections and global temporary tables is related to block size minus an overhead*. So, with an 8k block size, it is common to see a cardinality estimate of 8168, as shown below:

SQL> alter session set optimizer_mode = all_rows;

Session altered.

SQL> create global temporary table t1 
  2  (col1 number); 

Table created.

SQL> explain plan for
  2  select * from t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  8168 |   103K|    12   (9)|
|   1 |  TABLE ACCESS FULL   | T1          |  8168 |   103K|    12   (9)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

There is no detailed explanation in the 10053 trace as to where the 8168 came from (it is documented – see further down the page):

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 8168  NBLKS:  100  AVG_ROW_LEN:  100

However, if we step back in time, set optimizer_mode to rule and then try to make amends with a statement-level ALL_ROWS hint:

SQL> alter session set optimizer_mode = rule;

Session altered.

SQL> explain plan for
  2  select /*+ all_rows */ * from t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  2000 | 26000 |    12   (9)|
|   1 |  TABLE ACCESS FULL   | T1          |  2000 | 26000 |    12   (9)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

We fall into a different code path in the optimizer.

The following is found in a 10053 trace:

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 2000  NBLKS:  100  AVG_ROW_LEN:  100

There’s no explanation regarding where the 2000 comes from but it is documented through versions 9.2 to 11gR2 as being the default used for “Remote Cardinality”.

So it seems that rule-based optimizer + all_rows hint drops into a different path in the optimizer which uses a default cardinality of 2000 rows.

The same behaviour is not seen in the latest versions.

In 11gR2 you can still force the deprecated RBO but if you hint all_rows then you seem to go down a consistent path (i.e. consistent estimate of 8168 provided no dynamic sampling or cardinality feedback), as you might have expected above.

*Going back to the default value of 8168, this is calculated from the formula:

num_of_blocks * (block_size - cache_layer) / avg_row_len

For collections and global temporary tables without any additional statistics, num_of_blocks and avg_row_len both default to 100, so they cancel out and we're left with (block_size - cache_layer), with the cache_layer typically evaluating to 24.
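As a quick sanity check, the arithmetic above can be sketched in a few lines (a sketch, not anything Oracle exposes; the defaults of 100 blocks and 100-byte rows and the 24-byte cache layer are as described above):

```python
def default_cardinality(block_size, num_of_blocks=100, avg_row_len=100, cache_layer=24):
    """Default CDN for a NOT ANALYZED object with dynamic sampling disabled."""
    return round(num_of_blocks * (block_size - cache_layer) / avg_row_len)

print(default_cardinality(8192))   # -> 8168, the familiar 8k-block estimate
print(default_cardinality(16384))  # -> 16360 with a 16k block size
```

The two 100s cancel, which is why the familiar figure is simply block size minus 24.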

In terms of the general application of this default formula, until recently I didn't appreciate that, in the absence of statistics AND dynamic sampling, the actual number of blocks is used (if there is an actual number of blocks).

One of the reasons that it’s easy to have missed this formula is that in any recent version, dynamic sampling will kick in by default and so you have to explicitly prevent it. That’s my excuse anyway.

SQL> drop table t1;

Table dropped.

SQL> create table t1
  2  as
  3  select rownum col1
  4  from   dual
  5  connect by rownum <= 100000;

Table created.

SQL> alter session set optimizer_mode = all_rows;

Session altered.

SQL> explain plan for
  2  select /*+ dynamic_sampling(t1 0) */ * from t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             | 13314 |   169K|    18   (6)|
|   1 |  TABLE ACCESS FULL   | T1          | 13314 |   169K|    18   (6)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

With the 10053 trace showing the blocks inputs:

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 13314  NBLKS:  163  AVG_ROW_LEN:  100
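Plugging the actual block count from the trace (NBLKS: 163) into the same default formula reproduces the estimate. A quick check of the arithmetic (a sketch, assuming simple rounding and an 8k block size):

```python
def default_cardinality(block_size, num_of_blocks, avg_row_len=100, cache_layer=24):
    # NBLKS now comes from the segment itself; avg_row_len still defaults to 100
    return round(num_of_blocks * (block_size - cache_layer) / avg_row_len)

print(default_cardinality(8192, 163))  # -> 13314, matching CDN in the trace
```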

3 Responses to optimizer mode parameter, hint and missing statistics

  1. Dom, this default estimate can be circumvented by using dynamic sampling. I redid the first part of your example with optimizer_dynamic_sampling=0 (off) and 2 (objects without statistics are sampled). The latter improves the estimate dramatically:

    SQL> connect scott/tiger
    Connected.
    SQL> create global temporary table t1 (col1 number);

    Table created.

    Elapsed: 00:00:00.93
    SQL> alter session set optimizer_mode = all_rows;

    Session altered.

    Elapsed: 00:00:00.00
    SQL> alter session set optimizer_dynamic_sampling=0;

    Session altered.

    Elapsed: 00:00:00.00
    SQL> explain plan for
      2  select * from t1;

    Explained.

    Elapsed: 00:00:00.18
    SQL> select * from table(dbms_xplan.display(null,null,'advanced'));

    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------
    Plan hash value: 3617692013

    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |  8168 |   103K|    24   (5)| 00:00:01 |
    |   1 |  TABLE ACCESS FULL| T1   |  8168 |   103K|    24   (5)| 00:00:01 |
    --------------------------------------------------------------------------

    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------

       1 - SEL$1 / T1@SEL$1

    Outline Data
    -------------

      /*+
          BEGIN_OUTLINE_DATA
          FULL(@"SEL$1" "T1"@"SEL$1")
          OUTLINE_LEAF(@"SEL$1")
          ALL_ROWS
          OPT_PARAM('optimizer_dynamic_sampling' 0)
          DB_VERSION('11.2.0.3')
          OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
          IGNORE_OPTIM_EMBEDDED_HINTS
          END_OUTLINE_DATA
      */

    Column Projection Information (identified by operation id):
    -----------------------------------------------------------

       1 - "T1"."COL1"[NUMBER,22]

    33 rows selected.

    Elapsed: 00:00:01.18
    SQL> alter session set optimizer_dynamic_sampling=2;

    Session altered.

    Elapsed: 00:00:00.00
    SQL> explain plan for
      2  select * from t1;

    Explained.

    Elapsed: 00:00:00.00
    SQL> select * from table(dbms_xplan.display(null,null,'advanced'));

    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------
    Plan hash value: 3617692013

    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |     1 |    13 |     2   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |
    --------------------------------------------------------------------------

    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------

       1 - SEL$1 / T1@SEL$1

    Outline Data
    -------------

      /*+
          BEGIN_OUTLINE_DATA
          FULL(@"SEL$1" "T1"@"SEL$1")
          OUTLINE_LEAF(@"SEL$1")
          ALL_ROWS
          DB_VERSION('11.2.0.3')
          OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
          IGNORE_OPTIM_EMBEDDED_HINTS
          END_OUTLINE_DATA
      */

    Column Projection Information (identified by operation id):
    -----------------------------------------------------------

       1 - "T1"."COL1"[NUMBER,22]

    Note
    -----
       - dynamic sampling used for this statement (level=2)

    36 rows selected.

    Elapsed: 00:00:00.34
    SQL>
    

    The first result, without dynamic sampling, has the very same estimate as the one in your post. With dynamic sampling, the CBO gets much more accurate statistics and the estimates are much, much better.

  2. Of course, the case above doesn’t have much to do with the defaults that you’ve discovered, it only shows a cheap way of avoiding them altogether.

    • Dom Brooks says:

      Hi Mladen,

      Thanks for the comments.

      The point of the post was really to show that, in older versions:

      “optimizer_mode = rule” + a /*+ all_rows */ hint
      !=
      “optimizer_mode = all_rows”

      And that they end up with different estimates because of that.

      Dynamic sampling is a fabulous feature, probably one of my top 5 performance tuning features. But judgement is reserved on whether features such as cardinality feedback and sql plan baselines will eventually change that.

      Whilst we’re talking about dynamic sampling, I also mentioned collections with respect to the defaults for cardinality. Dynamic sampling does not work with collections until something like 11.1.0.7.

      Also, on the same subject and whilst we’ve also mentioned newer functionality, I know you’ve been having a good look at new features in 11gR2 recently. Note that up to 11.2.0.2 at least, cardinality feedback will not kick in if dynamic sampling has been applied.

      Note that I’ve added a little edit to the article just to clarify that I’m deliberately avoiding dynamic sampling here but I don’t want to undermine your valuable comments on its benefits.

      Cheers,
      Dominic
