optimizer mode parameter, hint and missing statistics

On older versions at least, using an all_rows hint with the rule-based optimizer is not necessarily the same as setting optimizer_mode to all_rows.

Edit:
Note that I’m deliberately avoiding dynamic sampling and newer features like cardinality feedback, just showing how a little oddity in code paths for slightly different all_rows scenarios might lead to different estimates in older versions.

This resulted from a conversation with a colleague that started something like this:

Colleague: Is the default cardinality for a temporary table 2000?
Me: No, it’s related to block size so you should see 8168.
Colleague: No, I see 2000.

This was observed on 9.2.0.8 using a global temporary table but similar observations are possible with collections, external tables and even normal heap tables.

Yes, 9.2.0.8 is an old version and, yes, the rule-based optimizer is now defunct, but this will remain relevant for the many systems that will continue to run old versions for years to come.
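
As a quick aside, the collection case mentioned above can be reproduced along these lines, assuming the sys.odcinumberlist collection type is available (a minimal sketch only; older versions may need a CAST around the constructor and the plan operation names vary, but under all_rows with an 8k block size the Rows column in the dbms_xplan.display output comes out at the same 8168 default):

SQL> explain plan for
  2  select * from table(sys.odcinumberlist(1,2,3));

Explained.

SQL> select * from table(dbms_xplan.display);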

It is reasonably well-known that the default cardinality for collections and global temporary tables is derived from the block size minus an overhead*. So, with an 8k block size, it is common to see a cardinality estimate of 8168, as shown below:

SQL> alter session set optimizer_mode = all_rows;

Session altered.

SQL> create global temporary table t1 
  2  (col1 number); 

Table created.

SQL> explain plan for
  2  select * from t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  8168 |   103K|    12   (9)|
|   1 |  TABLE ACCESS FULL   | T1          |  8168 |   103K|    12   (9)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

There is no detailed explanation in the 10053 trace as to where the 8168 came from (it is documented – see further down the page):

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 8168  NBLKS:  100  AVG_ROW_LEN:  100
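
For reference, this sort of trace can be produced by setting the 10053 event at session level before parsing the statement (the trace file location and the exact contents vary by version):

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> explain plan for
  2  select * from t1;

Explained.

SQL> alter session set events '10053 trace name context off';

Session altered.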

However, if we step back in time, set optimizer_mode to rule and then try to make amends with a statement-level ALL_ROWS hint:

SQL> alter session set optimizer_mode = rule;

Session altered.

SQL> explain plan for
  2  select /*+ all_rows */ * from t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  2000 | 26000 |    12   (9)|
|   1 |  TABLE ACCESS FULL   | T1          |  2000 | 26000 |    12   (9)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

We fall into a different code path in the optimizer.

The following is found in a 10053 trace:

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 2000  NBLKS:  100  AVG_ROW_LEN:  100

There’s no explanation of where the 2000 comes from either, but it is documented in versions 9.2 through 11gR2 as the default used for “Remote Cardinality”.

So it seems that the rule-based optimizer plus an ALL_ROWS hint drops into a different optimizer code path, one that uses a default cardinality of 2000 rows.
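
As a side note, if either default proves troublesome in practice on these older versions, one option is to give the temporary table some representative statistics manually. A minimal sketch using dbms_stats.set_table_stats, with purely illustrative numbers:

SQL> exec dbms_stats.set_table_stats(user, 'T1', numrows => 500, numblks => 10, avgrlen => 5);

PL/SQL procedure successfully completed.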

This particular quirk does not carry over to the latest versions.

In 11gR2 you can still force the deprecated RBO, but if you then hint ALL_ROWS you appear to go down a consistent code path (i.e. a consistent estimate of 8168, provided no dynamic sampling or cardinality feedback gets involved), as you might have expected above.

*Going back to the default value of 8168, this is calculated from the formula of

num_of_blocks * (block_size – cache_layer) / avg_row_len

For collections and global temporary tables without any additional statistics, num_of_blocks and avg_row_len both default to 100, so the two cancel out and we’re left with (block_size – cache_layer), where the cache_layer overhead typically evaluates to 24 bytes.
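
A quick sanity check of that arithmetic, assuming the usual 8k block size and the 24-byte cache layer mentioned above:

SQL> select 100 * (8192 - 24) / 100 as default_cdn from dual;

DEFAULT_CDN
-----------
       8168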

In terms of the general application of this default formula, until recently I hadn’t appreciated that, in the absence of statistics AND dynamic sampling, the actual number of blocks is used (if there is an actual number of blocks).

One of the reasons it’s easy to have missed this formula is that in any recent version dynamic sampling will kick in by default, so you have to prevent it explicitly. That’s my excuse anyway.
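
The demonstration below prevents it with a statement-level dynamic_sampling hint; an alternative is to switch it off for the whole session via the optimizer_dynamic_sampling parameter:

SQL> alter session set optimizer_dynamic_sampling = 0;

Session altered.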

SQL> drop table t1;

Table dropped.

SQL> create table t1
  2  as
  3  select rownum col1
  4  from   dual
  5  connect by rownum <= 100000;

Table created.

SQL> alter session set optimizer_mode = all_rows;

Session altered.

SQL> explain plan for
  2  select /*+ dynamic_sampling(t1 0) */ * from t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             | 13314 |   169K|    18   (6)|
|   1 |  TABLE ACCESS FULL   | T1          | 13314 |   169K|    18   (6)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

With the 10053 trace showing the actual block count used as input:

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 13314  NBLKS:  163  AVG_ROW_LEN:  100
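
Plugging the NBLKS figure from that trace back into the formula gives the 13314 shown in the plan (using round here just to illustrate the arithmetic):

SQL> select round(163 * (8192 - 24) / 100) as cdn from dual;

       CDN
----------
     13314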