Projection Pushdown bug in 11.2.0.3

There’s a bug in 11.2.0.3 related to projection pushdown that seems to be influenced by the setting of STATISTICS_LEVEL.

Having taken the time to distill a test case from a real-world issue (commenting out columns and logic, bringing view definitions inline, etc – the original query had no reference to DUAL), it turns out to be very simple to reproduce:

SQL> SELECT * FROM v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> ALTER SESSION SET STATISTICS_LEVEL=ALL;

Session altered.

SQL> SELECT 
  2         1     col1,
  3         'X'   col2
  4  FROM   DUAL  trad
  5  UNION
  6  SELECT (SELECT SUM(1)
  7          FROM   DUAL) col1,
  8         TO_CHAR (NULL) col2
  9  FROM   DUAL ;
FROM   DUAL  trad
       *
ERROR at line 4:
ORA-00600: internal error code, arguments: [qkeIsExprReferenced1], [], [], [],
[], [], [], [], [], [], [], []


SQL> 

Maybe you’re thinking that you don’t tend to run with STATISTICS_LEVEL set to ALL though?

But perhaps you’re used to doing your SQL tuning with the GATHER_PLAN_STATISTICS hint?

SQL> SELECT /*+ gather_plan_statistics */
  2         1     col1,
  3         'X'   col2
  4  FROM   DUAL  trad
  5  UNION
  6  SELECT (SELECT SUM(1)
  7          FROM   DUAL) col1,
  8         TO_CHAR (NULL) col2
  9  FROM   DUAL ;
FROM   DUAL  trad
       *
ERROR at line 4:
ORA-00600: internal error code, arguments: [qkeIsExprReferenced1], [], [], [],
[], [], [], [], [], [], [], []


SQL> 

You can work around the bug easily enough by setting "_projection_pushdown" to false, like so:

SQL> ALTER SESSION SET "_projection_pushdown" = false;

Session altered.

SQL> SELECT /*+ gather_plan_statistics */
  2         1     col1,
  3         'X'   col2
  4  FROM   DUAL  trad
  5  UNION
  6  SELECT (SELECT SUM(1)
  7          FROM   DUAL) col1,
  8         TO_CHAR (NULL) col2
  9  FROM   DUAL ;

      COL1 C
---------- -
         1 X
         1

SQL> ALTER SESSION SET STATISTICS_LEVEL=ALL;

Session altered.

SQL> SELECT 1     col1,
  2         'X'   col2
  3  FROM   DUAL  trad
  4  UNION
  5  SELECT (SELECT SUM(1)
  6          FROM   DUAL) col1,
  7         TO_CHAR (NULL) col2
  8  FROM   DUAL ;

      COL1 C
---------- -
         1 X
         1

SQL> 

Unfortunately, if you turn projection pushdown off, then you critically change the plan of the very statement you’re looking at.

Any thoughts that you might use the column projection information in DBMS_XPLAN to help diagnose the bug are scuppered by the fact that the error is raised at parse time.

What is particularly disappointing is that we found this on Day 1 of testing an application against 11.2.0.3 – an application which we’ve previously tested extensively on 11.2.0.2. We were considering 11.2.0.3 rather than requesting individual backports for a couple of specific, unrelated bugs that we’ve yet to address on 11.2.0.2.

optimizer mode parameter, hint and missing statistics

On older versions at least, using an all_rows hint with the rule-based optimizer is not necessarily the same as setting optimizer_mode to all_rows.

Edit:
Note that I’m deliberately avoiding dynamic sampling and newer features like cardinality feedback, just showing how a little oddity in code paths for slightly different all_rows scenarios might lead to different estimates in older versions.

This resulted from a conversation with a colleague that started something like this:

Colleague: Is the default cardinality for a temporary table 2000?
Me: No, it’s related to block size so you should see 8168.
Colleague: No, I see 2000.

This was observed on 9.2.0.8 using a global temporary table but similar observations are possible with collections, external tables and even normal heap tables.

Yes, 9.2.0.8 is an old version, yes rule-based optimizer is now defunct, but this will remain relevant for the many systems that will continue to run old versions for years to come.

It is reasonably well-known that the default cardinality for collections and global temporary tables is related to block size minus an overhead*. So, with an 8k block size, it is common to see a cardinality estimate of 8168, as shown below:

SQL> alter session set optimizer_mode = all_rows;

Session altered.

SQL> create global temporary table t1 
  2  (col1 number); 

Table created.

SQL> explain plan for
  2  select * from t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  8168 |   103K|    12   (9)|
|   1 |  TABLE ACCESS FULL   | T1          |  8168 |   103K|    12   (9)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

There is no detailed explanation in the 10053 trace as to where the 8168 came from (it is documented – see further down the page):

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 8168  NBLKS:  100  AVG_ROW_LEN:  100

However, if we step back in time and set optimizer_mode to rule but try to make amends using a statement level ALL_ROWS hint:

SQL> alter session set optimizer_mode = rule;

Session altered.

SQL> explain plan for
  2  select /*+ all_rows */ * from t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |  2000 | 26000 |    12   (9)|
|   1 |  TABLE ACCESS FULL   | T1          |  2000 | 26000 |    12   (9)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

We fall into a different code path in the optimizer.

The following is found in a 10053 trace:

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 2000  NBLKS:  100  AVG_ROW_LEN:  100

There’s no explanation regarding where the 2000 comes from but it is documented through versions 9.2 to 11gR2 as being the default used for “Remote Cardinality”.

So it seems that rule-based optimizer + all_rows hint drops into a different path in the optimizer which uses a default cardinality of 2000 rows.

The same behaviour does not occur in the latest versions.

In 11gR2 you can still force the deprecated RBO but if you hint all_rows then you seem to go down a consistent path (i.e. consistent estimate of 8168 provided no dynamic sampling or cardinality feedback), as you might have expected above.

*Going back to the default value of 8168, this is calculated from the formula of

num_of_blocks * (block_size - cache_layer) / avg_row_len

And for collections and global temporary tables without any additional statistics, num_of_blocks and avg_row_len default to 100, so we’re left with (block_size - cache_layer), with the cache_layer typically evaluating to 24.
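Plugging the defaults into that formula is easy to sanity-check with a throwaway sketch (the 24-byte cache layer is the typical figure quoted above, not something derived here, and the 16k-block figure assumes the same overhead):

```python
def default_cardinality(block_size, num_of_blocks=100, avg_row_len=100, cache_layer=24):
    # CDN = num_of_blocks * (block_size - cache_layer) / avg_row_len
    # With the num_of_blocks and avg_row_len defaults both at 100,
    # this reduces to just (block_size - cache_layer).
    return num_of_blocks * (block_size - cache_layer) // avg_row_len

print(default_cardinality(8192))   # -> 8168 (8k block size)
print(default_cardinality(16384))  # -> 16360 (16k block size)
```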

In terms of the general application of this default formula, until recently I didn’t appreciate that, in the absence of statistics AND dynamic sampling, the actual number of blocks is used (if there is an actual number of blocks).

One of the reasons that it’s easy to have missed this formula is that in any recent version, dynamic sampling will kick in by default and so you have to explicitly prevent it. That’s my excuse anyway.

SQL> drop table t1;

Table dropped.

SQL> create table t1
  2  as
  3  select rownum col1
  4  from   dual
  5  connect by rownum <= 100000;

Table created.

SQL> alter session set optimizer_mode = all_rows;

Session altered.

SQL> explain plan for
  2  select /*+ dynamic_sampling(t1 0) */ * from t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             | 13314 |   169K|    18   (6)|
|   1 |  TABLE ACCESS FULL   | T1          | 13314 |   169K|    18   (6)|
-------------------------------------------------------------------------

7 rows selected.

SQL> 

With the 10053 trace showing the blocks inputs:

BASE STATISTICAL INFORMATION
***********************
Table stats    Table: T1   Alias: T1
  TOTAL ::  (NOT ANALYZED)    CDN: 13314  NBLKS:  163  AVG_ROW_LEN:  100

Any thoughts on stats validation?

There is an application running on 9.2.0.8 and this application’s approach to object statistics is that they are gathered along with each quarterly release of code and do not change outside of the release schedule.

There are a number of other issues with some elements of this particular implementation, discussion of which is outside the scope of this article.

Now, as part of the upgrade to 11.2, it’s highly likely that we will move to the default built-in stats job, albeit gathering in a weekend window rather than daily.

One of the biggest advantages of the current situation is that the stats are tested along with the release and shipped with the release (of course, plans can still change despite stats being fixed – even because they are fixed, due to time ticking ever onwards, sequences increasing, etc).

And so, as part of a possibly non-negotiable comfort blanket for the move to the default 11g stats job, there is a management requirement for some sort of validation routine – i.e. a set of sql statements broadly representing important or problem areas of the application – to be run before/after, compared on some sort of metric, and judgement passed on whether the stats are “good”.

There are a number of nifty features like pending stats that we can use here, but my initial thought was that this sounded like a fairly easy job for SQL Tuning Sets (STS) and the SQL Performance Analyzer (SPA).

SPA requires the additional Real Application Testing (RAT) license and, now that our licensing has been negotiated, this isn’t something we’ve got.

So… hand-rolling a validation routine.

I would still see SQL Tuning Sets, holding the sql statements and bind sets, as part of a bespoke solution (STS is available with the Tuning Pack license, which is on top of EE – both of which we have).

I could picture a simple approach that would execute these statements in parallel sessions, probably via DBMS_SQL, fetch all the rows, record the session stats before/after, pick a metric – e.g. buffer gets, elapsed time, or some ratio between those sorts of things – and then do some sort of comparison: yes, that run was broadly the same or better, or at least not a catastrophic degradation – these stats are ok.

(Of course, being able to picture a possible approach does not mean that I advocate it)
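The comparison step of such an approach might be sketched as follows – all names, figures and the 20% threshold here are hypothetical, and the metric capture itself (session stats before/after each run) is left out:

```python
def validate_stats(before, after, tolerance=0.2):
    """Compare a per-statement metric (e.g. buffer gets) captured before
    and after a stats change. Flags statements whose metric degraded by
    more than `tolerance` (20% here, purely illustrative)."""
    offenders = {}
    for stmt, base in before.items():
        new = after.get(stmt)
        if new is None:
            continue  # statement not re-run; no verdict either way
        if base > 0 and (new - base) / base > tolerance:
            offenders[stmt] = (base, new)
    return (not offenders, offenders)

# Hypothetical buffer-gets figures for three statements:
before = {"stmt1": 1000, "stmt2": 500, "stmt3": 80}
after  = {"stmt1": 1050, "stmt2": 2500, "stmt3": 75}

ok, offenders = validate_stats(before, after)
print(ok)         # -> False: stmt2 degraded fivefold
print(offenders)  # -> {'stmt2': (500, 2500)}
```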

But I just wondered whether this is something anyone else has done?

I’d be very interested in hearing more if you have.

Edit:
I’ve tried to reword this to avoid some confusion.
Alex – I’ve deleted your comments and my replies because a) they reduce the chance of getting the input the article was aiming for b) they were based on nested recursive misunderstandings and c) much of what you were saying is preaching to the choir.

SQL_IDs and baselines

There have been some interesting posts from Marcin Przepiorowski and Kerry Osborne about mapping sql plan management (spm) baselines to a corresponding sql id.

The key to sql plan baselines (DBA_SQL_PLAN_BASELINES) is SIGNATURE.

And why signature? It’s not just another change to the key used to identify sql statements (HASH_VALUE -> (NEW_)HASH_VALUE / OLD_HASH_VALUE -> SQL_ID).

It’s used for baselines because it’s a more flexible mechanism than a straight-up hash – one sql plan baseline can be used against more than one sql id.

As Christian Antognini explains so effectively in his book, when baselines are involved, a sql statement is “normalized to make it both case-insensitive and independent of the blank spaces in the text”.

So, when you look at statements in V$SQL, we’re not talking here about the FORCE_MATCHING_SIGNATURE but EXACT_MATCHING_SIGNATURE.

For example, let’s find three statements with different sql ids but the same signature:

SQL> create table t1
  2  (col1 number);

Table created.

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where        col1 = 3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  cr6chh7p7vvzt, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where        col1 = 3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)


22 rows selected.

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  8j52h3wtgvu3n, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)


22 rows selected.

SQL> SELECT /*+ find_me */ *
  2  FROM   t1
  3  WHERE  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  894k8t6nu8kbf, child number 0
-------------------------------------
SELECT /*+ find_me */ * FROM   t1 WHERE  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

22 rows selected.

SQL> select sql_id, hash_value, old_hash_value, exact_matching_signature, force_matching_signature
  2  from v$sql 
  3  where sql_id IN ('cr6chh7p7vvzt','8j52h3wtgvu3n','894k8t6nu8kbf');

SQL_ID        HASH_VALUE OLD_HASH_VALUE EXACT_MATCHING_SIGNATURE FORCE_MATCHING_SIGNATURE
------------- ---------- -------------- ------------------------ ------------------------
894k8t6nu8kbf 2846116206     4072117629               4.4562E+18               1.2887E+19
8j52h3wtgvu3n  855500916     4269126066               4.4562E+18               1.2887E+19
cr6chh7p7vvzt 3934121977      717076022               4.4562E+18               1.2887E+19
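Oracle’s actual normalization and hashing behind EXACT_MATCHING_SIGNATURE are internal, but the idea described above – case-insensitive, whitespace-collapsed – can be illustrated with a rough sketch (the normalization rules and the MD5 stand-in are mine, not Oracle’s):

```python
import hashlib
import re

def normalize(sql_text):
    # Crude approximation: uppercase and collapse runs of whitespace.
    # Oracle's real normalization is internal and more involved.
    return re.sub(r"\s+", " ", sql_text).upper().strip()

def signature(sql_text):
    # Stand-in hash; Oracle does not use MD5 here.
    return hashlib.md5(normalize(sql_text).encode()).hexdigest()

# The three textual variants from the demo above:
variants = [
    "select /*+ find_me */\n       *\nfrom   t1\nwhere        col1 = 3",
    "select /*+ find_me */\n       *\nfrom   t1\nwhere  col1 =        3",
    "SELECT /*+ find_me */ *\nFROM   t1\nWHERE  col1 =        3",
]

# All three collapse to the same normalized form, hence one signature.
print(len({signature(v) for v in variants}))  # -> 1
```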

Now, let’s load them into a SPM baseline:

SQL> declare
  2   l_op pls_integer;
  3  begin
  4   l_op := dbms_spm.load_plans_from_cursor_cache(sql_id =>'cr6chh7p7vvzt');
  5   l_op := dbms_spm.load_plans_from_cursor_cache(sql_id =>'8j52h3wtgvu3n');
  6   l_op := dbms_spm.load_plans_from_cursor_cache(sql_id =>'894k8t6nu8kbf');
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> set long 10000
SQL> select to_char(signature), sql_text from dba_sql_plan_baselines where signature = 4456219368825
610060 order by created desc;

TO_CHAR(SIGNATURE)
----------------------------------------
SQL_TEXT
--------------------------------------------------------------------------------
4456219368825610060
select /*+ find_me */
       *
from   t1
where        col1 = 3


SQL> 

Note how the SQL text is from the first statement, the only statement that caused the baseline plan to be created.

What’s the primary purpose of the SQL text in DBA_SQL_PLAN_BASELINES? Well, probably, as Christian again suggests, to check that the statement for the baseline is actually equivalent to the one being run, in the event of hash collisions.

So, now let’s re-run those SELECTs from earlier and watch the baseline kick in for all three:

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where        col1 = 3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  cr6chh7p7vvzt, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where        col1 = 3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

Note
-----
   - SQL plan baseline SQL_PLAN_3vpxbwzn11rucdbd90e8e used for this statement

23 rows selected.

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  8j52h3wtgvu3n, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

Note
-----
   - SQL plan baseline SQL_PLAN_3vpxbwzn11rucdbd90e8e used for this statement

23 rows selected.

SQL> SELECT /*+ find_me */ *
  2  FROM   t1
  3  WHERE  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  894k8t6nu8kbf, child number 0
-------------------------------------
SELECT /*+ find_me */ * FROM   t1 WHERE  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

Note
-----
   - SQL plan baseline SQL_PLAN_3vpxbwzn11rucdbd90e8e used for this statement

23 rows selected.

SQL> 

That’s it really.

So, just showing that signature to sql id is not necessarily a one-to-one relationship but can be one-to-many.

Whilst we’re talking baselines, have you read these interesting posts from Maxym Kharchenko? (They’re from a while back but I’ve only recently discovered his blog.) He shows, amongst other things, that because baselines are essentially based on a sql text match, they can kick in where you don’t intend them to – for example, on statements that look the same but aren’t, perhaps involving objects with the same name but in different schemas, even with different structures. When you think about it, it makes sense how it can happen, but surely it can’t be intended that way.

JF – Join Factorization

Just a quickie. No nice isolated demo etc.

I was just looking at rewriting a query that someone asked me to review.

I was playing with what should be the initial driving logic of a bigger query, changing it from a DECODE to a UNION ALL (or an OR) to make use of a FILTER condition.

i.e. from something like:

SELECT t.*
FROM   yyyyy c
,      xxxxx t
WHERE  c.trga_code   = 'LON'
AND    t.cmpy_num    = c.cmpy_num
AND    DECODE(:b1,'S',t.value_date,t.tran_date) >= TO_DATE(:b2,'YYYYMMDD')
AND    DECODE(:b1,'S',t.value_date,t.tran_date) <= TO_DATE(:b4,'YYYYMMDD');

to something not dissimilar to:

SELECT t.*
FROM   yyyyy c
,      xxxxx t
WHERE  c.trga_code   = 'LON'
AND    t.cmpy_num    = c.cmpy_num
AND    :b1           = 'S'
AND    t.value_date >= TO_DATE(:b2,'YYYYMMDD')
AND    t.value_date <= TO_DATE(:b4,'YYYYMMDD')
UNION ALL
SELECT t.*
FROM   yyyyy c
,      xxxxx t 
WHERE  c.trga_code  = 'LON'
AND    t.cmpy_num   = c.cmpy_num
AND    :b1         != 'S'
AND    t.tran_date >= TO_DATE(:b2,'YYYYMMDD')
AND    t.tran_date <= TO_DATE(:b4,'YYYYMMDD');

And I got an unexpected execution plan:

---------------------------------------------------------------------------------------------
| Id  | Operation                              | Name               | Rows  | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                    | 13271 |       |       |
|*  1 |  HASH JOIN                             |                    | 13271 |       |       |
|*  2 |   INDEX RANGE SCAN                     | YYYYY_3_IDX        |   268 |       |       |
|   3 |   VIEW                                 | VW_JF_SET$49BA79CF | 13304 |       |       |
|   4 |    UNION-ALL                           |                    |       |       |       |
|*  5 |     FILTER                             |                    |       |       |       |
|   6 |      PARTITION LIST ALL                |                    |  6652 |     1 |    11 |
|*  7 |       TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX              |  6652 |     1 |    11 |
|*  8 |        INDEX RANGE SCAN                | XXXXX_16_IDX       |   112K|     1 |    11 |
|*  9 |     FILTER                             |                    |       |       |       |
|  10 |      PARTITION LIST ALL                |                    |  6652 |     1 |    11 |
|* 11 |       TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX              |  6652 |     1 |    11 |
|* 12 |        INDEX RANGE SCAN                | XXXXX_4_IDX        | 67411 |     1 |    11 |
---------------------------------------------------------------------------------------------

That’s not quite what I was expecting – weird.

I was sitting here thinking “what’s that? … vw_jf? vw_jf?… hang on … vw_jf … jf … jf rings a bell… join factorization”.
Bingo.

See the Oracle Optimizer Blog for more information on Join Factorization and where they summarise the feature as

Join factorization is a cost-based transformation.
It can factorize common computations from branches in a UNION ALL query which can lead to huge performance improvement.

Fair enough. But I don’t want it. Not just yet. Let’s turn it off.
(Although I’ve mentioned before that I don’t tend to like being too prescriptive regarding a “correct plan”, in this case, for my purposes, I don’t want it doing that. I might let the CBO reconsider it later once I piece everything back together but for now…)

V$SQL_HINT mentions a NO_FACTORIZE_JOIN hint.

Unfortunately (or perhaps fortunately given the vast array of undocumented hints exposed here), it doesn’t tell us how to use it and I fiddled around for a bit but couldn’t get it to work.

So, the proper approach is to use ALTER SESSION or OPT_PARAM to change the setting of _optimizer_join_factorization (don’t go changing the setting of hidden parameters without the approval of Oracle Support, etc, etc, etc).
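For reference, the two forms mentioned would look something like this (the usual caveats about hidden parameters apply; the OPT_PARAM variant scopes the change to a single statement):

```sql
-- Session-wide:
ALTER SESSION SET "_optimizer_join_factorization" = FALSE;

-- Or per-statement, via OPT_PARAM:
SELECT /*+ opt_param('_optimizer_join_factorization', 'false') */ ...
```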

Which gave me what I wanted (for now).

--------------------------------------------------------------------------------------------- 
| Id  | Operation                             | Name                | Rows  | Pstart| Pstop | 
--------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT                      |                     | 13272 |       |       | 
|   1 |  UNION-ALL                            |                     |       |       |       | 
|*  2 |   FILTER                              |                     |       |       |       | 
|*  3 |    HASH JOIN                          |                     |  6636 |       |       | 
|*  4 |     INDEX RANGE SCAN                  | YYYYY_3_IDX         |   268 |       |       | 
|   5 |     PARTITION LIST SUBQUERY           |                     |  6652 |KEY(SQ)|KEY(SQ)| 
|*  6 |      TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX               |  6652 |KEY(SQ)|KEY(SQ)| 
|*  7 |       INDEX RANGE SCAN                | XXXXX_16_IDX        |   112K|KEY(SQ)|KEY(SQ)| 
|*  8 |   FILTER                              |                     |       |       |       | 
|*  9 |    HASH JOIN                          |                     |  6636 |       |       | 
|* 10 |     INDEX RANGE SCAN                  | YYYYY_3_IDX         |   268 |       |       | 
|  11 |     PARTITION LIST SUBQUERY           |                     |  6652 |KEY(SQ)|KEY(SQ)| 
|* 12 |      TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX               |  6652 |KEY(SQ)|KEY(SQ)| 
|* 13 |       INDEX RANGE SCAN                | XXXXX_4_IDX         | 67411 |KEY(SQ)|KEY(SQ)| 
--------------------------------------------------------------------------------------------- 

P.S. This is not meant to be an investigation into the whys and wherefores of JF being picked and whether it was a good choice, nor any reflection on whether JF is a good thing and whether there are any issues with it.

Further sources on Join Factorization:
Oracle Optimizer Blog – Join Factorization
Jože Senegačnik – Query Transformations
Timur Akhmadeev – Join Factorization

Further information on those Internal View names:
Jonathan Lewis – Internal Views

Interesting cost_io in 10053 trace

17976931348623157081452742373170435679807056752584499
6598917476803157260780028538760589558632766878171540458953514
3824642343213268894641827684675467035375169860499105765512820
7624549009038932894407586850845513394230458323690322294816580
8559332123348274797826204144723168738177180919299881250404026
184124858368.00 to be precise.

I’ve got a problem with a set of sql statements in production. Fortunately the problem is reproducible in a non-prod environment.

These statements all reference one particular virtual column in a composite function-based index.

The sql statements all look something like this:


SELECT col1
FROM my_schema.table1 tab1
WHERE ...
AND NVL(col2,'UNASSIGNED') = SYS_CONTEXT('MY_CONTEXT','MY_COL2_VALUE')
...

And they are raising the following error:


ORA-00904: "MY_SCHEMA"."TAB1"."SYS_NC00062$": invalid identifier

This “SYS_NC00062$” is a hidden/virtual column on this table that is part of a function-based index. I can see it here at least:


select *
from   sys.col$
where  obj# in (select object_id
                from   dba_objects
                where  object_name = 'TABLE1'
                and    owner       = 'MY_SCHEMA')
order by col#;
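As an aside, rather than going to sys.col$, the dictionary views expose the same information; on 11g at least, something like the following should show the SYS_NC column and its underlying expression (columns as per DBA_TAB_COLS; adjust for older versions):

```sql
SELECT column_name
,      data_default
,      hidden_column
,      virtual_column
FROM   dba_tab_cols
WHERE  owner         = 'MY_SCHEMA'
AND    table_name    = 'TABLE1'
AND    hidden_column = 'YES';
```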

This article isn’t about this error specifically. Not yet at least but I might blog about it later once I know some more.

However, in the meantime, I’ve just started gathering details, running some extended trace files (10046, 10053), etc with a view to finding some extra information and then probably raising an issue on metalink.

I’ve not even started analysing this info yet but as I zipped through the 10053 file, I noted something odd.

Note that in this 10053 trace file, the original reported SYS_NC00062$ has now become SYS_NC00066$ because I dropped and recreated the FBI to see if that helped – it didn’t.

Looking at the cost_io numbers, you get various costs for various access methods – e.g. 16.00, 3.00, 15632.00, 53.00 – but then, holy moly:


***** Virtual column Adjustment ******
Column name SYS_NC00066$
cost_cpu 300.00
cost_io 17976931348623157081452742373170435679807056752584499
6598917476803157260780028538760589558632766878171540458953514
3824642343213268894641827684675467035375169860499105765512820
7624549009038932894407586850845513394230458323690322294816580
8559332123348274797826204144723168738177180919299881250404026
184124858368.00
***** End virtual column Adjustment ******

I’ve literally done nothing yet but, having seen that number, I thought I’d put it down quickly in an article.

But wow – that’s a big adjustment! What’s that about?

I picked the wrong day to leave my Jonathan Lewis CBO book at home…
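One quick observation before any proper analysis, though: that cost_io appears to be exactly the largest finite IEEE 754 double – which smells more like a sentinel or overflowed “infinite cost” than a genuine calculation (an observation, not a diagnosis). The digits are easy to check:

```python
import sys

# Exact integer value of the largest finite IEEE 754 double (2^1024 - 2^971).
ieee_double_max = int(sys.float_info.max)

# Compare against the leading digits of the cost_io in the 10053 trace above.
print(str(ieee_double_max)[:20])   # -> 17976931348623157081
print(len(str(ieee_double_max)))   # -> 309 digits, like the trace value
```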

Here’s the 10053 trace file.

Application Hanging

Today a problem with the application “hanging”.

The background is that there has been some development in an application against database A.
Database A is now out of action due to a disk controller failure, or something akin to that, and so the application is now pointing at database B.
When pointing at database A, the application was performing acceptably, not so when pointing at database B where it is “hanging”.

So, straightaway, in response to one of the questions we should ask ourselves when there is a sudden problem – what has changed? – we know that there has been a change in database and the immediate follow-up is what might the differences be between database A and database B.

Fortunately, one of the differences is already known to me and, given the symptoms, is jumping up and down, waving its hands and shouting ‘Look at me’.

Both database A and database B are relatively recent copies of the production database.
All 8.1.7 by the way.
However, the production database has a bad mix of some tables having statistics (some stale) and some having none, which can lead the CBO to make some inappropriate decisions because of the difference between the default values it uses for objects without statistics and the reality of the data distribution.

Database A had had a statistics update subsequent to the refresh.
Database B had not.

Therefore it is highly likely that the symptom of the application hanging is a less than optimal plan on the SQL concerned.
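A quick sanity check – a sketch, assuming we can connect as the application schema owner – to compare which tables have statistics (and how fresh they are) on each database:

SELECT table_name
,      num_rows
,      last_analyzed
FROM   user_tables
ORDER  BY last_analyzed;

Tables with a NULL LAST_ANALYZED have no statistics and will get the CBO defaults mentioned above.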

For the application operation that is hanging, quite a few bits of SQL might be issued.
There might be more than one bit of SQL that is performing unacceptably but in any event, we need to work through each suboptimal performer as we find it.

The application mostly uses ORM to generate its lovely queries (although in this case the development that has occurred is to optimise some of the iterative, slow-by-slow dynamic SQL that is generated).

Therefore the best mechanism is to trace the database session concerned, replicate the “hanging” behaviour and see what it is doing. What I am expecting to find is that, on “hanging”, the last piece of SQL in the trace file will be doing more IO than it needs to.

So, a trigger is required which sets the extended trace event for a specific user:


CREATE OR REPLACE TRIGGER xxxxxx
AFTER LOGON ON DATABASE
DECLARE
BEGIN
   IF USER = 'yyyyy'
   THEN
      --
      EXECUTE IMMEDIATE 'ALTER SESSION SET events ''10046 trace name context forever, level 12''';
      --
   END IF;
END;

The next step is to reproduce the “hanging” behaviour via the application.
Sure enough this generates a trace file on the server which is doing many, many “db file sequential reads”.
More than normal? (where normal is the behaviour when it performs acceptably)
Don’t know for sure yet, but it’s a good bet.
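To get a quick summary of where the time and IO are going, the raw trace can be run through tkprof, e.g. (file names illustrative):

tkprof ora_12345.trc ora_12345.txt sort=fchela,exeela waits=yes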

The SQL in question is something like this:


SELECT ....
FROM   News t0
     , (SELECT DISTINCT T1.newsid
        FROM   nolindex  T2
             , IMPORTERS T3
             , NewsIndex T1
        WHERE  T3.destindexid = :1
        AND    T1.sectionid   = T2.indexid
        AND    T2.indexid     = T3.sourceindexid) v1
WHERE  t0.ID = v1.newsid

The trace file is incomplete because, due to all the IO, it reached its size limit. Also the application was killed because we had the information we needed.
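Incidentally, the size limit that truncated the trace file is governed by MAX_DUMP_FILE_SIZE and, if required, can be lifted for the session before tracing:

ALTER SESSION SET MAX_DUMP_FILE_SIZE = UNLIMITED;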

On database B (some stats but not all), running the SQL with a SET AUTOTRACE TRACE EXPLAIN reveals the following:


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=216293 Card=59764261584 Bytes=196445127826608)
   1    0   MERGE JOIN (Cost=216293 Card=59764261584 Bytes=196445127826608)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=826 Card=1515434 Bytes=4961530916)
   3    2       INDEX (FULL SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=26 Card=1515434)
   4    1     SORT (JOIN) (Cost=50339 Card=3943706 Bytes=51268178)
   5    4       VIEW (Cost=165128 Card=3943706 Bytes=51268178)
   6    5         SORT (UNIQUE) (Cost=165128 Card=3943706 Bytes=256340890)
   7    6           NESTED LOOPS (Cost=24 Card=3943706 Bytes=256340890)
   8    7             MERGE JOIN (CARTESIAN) (Cost=24 Card=2589433 Bytes=134650516)
   9    8               TABLE ACCESS (BY INDEX ROWID) OF 'IMPORTERS' (Cost=2 Card=11 Bytes=286)
  10    9                 INDEX (RANGE SCAN) OF 'DESTINDEXID_IDX' (NON-UNIQUE) (Cost=1 Card=11)
  11    8               SORT (JOIN) (Cost=22 Card=235403 Bytes=6120478)
  12   11                 INDEX (FAST FULL SCAN) OF 'NEWSINDEXSECTLANGNEWSINDEX' (NON-UNIQUE) (Cost=2 Card=235403 Bytes=6120478)
  13    7             INDEX (UNIQUE SCAN) OF 'NOLINDEX_PK' (UNIQUE)

There are some big numbers in there.

Next step is to run against database C which should be similar to database A before it died – i.e. refresh from production with statistics on more objects:


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=206 Card=89 Bytes=26700)
   1    0   NESTED LOOPS (Cost=206 Card=89 Bytes=26700)
   2    1     VIEW (Cost=28 Card=89 Bytes=1157)
   3    2       SORT (UNIQUE) (Cost=28 Card=89 Bytes=2314)
   4    3         NESTED LOOPS (Cost=26 Card=89 Bytes=2314)
   5    4           NESTED LOOPS (Cost=2 Card=8 Bytes=120)
   6    5             TABLE ACCESS (FULL) OF 'IMPORTERS' (Cost=2 Card=8 Bytes=80)
   7    5             INDEX (UNIQUE SCAN) OF 'NOLINDEX_PK' (UNIQUE)
   8    4           INDEX (RANGE SCAN) OF 'NEWSINDEXSECTLANGNEWSINDEX' (NON-UNIQUE) (Cost=3 Card=225099 Bytes=2476089)
   9    1     TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=2 Card=4695495 Bytes=1347607065)
  10    9       INDEX (UNIQUE SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=1 Card=4695495)

Some much smaller numbers and completely different access paths.

The numbers themselves are not the issue – it’s the different access paths that are the problem.
What I mean is that if, via hints for example, we forced the second access path on the poorly performing database, explain plan would still show much bigger numbers.

And so, for possible resolutions, we have a number of options, at least four, possibly more:
Option numero uno: Gather appropriate statistics on all objects so that the CBO has a decent chance.
Option numero the next one: Get rid of the partial statistics so that we have none and go back to using RBO.
Option three: Hint the SQL in the application.
Option four: Gather a stored outline for the query.

Options one and two can be discounted due to organisational “fear” of change. I would say that option one – all objects with appropriate statistics – is much preferable to option two. However, neither is a go-er due to fear that this might have a wider detrimental effect. (Incidentally, option one was attempted previously but the deployment was backed out because, according to rumour, the performance of something somewhere degraded)
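For the record, options one and two would be simple enough mechanically – sketches only, the schema name APPOWNER being illustrative:

-- Option one: appropriate statistics on all objects in the application schema
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'APPOWNER', cascade => TRUE)

-- Option two: no statistics at all, back to the RBO
EXEC DBMS_STATS.DELETE_SCHEMA_STATS(ownname => 'APPOWNER')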
Option three might be ok. The downside from an application perspective is that to put the hints in requires an application deployment, an application stop and start, and to take it out requires the same. As part of a wider strategy for this particular application, its suitability is questionable because it’s ORM generated code and therefore it’s probably not possible to control the hinting.
That leaves option four – capture a stored outline for the query on the “good” database, export, import to the “bad” database(s) and enable usage thereof. Might require some shared pool flushage but easily turn-on-and-offable, can be repeated easily for other problematic queries.

And just for completeness, if we were going for option four, the steps would be to first create the stored outline on the “good” database:


CREATE OR REPLACE OUTLINE <outline_name>
FOR CATEGORY <category_name>
ON
<sql_statement>;

(Note that the text of the SQL statement must exactly match that which will be executed by the application.)

Next, export the outline from the “good database”, e.g.


exp userid=outln/ query="where category = '<category_name>'" tables=(ol$,ol$hints)

Then, import into the “bad” database, e.g.


imp userid=outln/ full=y ignore=yes

Finally, maybe use a trigger to enable the usage of the stored outline for the problem application user:


CREATE OR REPLACE TRIGGER xxxxxxxxx
AFTER LOGON ON DATABASE
DECLARE
BEGIN
   IF USER = 'yyyyyyyyyyyyy'
   THEN
      --
      EXECUTE IMMEDIATE 'ALTER SESSION SET USE_STORED_OUTLINES = <category_name>';
      --
   END IF;
EXCEPTION
   WHEN OTHERS THEN
      NULL;
END;
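Once enabled, whether the outline is actually being picked up can be verified on the “bad” database with something like:

SELECT name
,      category
,      used
FROM   user_outlines
WHERE  category = '<category_name>';

The USED column flips from UNUSED to USED once a matching statement has been parsed with the outline applied.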
