SQL_IDs and baselines

There have been some interesting posts from Marcin Przepiorowski and Kerry Osborne about mapping SQL Plan Management (SPM) baselines to a corresponding SQL ID.

The key to sql plan baselines (DBA_SQL_PLAN_BASELINES) is SIGNATURE.

And why signature? It’s not just another change to the key used to identify SQL statements (HASH_VALUE -> (NEW_)HASH_VALUE / OLD_HASH_VALUE -> SQL_ID).

It’s used in baselines because it’s a more flexible mechanism than a straight-up hash – one SQL plan baseline can be used against more than one SQL ID.

As Christian Antognini explains so effectively in his book, when baselines are involved, a sql statement is “normalized to make it both case-insensitive and independent of the blank spaces in the text”.

So, when you look at statements in V$SQL, we’re talking here not about the FORCE_MATCHING_SIGNATURE but the EXACT_MATCHING_SIGNATURE.
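
As an aside, you can generate both flavours of signature for an arbitrary bit of SQL text using DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE (the force_match argument is a BOOLEAN, so call it from PL/SQL), e.g.:

declare
 l_exact number;
 l_force number;
begin
 l_exact := dbms_sqltune.sqltext_to_signature('select * from t1 where col1 = 3', force_match => FALSE);
 l_force := dbms_sqltune.sqltext_to_signature('select * from t1 where col1 = 3', force_match => TRUE);
 dbms_output.put_line('exact: '||l_exact||', force: '||l_force);
end;
/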

For example, let’s find three statements with different sql ids but the same signature:

SQL> create table t1
  2  (col1 number);

Table created.

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where        col1 = 3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  cr6chh7p7vvzt, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where        col1 = 3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)


22 rows selected.

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  8j52h3wtgvu3n, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)


22 rows selected.

SQL> SELECT /*+ find_me */ *
  2  FROM   t1
  3  WHERE  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  894k8t6nu8kbf, child number 0
-------------------------------------
SELECT /*+ find_me */ * FROM   t1 WHERE  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

22 rows selected.

SQL> select sql_id, hash_value, old_hash_value, exact_matching_signature, force_matching_signature
  2  from v$sql 
  3  where sql_id IN ('cr6chh7p7vvzt','8j52h3wtgvu3n','894k8t6nu8kbf');

SQL_ID        HASH_VALUE OLD_HASH_VALUE EXACT_MATCHING_SIGNATURE FORCE_MATCHING_SIGNATURE
------------- ---------- -------------- ------------------------ ------------------------
894k8t6nu8kbf 2846116206     4072117629               4.4562E+18               1.2887E+19
8j52h3wtgvu3n  855500916     4269126066               4.4562E+18               1.2887E+19
cr6chh7p7vvzt 3934121977      717076022               4.4562E+18               1.2887E+19

Now, let’s load them into an SPM baseline:

SQL> declare
  2   l_op pls_integer;
  3  begin
  4   l_op := dbms_spm.load_plans_from_cursor_cache(sql_id =>'cr6chh7p7vvzt');
  5   l_op := dbms_spm.load_plans_from_cursor_cache(sql_id =>'8j52h3wtgvu3n');
  6   l_op := dbms_spm.load_plans_from_cursor_cache(sql_id =>'894k8t6nu8kbf');
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> set long 10000
SQL> select to_char(signature), sql_text from dba_sql_plan_baselines where signature = 4456219368825610060 order by created desc;

TO_CHAR(SIGNATURE)
----------------------------------------
SQL_TEXT
--------------------------------------------------------------------------------
4456219368825610060
select /*+ find_me */
       *
from   t1
where        col1 = 3


SQL> 

Note how the SQL text is from the first statement, the only statement that caused the baseline plan to be created.

What’s the primary purpose of the SQL text in DBA_SQL_PLAN_BASELINES? Well, probably, as Christian again suggests, to check that the statement for the baseline is actually equivalent to the one being run, in the event of hash collisions.

So, now let’s re-run those SELECTs from earlier and watch the baseline kick in for all three:

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where        col1 = 3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  cr6chh7p7vvzt, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where        col1 = 3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

Note
-----
   - SQL plan baseline SQL_PLAN_3vpxbwzn11rucdbd90e8e used for this statement

23 rows selected.

SQL> select /*+ find_me */
  2         *
  3  from   t1
  4  where  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  8j52h3wtgvu3n, child number 0
-------------------------------------
select /*+ find_me */        * from   t1 where  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

Note
-----
   - SQL plan baseline SQL_PLAN_3vpxbwzn11rucdbd90e8e used for this statement

23 rows selected.

SQL> SELECT /*+ find_me */ *
  2  FROM   t1
  3  WHERE  col1 =        3;

no rows selected

SQL> select * from table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  894k8t6nu8kbf, child number 0
-------------------------------------
SELECT /*+ find_me */ * FROM   t1 WHERE  col1 =        3

Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COL1"=3)

Note
-----
   - SQL plan baseline SQL_PLAN_3vpxbwzn11rucdbd90e8e used for this statement

23 rows selected.

SQL> 

That’s it really.

So, this just shows that signature to SQL ID is not necessarily a one-to-one relationship but can be one-to-many.
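
Which also gives us a way of mapping cursors in V$SQL back to their baselines, joining on the exact matching signature – something like:

select s.sql_id, s.child_number, b.plan_name, b.enabled, b.accepted
from   v$sql s
,      dba_sql_plan_baselines b
where  b.signature = s.exact_matching_signature
and    s.sql_id in ('cr6chh7p7vvzt','8j52h3wtgvu3n','894k8t6nu8kbf');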

Whilst we’re talking baselines, have you read these interesting posts from Maxym Kharchenko? (They’re from a while back but I’ve only recently discovered his blog.) They show, amongst other things, that because baselines are essentially based on a SQL text match, they can kick in where you don’t intend them to – for example, on statements that look the same but aren’t, perhaps involving objects with the same names but in different schemas, even with different structures. When you think about it, it makes sense how this can happen, but surely it can’t be intended that way.

JF – Join Factorization

Just a quickie. No nice isolated demo etc.

I was just looking at rewriting a query that someone asked me to review.

I was playing with what should be the initial driving logic of a bigger query, changing it from a DECODE to a UNION ALL (or an OR) to make use of a FILTER condition.

i.e. from something like:

SELECT t.*
FROM   yyyyy c
,      xxxxx t
WHERE  c.trga_code   = 'LON'
AND    t.cmpy_num    = c.cmpy_num
AND    DECODE(:b1,'S',t.value_date,t.tran_date) >= TO_DATE(:b2,'YYYYMMDD')
AND    DECODE(:b1,'S',t.value_date,t.tran_date) <= TO_DATE(:b4,'YYYYMMDD');

to something not dissimilar to:

SELECT t.*
FROM   yyyyy c
,      xxxxx t
WHERE  c.trga_code   = 'LON'
AND    t.cmpy_num    = c.cmpy_num
AND    :b1           = 'S'
AND    t.value_date >= TO_DATE(:b2,'YYYYMMDD')
AND    t.value_date <= TO_DATE(:b4,'YYYYMMDD')
UNION ALL
SELECT t.*
FROM   yyyyy c
,      xxxxx t 
WHERE  c.trga_code  = 'LON'
AND    t.cmpy_num   = c.cmpy_num
AND    :b1         != 'S'
AND    t.tran_date >= TO_DATE(:b2,'YYYYMMDD')
AND    t.tran_date <= TO_DATE(:b4,'YYYYMMDD');

And I got an unexpected execution plan:

---------------------------------------------------------------------------------------------
| Id  | Operation                              | Name               | Rows  | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                    | 13271 |       |       |
|*  1 |  HASH JOIN                             |                    | 13271 |       |       |
|*  2 |   INDEX RANGE SCAN                     | YYYYY_3_IDX        |   268 |       |       |
|   3 |   VIEW                                 | VW_JF_SET$49BA79CF | 13304 |       |       |
|   4 |    UNION-ALL                           |                    |       |       |       |
|*  5 |     FILTER                             |                    |       |       |       |
|   6 |      PARTITION LIST ALL                |                    |  6652 |     1 |    11 |
|*  7 |       TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX              |  6652 |     1 |    11 |
|*  8 |        INDEX RANGE SCAN                | XXXXX_16_IDX       |   112K|     1 |    11 |
|*  9 |     FILTER                             |                    |       |       |       |
|  10 |      PARTITION LIST ALL                |                    |  6652 |     1 |    11 |
|* 11 |       TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX              |  6652 |     1 |    11 |
|* 12 |        INDEX RANGE SCAN                | XXXXX_4_IDX        | 67411 |     1 |    11 |
---------------------------------------------------------------------------------------------

That’s not quite what I was expecting – weird.

I was sitting here thinking “what’s that? … vw_jf? vw_jf?… hang on … vw_jf … jf … jf rings a bell… join factorization”.
Bingo.

See the Oracle Optimizer Blog for more information on Join Factorization, where they summarise the feature as:

Join factorization is a cost-based transformation.
It can factorize common computations from branches in a UNION ALL query which can lead to huge performance improvement.

Fair enough. But I don’t want it. Not just yet. Let’s turn it off.
(Although I’ve mentioned before that I don’t tend to like being too prescriptive regarding a “correct plan”, in this case, for my purposes, I don’t want it doing that. I might let the CBO reconsider it later once I piece everything back together but for now…)

V$SQL_HINT mentions a NO_FACTORIZE_JOIN hint.

Unfortunately (or perhaps fortunately, given the vast array of undocumented hints exposed there), it doesn’t tell us how to use it, and I fiddled around for a bit but couldn’t get it to work.

So, the proper approach is to use ALTER SESSION or the OPT_PARAM hint to change the setting of _optimizer_join_factorization (don’t go changing hidden parameters without the approval of Oracle Support, etc, etc, etc).
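
That is, something like this (hidden parameter, hence the double quotes at session level):

alter session set "_optimizer_join_factorization" = false;

or, per statement, an OPT_PARAM hint along the lines of:

select /*+ opt_param('_optimizer_join_factorization','false') */ ...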

Which gave me what I wanted (for now).

--------------------------------------------------------------------------------------------- 
| Id  | Operation                             | Name                | Rows  | Pstart| Pstop | 
--------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT                      |                     | 13272 |       |       | 
|   1 |  UNION-ALL                            |                     |       |       |       | 
|*  2 |   FILTER                              |                     |       |       |       | 
|*  3 |    HASH JOIN                          |                     |  6636 |       |       | 
|*  4 |     INDEX RANGE SCAN                  | YYYYY_3_IDX         |   268 |       |       | 
|   5 |     PARTITION LIST SUBQUERY           |                     |  6652 |KEY(SQ)|KEY(SQ)| 
|*  6 |      TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX               |  6652 |KEY(SQ)|KEY(SQ)| 
|*  7 |       INDEX RANGE SCAN                | XXXXX_16_IDX        |   112K|KEY(SQ)|KEY(SQ)| 
|*  8 |   FILTER                              |                     |       |       |       | 
|*  9 |    HASH JOIN                          |                     |  6636 |       |       | 
|* 10 |     INDEX RANGE SCAN                  | YYYYY_3_IDX         |   268 |       |       | 
|  11 |     PARTITION LIST SUBQUERY           |                     |  6652 |KEY(SQ)|KEY(SQ)| 
|* 12 |      TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX               |  6652 |KEY(SQ)|KEY(SQ)| 
|* 13 |       INDEX RANGE SCAN                | XXXXX_4_IDX         | 67411 |KEY(SQ)|KEY(SQ)| 
--------------------------------------------------------------------------------------------- 

P.S. This is not meant to be an investigation into why JF was picked and whether it was a good choice, nor any reflection on whether JF is a good thing or whether there are any issues with it.

Further sources on Join Factorization:
Oracle Optimizer Blog – Join Factorization
Jože Senegačnik – Query Transformations
Timur Akhmadeev – Join Factorization

Further information on those Internal View names:
Jonathan Lewis – Internal Views

Interesting cost_io in 10053 trace

17976931348623157081452742373170435679807056752584499
6598917476803157260780028538760589558632766878171540458953514
3824642343213268894641827684675467035375169860499105765512820
7624549009038932894407586850845513394230458323690322294816580
8559332123348274797826204144723168738177180919299881250404026
184124858368.00 to be precise.

I’ve got a problem with a set of sql statements in production. Fortunately the problem is reproducible in a non-prod environment.

These statements all reference one particular virtual column in a composite function-based index.

The sql statements all look something like this:


SELECT col1
FROM my_schema.table1 tab1
WHERE ...
AND NVL(col2,'UNASSIGNED') = SYS_CONTEXT('MY_CONTEXT','MY_COL2_VALUE')
...

And they are raising the following error:


ORA-00904: "MY_SCHEMA"."TAB1"."SYS_NC00062$": invalid identifier

This “SYS_NC00062$” is a hidden/virtual column on this table that is part of a function-based index. I can see it here at least:


select *
from   sys.col$
where  obj# in (select object_id
                from   dba_objects
                where  object_name = 'TABLE1'
                and    owner = 'MY_SCHEMA')
order by col#;
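
(For what it’s worth, on 11g+ you can get much the same information without visiting SYS.COL$ – DBA_TAB_COLS exposes hidden columns too, e.g.:

select column_name, hidden_column, virtual_column, data_default
from   dba_tab_cols
where  owner = 'MY_SCHEMA'
and    table_name = 'TABLE1'
order by internal_column_id;

– the VIRTUAL_COLUMN column being an 11g addition.)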

This article isn’t about this error specifically. Not yet at least but I might blog about it later once I know some more.

However, in the meantime, I’ve just started gathering details, running some extended traces (events 10046, 10053), etc, with a view to finding some extra information and then probably raising an issue on Metalink.

I’ve not even started analysing this info yet but as I zipped through the 10053 file, I noted something odd.

Note that in this 10053 trace file, the original reported SYS_NC00062$ has now become SYS_NC00066$ because I dropped and recreated the FBI to see if that helped – it didn’t.

Looking at the cost_io numbers, you get various costs for various access methods – e.g. a 16.00, a 3.00, a 15632.00, a 53.00 – but then, holy moly:


***** Virtual column Adjustment ******
Column name SYS_NC00066$
cost_cpu 300.00
cost_io 17976931348623157081452742373170435679807056752584499
6598917476803157260780028538760589558632766878171540458953514
3824642343213268894641827684675467035375169860499105765512820
7624549009038932894407586850845513394230458323690322294816580
8559332123348274797826204144723168738177180919299881250404026
184124858368.00
***** End virtual column Adjustment ******

I’ve literally done nothing yet – just seen that number and thought I’d put it down quickly in an article.

But wow – that’s a big adjustment! What’s that about?

I picked the wrong day to leave my Jonathan Lewis CBO book at home…

Here’s the 10053 trace file.

Application Hanging

Today a problem with the application “hanging”.

The background is that there has been some development in an application against database A.
Database A is now out of action due to a disk controller failure or something akin to that, and so the application is now pointing at database B.
When pointing at database A, the application was performing acceptably; not so when pointing at database B, where it is “hanging”.

So, straightaway, in response to one of the questions we should ask ourselves when there is a sudden problem – what has changed? – we know that there has been a change in database and the immediate follow-up is what might the differences be between database A and database B.

Fortunately, one of the differences is already known to me and, given the symptoms, is jumping up and down, waving its hands and shouting ‘Look at me’.

Both database A and database B are relatively recent copies of the production database.
All 8.1.7 by the way.
However, the production database has a bad mix of some tables having statistics (some stale) and some having none, which can lead the CBO to make some inappropriate decisions because of the gap between the default values it uses for objects without statistics and the reality of the data distribution.

Database A had had a statistics update subsequent to the refresh.
Database B had not.

Therefore it is highly likely that the symptom of the application hanging is a less than optimal plan on the SQL concerned.

For the application operation that is hanging, quite a few bits of SQL might be issued.
There might be more than one bit of SQL that is performing unacceptably but in any event, we need to work through each suboptimal performer as we find it.

The application mostly uses ORM to generate its lovely queries (although in this case the development that has occurred is to optimise some of the iterative, slow-by-slow dynamic SQL that is generated).

Therefore the best mechanism is to trace the database session concerned, replicate the “hanging” behaviour and see what it is doing. What I am expecting to find is that, on “hanging”, the last piece of SQL in the trace file will be doing more IO than it needs to.

So, a trigger is required which sets the extended trace event for a specific user:


CREATE OR REPLACE TRIGGER xxxxxx
AFTER LOGON ON DATABASE
DECLARE
BEGIN
  IF USER = 'yyyyy'
  THEN
    --
    EXECUTE IMMEDIATE 'ALTER SESSION SET events ''10046 trace name context forever, level 12''';
    --
  END IF;
END;
/
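
(And, once done, tracing can be switched off in the session with the corresponding:

ALTER SESSION SET events '10046 trace name context off';

or just by disabling/dropping the trigger and reconnecting.)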

The next step is to reproduce the “hanging” behaviour via the application.
Sure enough this generates a trace file on the server which is doing many, many “db file sequential reads”.
More than normal? (where normal is the behaviour when it performs acceptably)
Don’t know for sure yet, but it’s a good bet.

The SQL in question is something like this:


SELECT ....
FROM   News t0
,      (SELECT DISTINCT T1.newsid
        FROM   nolindex T2
        ,      IMPORTERS T3
        ,      NewsIndex T1
        WHERE  T3.destindexid = :1
        AND    T1.sectionid   = T2.indexid
        AND    T2.indexid     = T3.sourceindexid) v1
WHERE  t0.ID = v1.newsid

The trace file is incomplete because, due to all the IO, it reached its size limit. Also the application was killed because we had the information we needed.

On database B (some stats but not all), running the SQL with a SET AUTOTRACE TRACE EXPLAIN reveals the following:


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=216293 Card=59764261584 Bytes=196445127826608)
   1    0   MERGE JOIN (Cost=216293 Card=59764261584 Bytes=196445127826608)
   2    1     TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=826 Card=1515434 Bytes=4961530916)
   3    2       INDEX (FULL SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=26 Card=1515434)
   4    1     SORT (JOIN) (Cost=50339 Card=3943706 Bytes=51268178)
   5    4       VIEW (Cost=165128 Card=3943706 Bytes=51268178)
   6    5         SORT (UNIQUE) (Cost=165128 Card=3943706 Bytes=256340890)
   7    6           NESTED LOOPS (Cost=24 Card=3943706 Bytes=256340890)
   8    7             MERGE JOIN (CARTESIAN) (Cost=24 Card=2589433 Bytes=134650516)
   9    8               TABLE ACCESS (BY INDEX ROWID) OF 'IMPORTERS' (Cost=2 Card=11 Bytes=286)
  10    9                 INDEX (RANGE SCAN) OF 'DESTINDEXID_IDX' (NON-UNIQUE) (Cost=1 Card=11)
  11    8               SORT (JOIN) (Cost=22 Card=235403 Bytes=6120478)
  12   11                 INDEX (FAST FULL SCAN) OF 'NEWSINDEXSECTLANGNEWSINDEX' (NON-UNIQUE) (Cost=2 Card=235403 Bytes=6120478)
  13    7             INDEX (UNIQUE SCAN) OF 'NOLINDEX_PK' (UNIQUE)

There are some big numbers in there.

Next step is to run against database C which should be similar to database A before it died – i.e. refresh from production with statistics on more objects:


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=206 Card=89 Bytes=26700)
   1    0   NESTED LOOPS (Cost=206 Card=89 Bytes=26700)
   2    1     VIEW (Cost=28 Card=89 Bytes=1157)
   3    2       SORT (UNIQUE) (Cost=28 Card=89 Bytes=2314)
   4    3         NESTED LOOPS (Cost=26 Card=89 Bytes=2314)
   5    4           NESTED LOOPS (Cost=2 Card=8 Bytes=120)
   6    5             TABLE ACCESS (FULL) OF 'IMPORTERS' (Cost=2 Card=8 Bytes=80)
   7    5             INDEX (UNIQUE SCAN) OF 'NOLINDEX_PK' (UNIQUE)
   8    4           INDEX (RANGE SCAN) OF 'NEWSINDEXSECTLANGNEWSINDEX' (NON-UNIQUE) (Cost=3 Card=225099 Bytes=2476089)
   9    1     TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=2 Card=4695495 Bytes=1347607065)
  10    9       INDEX (UNIQUE SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=1 Card=4695495)

Some much smaller numbers and completely different access paths.

The numbers themselves are not the issue – it’s the different access paths that are the problem.
What I mean is that if, via hints for example, we forced the second access path on the poorly performing database, explain plan would still show much bigger numbers.

And so, for possible resolutions, we have a number of options, at least four, possibly more:
Option numero uno: Gather appropriate statistics on all objects so that the CBO has a decent chance.
Option numero the next one: Get rid of the partial statistics so that we have none and go back to using RBO.
Option three: Hint the SQL in the application.
Option four: Gather a stored outline for the query.

Options one and two can be discounted due to organisational “fear” of change. I would say that option one – all objects with appropriate statistics – is much preferable to option two. However, neither is a go-er due to fear that this might have a wider detrimental effect. (Incidentally, option one was attempted previously but the deployment was backed out because, according to rumour, the performance of something somewhere degraded)
Option three might be ok. The downside from an application perspective is that to put the hints in requires an application deployment, an application stop and start, and to take it out requires the same. As part of a wider strategy for this particular application, its suitability is questionable because it’s ORM generated code and therefore it’s probably not possible to control the hinting.
That leaves option four – capture a stored outline for the query on the “good” database, export, import to the “bad” database(s) and enable usage thereof. Might require some shared pool flushage but easily turn-on-and-offable, can be repeated easily for other problematic queries.

And just for completeness, if we were going for option four, the steps would be to first create the stored outline on the “good” database:


CREATE OR REPLACE OUTLINE <outline_name>
FOR CATEGORY <category_name>
ON
<sql_statement>;

(Note that the text of the SQL statement must match exactly the text that will be executed by the application.)

Next, export the outline from the “good database”, e.g.


exp userid=outln/ query="where category = '<category_name>'" tables=(ol$,ol$hints)

Then, import into the “bad” database, e.g.


imp userid=outln/ full=y ignore=yes

Finally, maybe use a trigger to enable the usage of the stored outline for the problem application user:


CREATE OR REPLACE TRIGGER xxxxxxxxx
AFTER LOGON ON DATABASE
DECLARE
BEGIN
  IF USER = 'yyyyyyyyyyyyy'
  THEN
    --
    EXECUTE IMMEDIATE 'ALTER SESSION SET USE_STORED_OUTLINES = <category_name>';
    --
  END IF;
EXCEPTION WHEN OTHERS THEN NULL;
END;
/

Using Stored Outlines to figure out a hint or two

Stored Outlines capture a set of hints which direct the optimizer how to execute a specific query.

I’ve already mentioned previously a plan to use these for Optimizer Plan Stability for an upgrade from 8i to 10g (however, I think the current thinking is that we’re just going to wing it).

However, I’ve also mentioned that in our current 8i database some objects have some statistics (mostly stale) and some objects have no statistics, and when you mix both into a query, the optimizer makes some potentially dodgy guesses about how much data is in the objects without stats. The same guesses/defaults apply to the usage of GLOBAL TEMPORARY TABLES and to CASTing arrays with the TABLE operator.

As a result, some recent queries have been performing suboptimally.

Like most, I prefer to not use hints where possible and recognise that in most cases it is much better to leave a well-informed optimizer to make its own mind up, being a clever little thing with access to lots of information that I don’t. However, unfortunately, there are circumstances where it needs a bit of a hand and that’s where hints come in.

I don’t have a comprehensive understanding of hints and join mechanisms, but I’m working on it slowly. And I certainly can’t look at a query and visualise how it should all come together, well, not unless it’s pretty darn simple.

What I often find in a more complex query is that I’ll try to add in one or two hints and things might improve but they don’t quite arrive where I’d like them to.

So – and I have touched on this side-effect usage of Stored Outlines very briefly in another previous post – I tend to use Explain Plan in conjunction with Stored Outlines on a healthy development system (one with relevant statistics from production-like data volumes and distribution) to tell me how a fully-informed optimizer would do things:


create or replace outline myoutline
for category mycategory
on select.....

And then look at the hints here:


select * from user_outline_hints order by stage;

And this might involve creating a dummy table with some specific data in there, fully analysing it, and substituting that for a collection CAST or a GLOBAL TEMPORARY TABLE in the SQL before capturing the outline.

And then I tend to extract some key hints in the right order and bung them into the original SQL. Usually with the desired effect.

This won’t work in all scenarios but it does some/most of the time. In my situation, at least.

I do feel deeply uncomfortable using hints on SQL just because the statistics situation in a production database is not good. And obviously, in most cases that I’m talking about, these are workarounds precisely because of that. But sometimes the perceived risks of making changes (in this case, sorting out the statistics – and in fact it’s more like ‘often’ than ‘sometimes’ everywhere in IT these days) cause a stalemate that is difficult to break free from.

Varying IN lists

In his column in the current edition of Oracle magazine, Tom Kyte wrote about the recurring issue of varying in lists and provided some solutions for this.

This ties in nicely with a recommendation I had already made at my current client, where there is significant usage of IN lists within dynamic SQL issued by applications on top of the 8.1.7.4 DB. If we were using bind variables for these IN lists, then, with anywhere from 1 to n arguments in an IN list, we would probably find n similar queries in the shared pool, each with a different number of arguments in that IN list.

However, these dynamic varying IN lists are not using bind variables.

As a result, there is a fair amount of SQL ageing out of the shared pool as these sequence-generated ids creep upwards, with every day bringing a new range of ids, and, not surprisingly, we are doing too much hard parsing.

So, there is a two-pronged recommendation: firstly, to change CURSOR_SHARING to FORCE in order to do some auto-binding for certain users (to be set using an AFTER LOGON trigger, as sketched below) and, secondly, to use a better approach for IN lists, as detailed in the link above.
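
For illustration, a minimal sketch of that first prong – the AFTER LOGON trigger (trigger and user names being placeholders):

CREATE OR REPLACE TRIGGER trg_force_cursor_sharing
AFTER LOGON ON DATABASE
BEGIN
  IF USER = 'APP_USER'
  THEN
    EXECUTE IMMEDIATE 'ALTER SESSION SET cursor_sharing = FORCE';
  END IF;
END;
/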

The advantages should hopefully be a reduction in hard parsing and, within the application, the end of the limit on the number of arguments in the dynamic string-concatenated IN list.

I have been using an approach very similar to the 8i approach that Tom describes in his column.

An example of the transition, using a two-argument IN list, is from:

select n.slug, n.id
from news n
where n.status=15
and n.sectionid in ('42299, 42295')

to

select n.slug, n.id
from news n
where n.status=15
and n.sectionid in
(
  select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
)

and, in the above example, ’42299, 42295′ should be bound as a single bind variable (this is for an application with varying numbers of arguments; as Tom points out in his article, if there is a finite number of arguments, then the best approach might be to just bind each item individually).
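
The f_numinlist function itself isn’t shown here but, for illustration, here is a minimal sketch of the sort of thing it does – parsing the comma-separated string into a SQL collection type (tt_number being a nested table of number):

create or replace type tt_number as table of number;
/

create or replace function f_numinlist (p_list in varchar2)
return tt_number
as
  l_result tt_number := tt_number();
  l_list   varchar2(32767) := p_list || ',';
  l_pos    pls_integer;
begin
  -- walk the string, extracting one number at a time up to each comma
  while l_list is not null
  loop
    l_pos := instr(l_list, ',');
    l_result.extend;
    l_result(l_result.count) :=
      to_number(ltrim(rtrim(substr(l_list, 1, l_pos - 1))));
    l_list := substr(l_list, l_pos + 1);
  end loop;
  return l_result;
end;
/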

However, on implementing this approach, there was a very definite performance impact.
An explain plan for the original code looked like this:

SQL> select n.slug, n.id
  2 from news n
  3 where n.status=15
  4 and n.sectionid in ('42299, 42295')
  5 /
Execution Plan
----------------------------------------------------------
  0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=48 Bytes=1488)
  1 0 INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=48 Bytes=1488)

An explain plan for the revised statement revealed the following:

SQL> l
  1 select n.slug, n.id
  2 from news n
  3 where n.status=15
  4 and n.sectionid in
  5 (
  6 select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
  7* )
SQL> /
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5365 Card=192277 Bytes=8460188)
   1    0   HASH JOIN (Cost=5365 Card=192277 Bytes=8460188)
   2    1     VIEW OF 'VW_NSO_1' (Cost=14 Card=4072 Bytes=52936)
   3    2       SORT (UNIQUE) (Cost=14 Card=4072)
   4    3         COLLECTION ITERATOR (PICKLER FETCH)
   5    1     INDEX (FAST FULL SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=841 Card=306123 Bytes=9489813)

My initial thoughts were that a hash join was probably not the most efficient mechanism. I don’t want to digress into a discussion on joining mechanisms partly because my knowledge is less than comprehensive and also because there are far, far better sources out there. I refer the reader to the Oracle documentation – 8i, 9iR2, and 10gR2 – and, of course, to Jonathan Lewis’s book on the CBO.
I also suspected that the problem was probably related to the number of rows that the CBO thought were in the dynamic IN-list collection. The reason is right there in the autotrace output, but a quick trace of the CBO decision making using event 10053 confirmed the following:

Table stats Table: KOKBF$ Alias: KOKBF$
TOTAL :: (NOT ANALYZED) CDN: 4072 NBLKS: 100 TABLE_SCAN_CST: 2 AVG_ROW_LEN: 100
Column: KOKBC$ Col#: 1 Table: KOKBF$ Alias: KOKBF$
NO STATISTICS (using defaults)
NDV: 127 NULLS: 0 DENS: 7.8585e-03
***************************************
SINGLE TABLE ACCESS PATH
TABLE: KOKBF$ ORIG CDN: 4072 CMPTD CDN: 4072
Access path: tsc Resc: 2 Resp: 2
BEST_CST: 2.00 PATH: 2 Degree: 1

Basically, the CBO is using a default of 4072 rows in the collection generated from the IN list. I have seen a similar problem with GLOBAL TEMPORARY TABLES, which was helped by using a DYNAMIC_SAMPLING hint. Unfortunately, I believe the latter was introduced in 9i and certainly any attempts to use it on the above problem on my version were less than fruitful. Same with the CARDINALITY hint.
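
For reference, on 9i and later the sort of thing I was trying would look like this – telling the CBO roughly how many rows to expect from the collection (CARDINALITY being undocumented, so treat this as a sketch):

select n.slug, n.id
from   news n
where  n.status = 15
and    n.sectionid in
       (select /*+ cardinality(t 2) */ column_value
        from   table(cast(f_numinlist('42299, 42295') as tt_number)) t);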

Anyway, I took the easy route to the solution. I quickly created a normal heap table with 2 rows in it to see how the CBO would react in those circumstances:


SQL> create table domtest
2 (col1 number);

Table created.


SQL>
SQL> insert into domtest values (42299);

1 row created.


SQL>
SQL> insert into domtest values (42295);

1 row created.


SQL>
SQL> begin
2 dbms_stats.gather_table_stats(USER,'DOMTEST');
3 end;
4 /

PL/SQL procedure successfully completed.


SQL>
SQL> set autotrace trace explain
SQL>
SQL> select n.slug, n.id
2 from news n
3 where n.status=15
4 and n.sectionid in
5 (
6 select * from domtest
7 );

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=94 Bytes=4136)
   1    0   NESTED LOOPS (Cost=11 Card=94 Bytes=4136)
   2    1     VIEW OF 'VW_NSO_1' (Cost=3 Card=2 Bytes=26)
   3    2       SORT (UNIQUE) (Cost=3 Card=2 Bytes=10)
   4    3         TABLE ACCESS (FULL) OF 'DOMTEST' (Cost=1 Card=2 Bytes=10)
   5    1     INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=306123 Bytes=9489813)

Under these circumstances, a nested loop seemed more in line with what I was expecting. So, I decided to try a nested loop hint, the syntax being:


/*+ ordered use_nl (--table / alias--) */

which says that the table specified should be used as the inner table of the nested loop – so this should drive off the entries in the IN-list array.


SQL> select /*+ ordered use_nl (n) */
2 n.slug, n.id
3 from news n
4 where n.status=15
5 and n.sectionid in
6 (
7 select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
8 )
9 /


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=16302 Card=192277 Bytes=8460188)
   1    0   NESTED LOOPS (Cost=16302 Card=192277 Bytes=8460188)
   2    1     VIEW OF 'VW_NSO_1' (Cost=14 Card=4072 Bytes=52936)
   3    2       SORT (UNIQUE) (Cost=14 Card=4072)
   4    3         COLLECTION ITERATOR (PICKLER FETCH)
   5    1     INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=306123 Bytes=9489813)

The numbers are still a little off because the CBO is still using a default of 4072 rows, but it runs like a dream.

Autotrace – no cost, part II

I wrote just a bit earlier about how a lack of statistics on some objects in one system was causing the CBO to use some defaults and generate a sub-optimal plan, especially compared to a similar system which does have accurate statistics for all objects.

I tried adding various hints to the SQL in the “bad” system to get to the same plan as the “good” system but, whilst nearly there, I wasn’t quite nailing it.

As I’ve touched on in previous posts, I’ve been looking at stored outlines as part of the strategy for upgrading to 10g. So, I captured the stored outline for the statement in the “good” system using:

create or replace outline myoutline
for category mycategory
on select.....

I was not interested in exporting and enabling this in the “bad” system. But by looking at the outline views in the data dictionary, I was able to figure out more easily which hints to put where, and in what order.

Nice.

Event 10053 and flushing

At various points today, I have convinced myself that various daft things were happening with Oracle, my hints and/or SQL*Plus when the real cause was just me being really daft.

I’ve been looking at a couple of problem queries today and using extended tracing of the CBO using event 10053.

However, every now and then my tracing would cease to provide anything meaningful.

The reason was that the CBO wasn’t doing its calculations because I was repeatedly running the same queries and, much of the time, the execution plans were already in the shared pool.

So I should have been flushing the shared pool, changing the string literals in the SQL, or adding a space here or there to force a hard parse.
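
i.e. at its simplest (on a system where throwing out everyone else’s cursors doesn’t matter):

alter system flush shared_pool;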

Doh!

Autotrace – no cost

This was a bit of a throwback which threw me for a few moments, so long has it been since I came across it.

I was investigating the reported slow performance of a query on an 8.1.7.4 database.

So, I thought I would do a quick and easy autotrace using “set autotrace trace explain”.

Not surprisingly, a comparison with another system showed different explain plans, with the slow system having horrendous numbers in the cost, cardinality and bytes details.

As a next step, I thought I would break down the query into simpler parts to try to investigate the differences.

The query itself is not complex but involves two subqueries:

select c.newsid, c.RELNEWSID, c.ITEMORDER
from relatednewsnews c
where c.NEWSID in
 (select b.newsid
  from newsindex b
  where b.sectionid in
  (select a.id
  from descriptions a
  start with a.id in 109193
  connect by childof = prior a.id));

I thought I would strip off the outer layers and compare the inners.
As soon as I stripped off the outer layer and tried:

select b.newsid
  from newsindex b
  where b.sectionid in
  (select a.id
  from descriptions a
  start with a.id in 109193
  connect by childof = prior a.id)

I noticed something odd but the penny did not drop instantly.

Still doing “set autotrace trace explain”, the cost, cardinality and bytes details had disappeared.

After a tumbleweed moment, it dawned on me that this query was using the RBO, rule-based optimizer.

So, this inferred that the tables in the subqueries did not have statistics whilst the outer table did. As explained by Wolfgang Breitling, if one table has statistics then the CBO will be used, and I imagine that the CBO will have stuck some default figures into the calculations for the tables without statistics.

At this point, I should trace the 10053 event to see what numbers it is defaulting to, but annoyingly I don’t actually have access to the box and the trace files which would be produced.

I’ve already highlighted accurate statistics on the roadmap to arrive at a healthy system (in big bold letters).

In the meantime, the way forward is probably a set of hints to guide the CBO towards the plan that works well on the other system.
