Cardinality Feedback

True to form, Kerry Osborne has another excellent post, this time on Cardinality Feedback.

If the mark of a good post is that it sets you off thinking and investigating then he’s nailed it again.

I started to comment on his post, but it was getting long and messy and I thought it easy to make a comment with a post of my own. Apart from anything else, I always seem to cock up the formatting when making longer comments with examples, etc.

So, hopefully, you’ve read Kerry’s article and he asked where Oracle stored the opt_estimate fudge factor feedback that he could see in his trace and that came through when using a baseline created from the plan changed by cardinality feedback.

My initial reaction was that I thought they would be independent and so began this little investigation (which isn’t finished yet).

First tangent of the day … on the subject of OPT_ESTIMATE – it was my belief that this was one of the principal mechanisms used by sql plan baselines. I know it was very popular with standalone sql profiles. I’ve got a controlled number of baselines in my environment – a couple of hundred – none of them use OPT_ESTIMATE in SQLOBJ$DATA.COMP_DATA. Surprising.

Anyway, I thought in terms of having a standalone demo on Cardinality Feedback, why reinvent the wheel?

On his blog article on the subject, Tom Kyte has a standalone demo. Perfect I thought, I’ll reuse it.

And it goes a little something like this:

1. Create table and package:

create or replace type str2tblType as table of varchar2(30);
/
create or replace function str2tbl( 
  p_str in varchar2, 
  p_delim in varchar2 default ',' )
return str2tblType
PIPELINED
as
  l_str      long default p_str || p_delim;
  l_n        number;
begin
   loop
       l_n := instr( l_str, p_delim );
       exit when (nvl(l_n,0) = 0);
       pipe row ( ltrim(rtrim(substr(l_str,1,l_n-1))) );
       l_str := substr( l_str, l_n+1 );
   end loop;
   return;
end;
/

2. Set up variable in sql*plus and execute sql statement once, get plan

var in_list varchar2(255)
exec :in_list := 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE';

select * from TABLE(cast( str2tbl(:in_list ) as str2tblType) ) t;

select * from table(dbms_xplan.display_cursor);

Which produced the following plan:

SQL> select * from TABLE(cast( str2tbl(:in_list ) as str2tblType) ) t;

COLUMN_VALUE
------------------------------
DBMS_PIPE
DBMS_OUTPUT
UTL_FILE

3 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  d7bc3g2ccymkt, child number 0
-------------------------------------
select * from TABLE(cast( str2tbl(:in_list ) as str2tblType) ) t

Plan hash value: 2407808827

---------------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |       |       |    34 (100)|          |
|   1 |  COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    34   (3)| 00:00:01 |
---------------------------------------------------------------------------------------------


13 rows selected.

SQL> 

Hopefully, no surprises there.

According to the original demo, if I rinse and repeat and should get some feedback kicking right in there.

SQL> select * from TABLE(cast( str2tbl(:in_list ) as str2tblType) ) t;

COLUMN_VALUE
------------------------------
DBMS_PIPE
DBMS_OUTPUT
UTL_FILE

3 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
SQL_ID  d7bc3g2ccymkt, child number 0
-------------------------------------
select * from TABLE(cast( str2tbl(:in_list ) as str2tblType) ) t

Plan hash value: 2407808827

---------------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |       |       |    34 (100)|          |
|   1 |  COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    34   (3)| 00:00:01 |
---------------------------------------------------------------------------------------------


13 rows selected.

SQL> 

Oh!

Nope.

And doesn’t matter how often I repeated, same difference.

So, this probably the right time to mention version, right?

SQL> select * from v$version;

BANNER
------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

At this point, something I’d read by the Optimzer Development Group rang a bell:

For similar reasons, execution statistics for queries containing bind variables can be problematic. In cardinality feedback, we limit the feedback to portions of the plan whose estimates are not affected by bind variables. For instance, if a table has a filter comparing a column to a bind value, the cardinality of that table will not be used. However, cardinality feedback can still be used for cardinalities elsewhere in the plan.

Sounds relevant, doesn’t it?

Let’s substitute the bind with a literal.

So, we’re going to run this:

select * from TABLE(cast( str2tbl('DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t;

select * from table(dbms_xplan.display_cursor);

which gave me:

SQL> select * from TABLE(cast( str2tbl('DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t;

COLUMN_VALUE
------------------------------
DBMS_PIPE
DBMS_OUTPUT
UTL_FILE

3 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  8udfh089hyh6d, child number 0
-------------------------------------
select * from TABLE(cast( str2tbl('DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as
str2tblType) ) t

Plan hash value: 2407808827

---------------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |       |       |    34 (100)|          |
|   1 |  COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    34   (3)| 00:00:01 |
---------------------------------------------------------------------------------------------


14 rows selected.

And again

SQL> select * from TABLE(cast( str2tbl('DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t;

COLUMN_VALUE
------------------------------
DBMS_PIPE
DBMS_OUTPUT
UTL_FILE

3 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  8udfh089hyh6d, child number 1
-------------------------------------
select * from TABLE(cast( str2tbl('DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as
str2tblType) ) t

Plan hash value: 2407808827

---------------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |       |       |    33 (100)|          |
|   1 |  COLLECTION ITERATOR PICKLER FETCH| STR2TBL |     6 |    12 |    33   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Note
-----
   - cardinality feedback used for this statement


18 rows selected.

SQL> 

So, cardinality feedback successfully demoed with cardinality estimates reduced from the blocksize-related default for a collection to something altogether more accurate.

Now we need to move on to a case where these improved cardinality estimates actually affect the plan.

Fortunately, Tom’s demo does just that with a new table:

create table data  as
select * from all_objects;

create index data_idx on data(object_name);

exec dbms_stats.gather_table_stats( user, 'DATA' );

And a new statement:

with T as
( select /*+ find_me */ 
         distinct * 
  from   TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t 
  where rownum > 0 )
select * from data, t 
where data.object_name = t.column_value;

select * from table(dbms_xplan.display_cursor(format=>'-bytes'));

which gives the following (actual sql output snipped):

SQL> with T as
  2  ( select /*+ find_me */
  3           distinct *
  4    from   TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t
  5    where rownum > 0 )
  6  select * from data, t
  7  where data.object_name = t.column_value;

.......

5 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(format=>'-bytes'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  c4m3t9jwvs3ht, child number 0
-------------------------------------
with T as ( select /*+ find_me */          distinct *   from
TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType)
) t   where rownum > 0 ) select * from data, t where data.object_name =
t.column_value

Plan hash value: 892089582

------------------------------------------------------------------------------------------
| Id  | Operation                              | Name    | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |         |       |   318 (100)|          |
|*  1 |  HASH JOIN                             |         | 15014 |   318   (5)| 00:00:02 |
|   2 |   VIEW                                 |         |  8168 |    36   (9)| 00:00:01 |
|   3 |    HASH UNIQUE                         |         |  8168 |    36   (9)| 00:00:01 |
|   4 |     COUNT                              |         |       |            |          |
|*  5 |      FILTER                            |         |       |            |          |
|   6 |       COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 |    34   (3)| 00:00:01 |
|   7 |   TABLE ACCESS FULL                    | DATA    | 58781 |   279   (3)| 00:00:02 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("DATA"."OBJECT_NAME"="T"."COLUMN_VALUE")
   5 - filter(ROWNUM>0)


28 rows selected.

SQL> 

As expected, no cardinality feedback yet, collection estimate back to default.

And I’m just going to monitor what’s in V$SQL:

SQL> select sql_id, child_number, plan_hash_value, executions, is_shareable, sql_plan_baseline 
  2  from v$sql 
  3  where sql_id = 'c4m3t9jwvs3ht';

SQL_ID        CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS I SQL_PLAN_BASELINE
------------- ------------ --------------- ---------- - ------------------------------
c4m3t9jwvs3ht            0       892089582          1 Y

1 row selected.

SQL> 

Now re-run:

SQL> with T as
  2  ( select /*+ find_me */ 
  3           distinct * 
  4    from   TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t 
  5    where rownum > 0 )
  6  select * from data, t 
  7  where data.object_name = t.column_value;

...............

5 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(format=>'-bytes'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  c4m3t9jwvs3ht, child number 1
-------------------------------------
with T as ( select /*+ find_me */          distinct *   from
TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType)
) t   where rownum > 0 ) select * from data, t where data.object_name =
t.column_value

Plan hash value: 3947981921

--------------------------------------------------------------------------------------------
| Id  | Operation                               | Name     | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |          |       |    43 (100)|          |
|   1 |  NESTED LOOPS                           |          |       |            |          |
|   2 |   NESTED LOOPS                          |          |     6 |    43   (3)| 00:00:01 |
|   3 |    VIEW                                 |          |     3 |    34   (3)| 00:00:01 |
|   4 |     HASH UNIQUE                         |          |     3 |    34   (3)| 00:00:01 |
|   5 |      COUNT                              |          |       |            |          |
|*  6 |       FILTER                            |          |       |            |          |
|   7 |        COLLECTION ITERATOR PICKLER FETCH| STR2TBL  |     6 |    33   (0)| 00:00:01 |
|*  8 |    INDEX RANGE SCAN                     | DATA_IDX |     2 |     2   (0)| 00:00:01 |
|   9 |   TABLE ACCESS BY INDEX ROWID           | DATA     |     2 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter(ROWNUM>0)
   8 - access("DATA"."OBJECT_NAME"="T"."COLUMN_VALUE")

Note
-----
   - cardinality feedback used for this statement


34 rows selected.

SQL> 

And in V$SQL:

SQL> select sql_id, child_number, plan_hash_value, executions, is_shareable, sql_plan_baseline 
  2  from v$sql 
  3  where sql_id = 'c4m3t9jwvs3ht';

SQL_ID        CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS I SQL_PLAN_BASELINE
------------- ------------ --------------- ---------- - ------------------------------
c4m3t9jwvs3ht            0       892089582          1 N
c4m3t9jwvs3ht            1      3947981921          1 Y

2 rows selected.

SQL> 

So, cardinality feedback has come to the rescue, adjusted our estimates and this time we’ve got a better plan for the numbers concerned with NESTED LOOPS rather than a HASH JOIN.

What happens if, as Kerry did, I create a baseline for this improved statement using his create_baseline.sql script?

SQL> @create_baseline
Enter value for sql_id: c4m3t9jwvs3ht
Enter value for plan_hash_value: 3947981921
Enter value for fixed (NO): 
Enter value for enabled (YES): 
Enter value for plan_name (ID_sqlid_planhashvalue): my_cf_baseline
old  16: l_sql_id := '&&sql_id';
new  16: l_sql_id := 'c4m3t9jwvs3ht';
old  17: l_plan_hash_value := to_number('&&plan_hash_value');
new  17: l_plan_hash_value := to_number('3947981921');
old  18: l_fixed := '&&fixed';
new  18: l_fixed := 'NO';
old  19: l_enabled := '&&enabled';
new  19: l_enabled := 'YES';
old  40:     decode('&&plan_name','X0X0X0X0','SQLID_'||'&&sql_id'||'_'||'&&plan_hash_value','&&plan_
new  40:     decode('my_cf_baseline','X0X0X0X0','SQLID_'||'c4m3t9jwvs3ht'||'_'||'3947981921','my_cf_
Baseline MY_CF_BASELINE created.
SQL> 

And now I re-run the statement:

SQL> with T as
  2  ( select /*+ find_me */ 
  3           distinct * 
  4    from   TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t 
  5    where rownum > 0 )
  6  select * from data, t 
  7  where data.object_name = t.column_value;

....

5 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(format=>'-bytes'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  c4m3t9jwvs3ht, child number 2

An uncaught error happened in prepare_sql_statement : ORA-01403: no data found

NOTE: cannot fetch plan for SQL_ID: c4m3t9jwvs3ht, CHILD_NUMBER: 2
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)


8 rows selected.

SQL> 

Oh dear. Can we just ignore that and come back to it at a later date? Let’s stick a pin in it, as they say.

I’ll just try again:

SQL> with T as
  2  ( select /*+ find_me */ 
  3           distinct * 
  4    from   TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType) ) t 
  5    where rownum > 0 )
  6  select * from data, t 
  7  where data.object_name = t.column_value;

........

5 rows selected.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(format=>'-bytes'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
SQL_ID  c4m3t9jwvs3ht, child number 2
-------------------------------------
with T as ( select /*+ find_me */          distinct *   from
TABLE(cast( str2tbl( 'DBMS_PIPE,DBMS_OUTPUT,UTL_FILE' ) as str2tblType)
) t   where rownum > 0 ) select * from data, t where data.object_name =
t.column_value

Plan hash value: 3947981921

--------------------------------------------------------------------------------------------
| Id  | Operation                               | Name     | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |          |       | 24591 (100)|          |
|   1 |  NESTED LOOPS                           |          |       |            |          |
|   2 |   NESTED LOOPS                          |          | 15014 | 24591   (1)| 00:02:03 |
|   3 |    VIEW                                 |          |  8168 |    36   (9)| 00:00:01 |
|   4 |     HASH UNIQUE                         |          |  8168 |    36   (9)| 00:00:01 |
|   5 |      COUNT                              |          |       |            |          |
|*  6 |       FILTER                            |          |       |            |          |
|   7 |        COLLECTION ITERATOR PICKLER FETCH| STR2TBL  |  8168 |    34   (3)| 00:00:01 |
|*  8 |    INDEX RANGE SCAN                     | DATA_IDX |     2 |     2   (0)| 00:00:01 |
|   9 |   TABLE ACCESS BY INDEX ROWID           | DATA     |     2 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter(ROWNUM>0)
   8 - access("DATA"."OBJECT_NAME"="T"."COLUMN_VALUE")

Note
-----
   - SQL plan baseline SQL_PLAN_6w2j9bx7xvu6h5268a54a used for this statement


34 rows selected.

SQL> 

And what do we have in V$SQL?

SQL> select sql_id, child_number, plan_hash_value, executions, is_shareable, sql_plan_baseline 
  2  from v$sql 
  3  where sql_id = 'c4m3t9jwvs3ht';

SQL_ID        CHILD_NUMBER PLAN_HASH_VALUE EXECUTIONS I SQL_PLAN_BASELINE
------------- ------------ --------------- ---------- - ------------------------------
c4m3t9jwvs3ht            0       892089582          1 N
c4m3t9jwvs3ht            2      3947981921          1 Y SQL_PLAN_6w2j9bx7xvu6h5268a54a

2 rows selected.

SQL> 

So we can see that the baseline worked.

We can also see that the baseline, in this example, operated independently of the cardinality feedback so whilst we got our nice plan from the more accurate cardinality, we got the estimates from the default estimate of the collection.

Which is not exactly what I expected.

From Kerry’s post, he managed to get a statement using both a baseline and cardinality feedback as illustrated by his dbms_xplan output of:

Note
-----
   - SQL plan baseline SQLID_0CM4R08VJ075R_1255158658 used for this statement
   - cardinality feedback used for this statement

This is what I expected to get and then I was going to flush the shared pool in the hope of having a baseline and the original non-feedback estimates.

But no matter how many times I re-ran the statement I couldn’t get the cardinality feedback to kick back in.

So, unfortunately, I got to that end result without the all-important bit in the middle. Which is a bit weird. And warrants further investigation at some point.

I thought maybe the MONITOR hint might be the key difference to getting feedback working with baselines but not according to what I’ve tried so far. I couldn’t get the two features to work together.

But the bottom line is that I think this shows that the baseline did not use anything to maintain that those nice accurate estimates from cardinality feedback, only the end result plan.

And that’s backed up by the set of hints in the baseline which doesn’t include any OPT_ESTIMATE.

Suffice to say that this is really fledgling functionality and, as such, you expect subtle and no-so-subtle tweaks along the way as illustrated by the possible change of behaviour with bind variables when comparing my results above with the ones in Tom’s original article.

Advertisements

JF – Join Factorization

Just a quickie. No nice isolated demo etc.

I was just looking at rewriting a query that someone asked me to review.

I was playing with what should be the initial driving logic of a bigger query, changing it from a DECODE to a UNION ALL (or an OR) to make use of a FILTER condition.

i.e. from something like:

SELECT t.*
FROM   yyyyy c
,      xxxxx t
WHERE  c.trga_code   = 'LON'
AND    t.cmpy_num    = c.cmpy_num
AND    DECODE(:b1,'S',t.value_date,t.tran_date) >= TO_DATE(:b2,'YYYYMMDD')
AND    DECODE(:b1,'S',t.value_date,t.tran_date) <= TO_DATE(:b4,'YYYYMMDD');

to something not dissimilar to:

SELECT t.*
FROM   yyyyy c
,      xxxxx t
WHERE  c.trga_code   = 'LON'
AND    t.cmpy_num    = c.cmpy_num
AND    :b1           = 'S'
AND    t.value_date >= TO_DATE(:b2,'YYYYMMDD')
AND    t.value_date <= TO_DATE(:b4,'YYYYMMDD')
UNION ALL
SELECT t.*
FROM   yyyyy c
,      xxxxx t 
WHERE  c.trga_code  = 'LON'
AND    t.cmpy_num   = c.cmpy_num
AND    :b1         != 'S'
AND    t.tran_date >= TO_DATE(:b2,'YYYYMMDD')
AND    t.tran_date <= TO_DATE(:b4,'YYYYMMDD');

And I got an unexpected execution plan:

---------------------------------------------------------------------------------------------
| Id  | Operation                              | Name               | Rows  | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                    | 13271 |       |       |
|*  1 |  HASH JOIN                             |                    | 13271 |       |       |
|*  2 |   INDEX RANGE SCAN                     | YYYYY_3_IDX        |   268 |       |       |
|   3 |   VIEW                                 | VW_JF_SET$49BA79CF | 13304 |       |       |
|   4 |    UNION-ALL                           |                    |       |       |       |
|*  5 |     FILTER                             |                    |       |       |       |
|   6 |      PARTITION LIST ALL                |                    |  6652 |     1 |    11 |
|*  7 |       TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX              |  6652 |     1 |    11 |
|*  8 |        INDEX RANGE SCAN                | XXXXX_16_IDX       |   112K|     1 |    11 |
|*  9 |     FILTER                             |                    |       |       |       |
|  10 |      PARTITION LIST ALL                |                    |  6652 |     1 |    11 |
|* 11 |       TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX              |  6652 |     1 |    11 |
|* 12 |        INDEX RANGE SCAN                | XXXXX_4_IDX        | 67411 |     1 |    11 |
---------------------------------------------------------------------------------------------

That’s not quite what I was expecting – weird.

I was sitting here thinking “what’s that? … vw_jf? vw_jf?… hang on … vw_jf … jf … jf rings a bell… join factorization”.
Bingo.

See the Oracle Optimizer Blog for more information on Join Factorization and where they summarise the feature as

Join factorization is a cost-based transformation.
It can factorize common computations from branches in a UNION ALL query which can lead to huge performance improvement.

Fair enough. But I don’t want it. Not just yet. Let’s turn it off.
(Although I’ve mentioned before that I don’t tend to like being too prescriptive regarding a “correct plan”, in this case, for my purposes, I don’t want it doing that. I might let the CBO reconsider it later once I piece everything back together but for now…)

V$SQL_HINT mentions a NO_FACTORIZE_JOIN hint.

Unfortunately (or perhaps fortunately given the vast array of undocumented hints exposed here), it doesn’t tell us how to use it and I fiddled around for a bit but couldn’t get it to work.

So, the proper approach is to use ALTER SESSION or OPT_PARAM to change the setting of _optimizer_join_factorization (don’t go changing the setting of hidden parameters without the approval of Oracle Support, etc, etc, etc).

Which gave me what I wanted (for now).

--------------------------------------------------------------------------------------------- 
| Id  | Operation                             | Name                | Rows  | Pstart| Pstop | 
--------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT                      |                     | 13272 |       |       | 
|   1 |  UNION-ALL                            |                     |       |       |       | 
|*  2 |   FILTER                              |                     |       |       |       | 
|*  3 |    HASH JOIN                          |                     |  6636 |       |       | 
|*  4 |     INDEX RANGE SCAN                  | YYYYY_3_IDX         |   268 |       |       | 
|   5 |     PARTITION LIST SUBQUERY           |                     |  6652 |KEY(SQ)|KEY(SQ)| 
|*  6 |      TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX               |  6652 |KEY(SQ)|KEY(SQ)| 
|*  7 |       INDEX RANGE SCAN                | XXXXX_16_IDX        |   112K|KEY(SQ)|KEY(SQ)| 
|*  8 |   FILTER                              |                     |       |       |       | 
|*  9 |    HASH JOIN                          |                     |  6636 |       |       | 
|* 10 |     INDEX RANGE SCAN                  | YYYYY_3_IDX         |   268 |       |       | 
|  11 |     PARTITION LIST SUBQUERY           |                     |  6652 |KEY(SQ)|KEY(SQ)| 
|* 12 |      TABLE ACCESS BY LOCAL INDEX ROWID| XXXXX               |  6652 |KEY(SQ)|KEY(SQ)| 
|* 13 |       INDEX RANGE SCAN                | XXXXX_4_IDX         | 67411 |KEY(SQ)|KEY(SQ)| 
--------------------------------------------------------------------------------------------- 

P.S. This is not meant to be an investigation into the whys and wherefores of why JF was picked and whether it was a good choice, nor any reflection of whether JF is a good thing and whether there are any issues with it.

Further sources on Join Factorization:
Oracle Optimizer Blog – Join Factorization
Jože Senegačnik – Query Transformations
Timur Akhmadeev – Join Factorization

Further information on those Internal View names:
Jonathan Lewis – Internal Views

Index Skip Scan skipping specified column

Everything I can remember reading about the Index Skip Scan access path has always talked about it being used in queries where the leading column in the index is not specified in a query AND where that leading column has a relatively low number of distinct values in that leading column.

For example:

SQL> drop table t1;

Table dropped.

SQL> 
SQL> create table t1
  2  as
  3  select case when mod(rownum,2) = 0
  4              then 'Y'
  5              else 'N'
  6         end flagyn
  7  ,      rownum id
  8  ,      rpad(rownum,200,'0') col2
  9  from   dual 
 10  connect by rownum <=100000;

Table created.

SQL> 
SQL> create index i1 on t1 (flagyn, id);

Index created.

SQL> 
SQL> exec dbms_stats.gather_table_stats(USER,'T1');

PL/SQL procedure successfully completed.

SQL> explain plan for
  2  select *
  3  from   t1
  4  where  id   = 3222;

Explained.

SQL>         
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------

-------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |   208 |     4   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID| T1   |     1 |   208 |     4   (0)|
|*  2 |   INDEX SKIP SCAN           | I1   |     1 |       |     3   (0)|
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------

   2 - access("ID"=3222)
       filter("ID"=3222)


18 rows selected.

SQL> 

So, I was slightly surprised to the see this access path being used on a query where the leading column WAS specified. I can’t remember having noticed this before in similar circumstances.

Here is the output from the actual situation (test case not supplied).

SQL> explain plan for
  2  SELECT  *
  3  FROM   daily_accr_pos accr
  4  WHERE  accr.inst_num       = 243961
  5  AND    accr.accr_date      >= to_date(20110221120000,'YYYYMMDDHHMISS')
  6  AND    accr.accr_date      <= to_date(20110303120000,'YYYYMMDDHHMISS');

Explained.

SQL>         
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name                 | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |     1 |    86 |    11  (19)|
|   1 |  TABLE ACCESS BY INDEX ROWID| DAILY_ACCR_POS       |     1 |    86 |    11  (19)|
|*  2 |   INDEX SKIP SCAN           | DAILY_ACCR_POS_8_IDX |     1 |       |    10  (20)|
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------

   2 - access("ACCR"."ACCR_DATE">=TO_DATE(' 2011-02-21 12:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "ACCR"."INST_NUM"=243961 AND "ACCR"."ACCR_DATE"<=TO_DATE('
              2011-03-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("ACCR"."INST_NUM"=243961)

20 rows selected.

Note that this index is defined on (ACCR_DATE, INST_NUM), both of which are supplied literals in the query above and also noted as ACCESS PREDICATES in the predicate section.

Some extra metadata about the index:

SQL> select index_name iname
  2  ,      uniqueness uq
  3  ,      blevel
  4  ,      leaf_blocks num_lf
  5  ,      distinct_keys dst_ky
  6  ,      avg_leaf_blocks_per_key avg_lf
  7  ,      avg_data_blocks_per_key avg_dt 
  8  ,      clustering_factor cf
  9  ,      num_rows  
 10  from   dba_indexes 
 11  where  index_name = 'DAILY_ACCR_POS_8_IDX';

INAME                UQ            BLEVEL     NUM_LF     DST_KY     AVG_LF     AVG_DT         CF
-------------------- --------- ---------- ---------- ---------- ---------- ---------- ----------
  NUM_ROWS
----------
DAILY_ACCR_POS_8_IDX NONUNIQUE          3     167844   30386593          1          1   44902638
  50323152


SQL> 
SQL> select column_name
  2  ,      column_position
  3  from   dba_ind_columns 
  4  where  index_name = 'DAILY_ACCR_POS_8_IDX';

COLUMN_NAME     COLUMN_POSITION
--------------- ---------------
ACCR_DATE                     1
INST_NUM                      2

SQL> 
SQL> select column_name
  2  ,      data_type
  3  ,      num_distinct num_dis
  4  ,      low_value, display_raw(low_value, data_type) lo_val
  5  ,      high_value, display_raw(high_value, data_type) hi_val
  6  ,      density
  7  ,      num_nulls
  8  from   dba_tab_columns 
  9  where  table_name = 'DAILY_ACCR_POS' 
 10  and    column_name in ('ACCR_DATE','INST_NUM');

COLUMN_NAME     DATA_TYP    NUM_DIS LOW_VALUE       LO_VAL          HIGH_VALUE      HI_VAL
--------------- -------- ---------- --------------- --------------- --------------- ---------------
   DENSITY  NUM_NULLS
---------- ----------
INST_NUM        NUMBER        91824 C112            17              C3646453        999982
 .00001089          0

ACCR_DATE       DATE           5693 77C30B07010101  07-NOV-1995     78950C1F010101  31-DEC-2049
.000175654          0


SQL> 

SQL_PLAN – other_xml

Just a quick follow-up to my previous post on investigating bad plans using AWR & ASH.

In that article, I mentioned

  1. that it was a shame that the ACCESS_PREDICATES and FILTER_PREDICATES columns were not populated (how this cannot be classified as a bug is beyond comprehension) and
  2. that I was using DBA_HIST_SQLBIND to investigate various peeked binds for that particular SQL based on the LAST_CAPTURED column

However, I should also have mentioned that, from 10gR2 onwards, populated in both V$SQL_PLAN and DBA_HIST_SQL_PLAN, there is a very useful column called OTHER_XML.

From the documentation:

OTHER_XML: “Provides extra information specific to an execution step of the execution plan. The content of this column is structured using XML because it allows multiple pieces of information to be stored, including the following:

  • Name of the schema against which the query was parsed.
  • Release number of the Oracle Database that produced the explain plan.
  • Hash value associated with the execution plan.
  • Name (if any) of the outline or the SQL profile used to build the execution plan.
  • Indication of whether or not dynamic sampling was used to produce the plan.
  • The outline data, a set of optimizer hints that can be used to regenerate the same plan.

Here is a sample of the peeked bind section from my sql with the problem plan:


  <peeked_binds>
    <bind nam=”:B3″ pos=”3″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″>c3023102</bind>
    <bind nam=”:B2″ pos=”4″ dty=”1″ csi=”873″ frm=”1″ mxl=”32″>524d46</bind>
    <bind nam=”:B1″ pos=”5″ dty=”12″ mxl=”7″>786d0a1f010101</bind>
    <bind nam=”:B3″ pos=”13″ ppo=”3″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B2″ pos=”14″ ppo=”4″ dty=”1″ csi=”873″ frm=”1″ mxl=”32″/>
    <bind nam=”:B1″ pos=”15″ ppo=”5″ dty=”12″ mxl=”7″/>
    <bind nam=”:B3″ pos=”16″ ppo=”3″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B1″ pos=”19″ ppo=”5″ dty=”12″ mxl=”7″/>
    <bind nam=”:B3″ pos=”20″ ppo=”3″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B5″ pos=”21″ ppo=”1″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B3″ pos=”22″ ppo=”3″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B6″ pos=”23″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″>c10d</bind>
    <bind nam=”:B1″ pos=”24″ ppo=”5″ dty=”12″ mxl=”7″/>
    <bind nam=”:B6″ pos=”26″ ppo=”23″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B1″ pos=”27″ ppo=”5″ dty=”12″ mxl=”7″/>
    <bind nam=”:B1″ pos=”29″ ppo=”5″ dty=”12″ mxl=”7″/>
    <bind nam=”:B3″ pos=”31″ ppo=”3″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B6″ pos=”32″ ppo=”23″ dty=”2″ pre=”0″ scl=”0″ mxl=”22″/>
    <bind nam=”:B1″ pos=”33″ ppo=”5″ dty=”12″ mxl=”7″/>
  </peeked_binds>

I’m not sure if the repetition of bind names and positions but not values is normal – one for further investigation.

But bind variable 1 is what I was talking about previously as my date bind with the data skew and we can convert the values back out of their raw value using, for example, the overloaded DBMS_STATS.CONVERT_RAW_VALUE.

So, from above, taking my B1 bind of “786d0a1f010101”:

declare
l_date date;
begin
dbms_stats.convert_raw_value(rawval => '786d0a1f010101', resval => l_date);
dbms_output.put_line('dt: '||to_char(l_date,'DD/MM/YYYY HH24:MI:SS'));
end;
/

dt: 31/10/2009 00:00:00

Some other posts regarding OTHER_XML:
Kerry Osborne using OTHER_XML to create test scripts with bind variables
Randolf Geist using OTHER_XML to get outlines / create profiles

Top 3 Oracle Features of the Decade

Yesterday I made an early break for the annual New Year resolutions post.

However, there’s another end-of-year post that’s up for consideration in this particular year and, rather than looking forward to the coming year, that’s a look back over the past 10, 2000-2010.

So, I ask you what are your top 3 Oracle RDBMS features over the past decade.

Oracle 8i was 1999 so we’re really talking about:

  1. 9iR1 in 2001
  2. 9iR2 in 2002
  3. 10gR1 in 2003
  4. 10gR2 in 2005
  5. 11gR1 in 2007
  6. 11gR2 in 2009

A Top 3 is very, very challenging when you think of some of the enhancements which have come along.

Obviously it depends on your own, personal perspective – naturally, mine are going to have a Development bias.

Each version fixes a significant number of bugs from the previous release, but I seem to remember 8i being a rush job for the buzzwords ahead of the new millenium (the early patches at least), 9iR1 a hatchet job and 9iR2 being a relative bastion of stability.

Then again it’s also a long time ago now and my memories may be unreliable.

The New Features Guides of 9i (links above) lists amongst others:

  • LogMiner improvements, DataGuard, RAC, Flashback Query, Some online redefinition & reorganisation features, VPD enhancements, Automatic Undo Management, dynamic memory management, spfles, RMAN improvements, native XML functionality and XMLDB, the MERGE statement, TIMESTAMPs and INTERVALs, CASE statement, External Tables, associative arrays/index-by tables indexed by VARCHAR2, Streams, CDC, Index Skip Scans, ANSI SQL, OMF, multiple blocksizes, dynamic sampling, table compression, subquery factoring, pipelined table functions, etc.

A list of new features from 10g (or my list at least) is shorter, but even so it seems a bigger hitter in terms of weighty marketing-savvy acronyms and features:

  • ADDM, ASH, ASM, AWR, Automatic SQL Tuning, DataPump, Job Scheduler, SQL Access Advisor, HTMLDB, Online Table Redefinition, Oracle XE, DBMS_XPLAN…

11g has always felt like 10gR3. Nevertheless, there have been still some impressive features therein:

  • Adaptive Cursor Sharing, Result Cache, Database Resident Connection Pool Caching, Invisible Indexes, SecureFiles, Binary XML…

And then recently Exadata and the Oracle Database Machine (although I think it’s too soon to make any judgement on these and if they have a big impact then it will be mostly in the next decade).

For the summaries above, I’ve missed out a lot (probably accidentally – let me know – but I was rushing towards the end).

But I’ve focused on the initial introductions of headline new features.

However, I would argue that the gradual evolution of some of the main features and functionality has had some of the biggest impact as we are now at the end of the decade compared to the end of 1999. In addition, some of the tweaks and internals have also had a big impact whilst avoiding the headlines – mutexes for example.

Think about how the CBO has moved on over the past ten years.

Or how HTMLDB has morphed and moved on to APEX.

Or how parallel capabilities have developed. Or partitoning.

Maybe you like your GUIs like OEM.

Or the evolution of IDEs has been revolutionary for you e.g. SQL Developer, or the explosion of functionality in third party tools like Toad or Quest Code Tester for Oracle.

Or the combination of several features – external tables plus pipelined functions are pretty cool for loading in data, for example. Or partitioning plus parallel.

All of these are up for consideration.

What’s not up for consideration is stuff that predates the decade. There are several features which I thought of while doing this and which turned out to be introduced in 8i or even before. Man, tempus fugit…

So, my Top 3 is based on what I do or use in Oracle most days (or even what I don’t have to do anymore) because of features introduced or evolved over the past decade. The evolution of collection functionality in SQL and PLSQL came close. A bit further behind was some of the XML capabilities. A whole host of ineligible features from older versions were scrubbed out. However, in no particular order, my vote goes to:

  • Analytics – It’s amazing how many time I end up using analytic functions. As Tom Kyte says “Analytics rock, Analytics roll”. Countless multipasses of data have been avoided with these babies.(Analytics ruled ineligible as an 8i introduction, dammit)
  • The WITH clause aka Subquery Factoring – Most of my sql, if it has any sort of complexity, ends up using this feature; fantastic for breaking up and understanding a complex bit of sql that you’ve never seen before; great at encouraging you to think in SETS.
  • AWR / ASH / Wait model evolution – Shame AWR & ASH are licensed separately and there are alternatives for earlier Oracle versions or if you don’t want to fork out the big bucks. But so useful for diagnosing performance problems, particularly retrospectively and/or identifying session problems which might have drowned previously in a system-level Statspack.
  • DBMS_XPLAN – Another feature that I use day in, day out. So easy to get explain plans, actual plans from memory or from AWR. Invaluable!

I would imagine that if you were a production DBA then the evolution of RMAN and Flashback has been fantastically useful and time saving.

These are not things that I’ve tended to use much in my role.

But if something’s revolutionised your decade, let me know.

_replace_virtual_columns = false

A follow-up to my post yesterday.

A quick recap on the situation:

A sql statement such as this:


SELECT col1
FROM my_schema.table1 tab1
WHERE …
AND NVL(col2,’UNASSIGNED’) = SYS_CONTEXT(’MY_CONTEXT’,'MY_COL2_VALUE’)

is raising an error such as this:


ORA-00904: “MY_SCHEMA”.”TAB1″.”SYS_NC00062$”: invalid identifier

where SYS_NC00062$ is not a column directly specified in the sql but a virtual column related to a function-based index.

This matches a bug which is currently under development and not yet published (Bug 7026584).

The explanation is that the 11g optimizer uses a new mechanism to work with function-based indexes at parse time, a new mechanism which as yet not fully robust.


The problem comes when virtual column processing is done from qkacol where it clones the predicates and later tries to resolve the copied columns in the predicate with the supplied fro. The root cause here, the unnesting driver generates an alias for table and adds it for a correlated column. [sic]

In addition to the 10046 and 10053 trace files, a 904 errorstack trace file provided the information required:


alter system set events='904 trace name errorstack';
...
alter system set events='904 trace name errorstack off';

Oracle’s currently (at the time of writing this) recommended solution to that bug to set undocumeted parameter _replace_virtual_columns to false.

I’m still awaiting an explanation on that cost_io virtual column adjustment.