Varying IN lists

In his column in the current edition of Oracle magazine, Tom Kyte wrote about the recurring issue of varying in lists and provided some solutions for this.

This ties in nicely with a recommendation I had already made at my current client where there is a significant usage of IN lists within dynamic SQL issued by applications on top of the 8.1.7.4 DB. So, if we were using bind variables for these IN lists, then if there were 1 -> n arguments in the IN lists, we would probably find n similar queries in the shared pool, each with a different number of arguments in that IN list.

However, these dynamic varying IN lists are not using bind variables.

 As a result, there is a fair amount of SQL ageing out of the shared pool as these sequence-generated ids creep upwards with every day bringing a new range of ids and, not surprisingly, we are doing too much hard parsing.

So, there is a two-pronged recommendation, firstly to change the CURSOR_SHARING to FORCE in order to do some auto-binding for certain users (to be set using an AFTER LOGON trigger) and, secondly, to use a better approach for IN lists, as detailed in the link above.

The advantages should hopefully be a reduction in hard parsing and, within the application, the end of the limit on the number of arguments in the dynamic string-concatenated IN list.

I have been using an approach very similar to the 8i approach that Tom describes in his column.

An example of the transition, using a two-argument IN list, is from:

select n.slug, n.id
from news n
where n.status=15
and n.sectionid in ('42299, 42295')

to

select n.slug, n.id
from news n
where n.status=15
and n.sectionid in
(
  select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
)

and, in the above example, '42299, 42295' should be bound as a single bind (this is for an application with varying numbers of arguments and, as Tom points out in his article, if there is a finite number of arguments, then the best approach might be to just individually bind each item).

However, on implementing this approach, there was a very definite performance impact.
An explain plan for the original code looked like this:

SQL> select n.slug, n.id
  2 from news n
  3 where n.status=15
  4 and n.sectionid in ('42299, 42295')
  5 /
Execution Plan
----------------------------------------------------------
  0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=48 Bytes=1488)
  1 0 INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=48 Bytes=1488)

An explain plan for the revised statement revealed the following:

SQL> l
  1 select n.slug, n.id
  2 from news n
  3 where n.status=15
  4 and n.sectionid in
  5 (
  6 select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
  7* )
SQL> /
Execution Plan
----------------------------------------------------------
  0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5365 Card=192277 Bytes=8460188)
  1 0 HASH JOIN (Cost=5365 Card=192277 Bytes=8460188)
  2 1 VIEW OF 'VW_NSO_1' (Cost=14 Card=4072 Bytes=52936)
  3 2 SORT (UNIQUE) (Cost=14 Card=4072)
  4 3 COLLECTION ITERATOR (PICKLER FETCH)
  5 1 INDEX (FAST FULL SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=841 Card=306123 Bytes=9489813)

My initial thoughts were that a hash join was probably not the most efficient mechanism. I don't want to digress into a discussion on joining mechanisms partly because my knowledge is less than comprehensive and also because there are far, far better sources out there. I refer the reader to the Oracle documentation - 8i, 9iR2, and 10gR2 - and, of course, to Jonathan Lewis's book on the CBO.
I also suspected that the problem was probably related to the number of rows that the CBO thought were in the dynamic IN list collection.The reason is right there in the autotrace, but a quick trace of the CBO decision making using event 10053 confirmed the following:

Table stats Table: KOKBF$ Alias: KOKBF$
TOTAL :: (NOT ANALYZED) CDN: 4072 NBLKS: 100 TABLE_SCAN_CST: 2 AVG_ROW_LEN: 100
Column: KOKBC$ Col#: 1 Table: KOKBF$ Alias: KOKBF$
NO STATISTICS (using defaults)
NDV: 127 NULLS: 0 DENS: 7.8585e-03
***************************************
SINGLE TABLE ACCESS PATH
TABLE: KOKBF$ ORIG CDN: 4072 CMPTD CDN: 4072
Access path: tsc Resc: 2 Resp: 2
BEST_CST: 2.00 PATH: 2 Degree: 1

Basically, the CBO is using a default of 4072 rows in the collection generated from the IN list. I have seen a similar problem with GLOBAL TEMPORARY TABLES and which was helped by using a DYNAMIC_SAMPLING hint. Unfortunately, I believe that the latter was introduced in 9i and certainly any attempts to use it in the above problem on my version were less than fruitful.  Same with the CARDINALITY hint.

Anyway, I took the easy route to the solution. I quickly created a normal heap table with 2 rows in it to see how the CBO would react in those circumstances:


SQL> create table domtest
2 (col1 number);

Table created.


SQL>
SQL> insert into domtest values (42299);

1 row created.


SQL>
SQL> insert into domtest values (42295);

1 row created.


SQL>
SQL> begin
2 dbms_stats.gather_table_stats(USER,'DOMTEST');
3 end;
4 /

PL/SQL procedure successfully completed.


SQL>
SQL> set autotrace trace explain
SQL>
SQL> select n.slug, n.id
2 from news n
3 where n.status=15
4 and n.sectionid in
5 (
6 select * from domtest
7 );

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=94 Bytes=4136)
1 0 NESTED LOOPS (Cost=11 Card=94 Bytes=4136)
2 1 VIEW OF 'VW_NSO_1' (Cost=3 Card=2 Bytes=26)
3 2 SORT (UNIQUE) (Cost=3 Card=2 Bytes=10)
4 3 TABLE ACCESS (FULL) OF 'DOMTEST' (Cost=1 Card=2 Bytes=10)
5 1 INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=306123 Bytes=9489813)

Under these circumstances, a nested loop seemed more in line with what I was expecting previously.So, I decided to try a nested loop hint, the syntax being


/*+ ordered use_nl (--table / alias--) */

and which says that the table specified should be used as the inner table on the nested loop - so, this should drive off the entries in the IN list array.


SQL> select /*+ ordered use_nl (n) */
2 n.slug, n.id
3 from news n
4 where n.status=15
5 and n.sectionid in
6 (
7 select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
8 )
9 /


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=16302 Card=192277 Bytes=8460188)
1 0 NESTED LOOPS (Cost=16302 Card=192277 Bytes=8460188)
2 1 VIEW OF 'VW_NSO_1' (Cost=14 Card=4072 Bytes=52936)
3 2 SORT (UNIQUE) (Cost=14 Card=4072)
4 3 COLLECTION ITERATOR (PICKLER FETCH)
5 1 INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=306123 Bytes=9489813)

The numbers are still a little off because the CBO is still using a default on 4072 rows but it runs like a dream.

About these ads

One Response to Varying IN lists

  1. Pingback: Varying IN lists - part II « OraStory

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 68 other followers

%d bloggers like this: