Scalar Subselect Costing

This issue is an oldie but deserving of a quick post to stop me going off on a tangent in another post.

It is an oddity of scalar subselects/subqueries that their cost is not taken into account in the top level cost of a query.

In older versions of Oracle, it used to be the case that you didn’t even see the scalar subquery in the execution plan.

However, even in the latest versions, the cost still isn’t accounted for.

Always something to keep in mind.

For example:

SQL> create table t1
  2  (col1 number not null);

Table created.

SQL> 
SQL> insert into t1
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> create table t2
  2  (col1 number not null primary key);

Table created.

SQL> 
SQL> 
SQL> insert into t2
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 

Let’s do a scalar subselect to do an index lookup on t2 for every row in t1:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2339000913

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| SYS_C0078310 |     1 |    13 |     1   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1           | 10000 |   126K|     8   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T2"."COL1"=:B1)

You can see that the cost of the scalar subquery is 1 per execution and it’s not accounted for at the top level.

Let’s force a full table scan of the row-by-row lookup:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select /*+ full(t2) */ t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 637946564

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T2"."COL1"=:B1)

Obviously a much more expensive operation but, again, not properly accounted for in the overall costing.

Wouldn’t it be preferable that as the optimizer has estimated the number of rows in the top level select:

|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |

and it has estimated the cost per execution of the scalar subselect:

|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |

that the top level cost include to some degree the cost of scalar subselect per execution * estimated executions?

For example, if we code a join roughly equivalent to the scalar subselect then:

SQL> explain plan for
  2  select /*+ 
  3           full(t2) 
  4           use_nl(t2)
  5           */
  6         t1.col1
  7  ,      t2.col1
  8  from   t1
  9  ,      t2
 10  where t2.col1 (+) = t1.col1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 2453408398

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   1 |  NESTED LOOPS OUTER|      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    13 |     7  (15)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."COL1"(+)="T1"."COL1")

Also see:
http://jonathanlewis.wordpress.com/2007/10/12/scalar-subqueries/

http://oracle-randolf.blogspot.com/2010/01/when-your-projection-is-not-cost-free.html

http://blog.sydoracle.com/2005/09/explain-plans-and-scalar-subqueries.html

CPU costing is off

Just a quick note that if you see in your executions plans via DBMS_XPLAN that “CPU costing is off”, then it might not be, you might just have old versions of PLAN_TABLE hanging around.

I’ve just spent a little while trying to figure out why CPU costing might be off, after which a quick drop of a local PLAN_TABLE fixed everything.

So, not so much that CPU costing was off, more that some of the CPU costing information was not available in the PLAN_TABLE.

Interestingly, I did NOT have a message saying that “‘PLAN_TABLE’ is old version”.

I did spend a few minutes trying to recreate the situation with old versions of PLAN_TABLE locally, but all I could get was a message that PLAN_TABLE was old version, nothing about cpu costing being off. Bit weird but probably not worth spending any more time over.

Odd execution plan

I came across an execution plan earlier that struck me as a little odd.
Maybe I’ve seen this plenty of times before but never noticed.

Here’s the query:

SELECT /*+ dom2 */ * 
FROM   xxxxxxx opcl
WHERE  opcl.cal_date BETWEEN TO_DATE('20100901', 'YYYYMMDD' )
                     AND     TO_DATE('20100902', 'YYYYMMDD' )
AND    opcl.active_flag = 'Y'
AND    opcl.cont_tag   != 0
AND    NOT EXISTS (SELECT 'X'
                   FROM   yyyyyyy  r
                   WHERE  r.inst_num    = opcl.inst_num
                   AND    r.status      = 'R'
                   AND    opcl.cal_date < r.conv_date);

Here’s the plan:

-----------------------------------------------------------------------------------------------
| Id  | Operation                     |  Name                    | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                          |  5597 |   333K|  1034   (1)|
|   1 |  TABLE ACCESS BY INDEX ROWID  | XXXXXXXXXXXXXXXXXXX      |  5597 |   333K|  1034   (1)|
|*  2 |   INDEX RANGE SCAN            | XXXXXXXXXXXXXXXXXXX_IDX  |  5597 |       |  1710   (1)|
|*  3 |    TABLE ACCESS BY INDEX ROWID| YYYYYYYYYYYY             |     1 |    16 |     2  (50)|
|*  4 |     INDEX UNIQUE SCAN         | YYYY_PK                  |     1 |       |     2  (50)|
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
                                                   
   2 - access("SYS_ALIAS_2"."CAL_DATE">=TO_DATE(' 2010-09-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "SYS_ALIAS_2"."ACTIVE_FLAG"='Y' AND "SYS_ALIAS_2"."CAL_DATE"<=TO_DATE('
              2010-09-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("SYS_ALIAS_2"."ACTIVE_FLAG"='Y' AND "SYS_ALIAS_2"."CONT_TAG"<>0 AND  NOT EXISTS
              (SELECT /*+ */ 0 FROM "INST_REDENOM" "R" WHERE "R"."INST_NUM"=:B1 AND "R"."CONV_DATE">:B2 AND
              "R"."STATUS"='R'))                                                                                              
   3 - filter("R"."CONV_DATE">:B1 AND "R"."STATUS"='R')
   4 - access("R"."INST_NUM"=:B1)

Where’s the join mechanism?
It has what looks like four standalone operations.

It took me a few minutes to figure out what it really meant.

I think if it had been presented like this, it would have been more obvious:

-----------------------------------------------------------------
| Id  | Operation                     |  Name                    
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                          
|   1 |  TABLE ACCESS BY INDEX ROWID     | XXXXXXXXXXXXXXXXXXX      
           FILTER
|*  2 |      INDEX RANGE SCAN            | XXXXXXXXXXXXXXXXXXX_IDX  
|*  3 |      TABLE ACCESS BY INDEX ROWID | YYYYYYYYYYYY            
|*  4 |        INDEX UNIQUE SCAN         | YYYY_PK                  
-----------------------------------------------------------------

And now I’ve just re-read the section on Filtering (pages 211-214) in Cost-based Oracle Fundamentals where Jonathan mentions that the line in the execution plan can be suppressed. Seems to tally.

Follow

Get every new post delivered to your Inbox.

Join 72 other followers