Scalar Subselect Costing

This issue is an old one, but it deserves a quick post of its own to stop me going off on a tangent in another post.

It is an oddity of scalar subselects (scalar subqueries) that their cost is not included in the top-level cost of a query.

In older versions of Oracle, the scalar subquery did not even appear in the execution plan.

However, even in the latest versions, where the scalar subquery is shown in the plan, its cost still isn't accounted for at the top level.

Always something to keep in mind.

For example:

SQL> create table t1
  2  (col1 number not null);

Table created.

SQL> 
SQL> insert into t1
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> create table t2
  2  (col1 number not null primary key);

Table created.

SQL> 
SQL> 
SQL> insert into t2
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 

Let’s use a scalar subselect to do an index lookup on t2 for every row in t1:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2339000913

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| SYS_C0078310 |     1 |    13 |     1   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1           | 10000 |   126K|     8   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T2"."COL1"=:B1)

You can see that the scalar subquery costs 1 per execution, yet the top-level cost of 8 is simply the cost of the full scan of T1: the subquery is not accounted for at the top level at all.
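As a rough illustration, assuming the subquery executes once for each of the 10000 estimated rows of T1 (every t1.col1 value is distinct here), including it would take the top-level cost from 8 to something like:

$$
\underbrace{8}_{\text{full scan of T1}} + \underbrace{10000}_{\text{executions}} \times \underbrace{1}_{\text{INDEX UNIQUE SCAN}} = 10008
$$

This is just arithmetic on the figures displayed in the plan, not a formula the optimizer is known to use.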

Now let’s force a full table scan of t2 for the row-by-row lookup:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select /*+ full(t2) */ t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 637946564

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T2"."COL1"=:B1)

Scanning t2 for every row of t1 is obviously a much more expensive operation but, again, the top-level cost is still 8, so it is not accounted for in the overall costing.

Since the optimizer has estimated the number of rows in the top-level select:

|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |

and has also estimated the cost per execution of the scalar subselect:

|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |

wouldn’t it be preferable for the top-level cost to include, to some degree, the scalar subselect’s cost per execution multiplied by its estimated number of executions?
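To make that suggestion concrete, here is a minimal sketch of the naive sum being proposed: outer cost plus (estimated executions × cost per execution). It assumes one execution per row of T1 and uses the rounded costs displayed in the two plans above. The function name `naive_total_cost` is hypothetical, and this is an illustration of the arithmetic, not how Oracle's optimizer actually costs anything.

```python
# Hypothetical, naive model of the costing suggested above:
#   top-level cost = outer cost + estimated executions * cost per execution
# This is NOT Oracle's costing algorithm; it only reproduces the arithmetic
# using the (rounded) figures displayed in the plans above.


def naive_total_cost(outer_cost: float, executions: int, cost_per_exec: float) -> float:
    """Outer query cost plus the scalar subquery cost summed over all executions."""
    if executions < 0 or cost_per_exec < 0:
        raise ValueError("executions and cost_per_exec must be non-negative")
    return outer_cost + executions * cost_per_exec


# Figures read from the two scalar subselect plans.
T1_SCAN_COST = 8   # Id 2: TABLE ACCESS FULL on T1 (also the reported top-level cost)
T1_ROWS = 10_000   # Id 0: estimated rows, assumed here to equal the number of executions

PER_EXEC_COST = {
    "INDEX UNIQUE SCAN on T2": 1,  # Id 1 in the first plan
    "TABLE ACCESS FULL on T2": 2,  # Id 1 in the second plan
}

for operation, per_exec in PER_EXEC_COST.items():
    naive = naive_total_cost(T1_SCAN_COST, T1_ROWS, per_exec)
    print(f"{operation}: reported top-level cost {T1_SCAN_COST}, naive estimate {naive}")
```

Even this crude sum puts the forced full-scan version at 20008 rather than the reported 8, which is the kind of gap the question above is getting at.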

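For comparison, if we code an outer join roughly equivalent to the scalar subselect, forcing the same full scan of t2 inside a nested loop, then the cost of the repeated lookups does show up in the top-level cost: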

SQL> explain plan for
  2  select /*+ 
  3           full(t2) 
  4           use_nl(t2)
  5           */
  6         t1.col1
  7  ,      t2.col1
  8  from   t1
  9  ,      t2
 10  where t2.col1 (+) = t1.col1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 2453408398

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   1 |  NESTED LOOPS OUTER|      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    13 |     7  (15)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."COL1"(+)="T1"."COL1")
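Assuming the textbook nested loop formula (cost of the outer row source plus its cardinality times the cost of one probe of the inner row source), the figures in this plan can be roughly reconciled:

$$
\text{cost}_{\text{NL}} \approx \text{cost}(\text{T1}) + \text{card}(\text{T1}) \times c_{\text{T2}}
\quad\Rightarrow\quad
c_{\text{T2}} \approx \frac{66919 - 8}{10000} \approx 6.69
$$

That is consistent with the 7 shown for the inner TABLE ACCESS FULL on T2, since the plan displays rounded costs. So in the join the repeated scans of t2 are carried into the top-level cost, whereas in the scalar subselect versions the top-level cost stayed at 8.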

Also see:

http://jonathanlewis.wordpress.com/2007/10/12/scalar-subqueries/

http://oracle-randolf.blogspot.com/2010/01/when-your-projection-is-not-cost-free.html

http://blog.sydoracle.com/2005/09/explain-plans-and-scalar-subqueries.html
