Performance Issue with inline view, UNION ALL and Virtual Column

This is the distillation of a performance problem in some regulatory reporting where a mistake in the optimizer cardinality estimates causes all sorts of knock-on performance issues. This post does not look at those knock-on effects but examines what is happening with the optimizer in this case. Most of the digging in the issue belongs to Valerii Satybaldyev and thanks to Sayan Malakshinov, notably for the contributions on the oracle-l thread.

Script to reproduce (currently on 19.6):

drop table t1;


create table t1
(id            number not null
,version       number not null
,create_ts     timestamp not null
,modify_ts     timestamp
,status        varchar2(24) generated always as (NVL2("MODIFY_TS",'SUPERSEDED','LATEST'))
,id2           number not null
,yn            varchar2(1) not null
,business_date date not null);

insert into t1
(id, version, create_ts, id2, yn, business_date)
select rownum
,      1
,      systimestamp
,      rownum
,      case when mod(rownum,2) = 1 then 'Y' else 'N' end
,      trunc(sysdate,'MON') + mod(rownum,10)
from   dual
connect by level <= 1000;

exec dbms_stats.gather_table_stats(USER,'T1');
 

explain plan for
with x as
(select * from t1
 union all
 select * from t1)
select *
from x
where yn = 'Y';

select * from table(dbms_xplan.display);

The problem comes with the estimate of 1 row in the outermost SELECT, particularly for subsequent joins this cardinality estimate causes significant performance problems.

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    42 |   547   (0)| 00:00:01 |
|   1 |  VIEW                       |      |  1000 | 40000 |   548   (0)| 00:00:01 |
|   2 |   UNION-ALL                 |      |       |       |            |          |
|*  3 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - storage("T1"."YN"='Y')
       filter("T1"."YN"='Y')
   4 - storage("T1"."YN"='Y')
       filter("T1"."YN"='Y')

If we look at the 10053 trace, we can see the root cause of the problem.
Compare the BASE STATISTICAL INFORMATION snippets for T1 and for the inline view X.
Note in particular the numbers for column YN which is the predicate as well as for ID2 and BUSINESS_DATE.
Note that these are columns which appear after the virtual column STATUS.
(in the original trace file, the entry under T1 for column #8 comes before column #7, I have reordered below just to make the correlation between T1 and X more obvious)

Table Stats::
  Table: T1  Alias: T1
  #Rows: 1000  SSZ: 0  LGR: 0  #Blks:  1006  AvgRowLen:  40.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
  Column (#8): BUSINESS_DATE(DATE)
    AvgLen: 8 NDV: 10 Nulls: 0 Density: 0.100000 Min: 2459336.000000 Max: 2459345.000000
  Column (#7): YN(VARCHAR2)
    AvgLen: 2 NDV: 2 Nulls: 0 Density: 0.500000
  Column (#6): ID2(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000
  Column (#4): MODIFY_TS(TIMESTAMP)
    AvgLen: 1 NDV: 0 Nulls: 1000 Density: 0.000000
  Column (#3): CREATE_TS(TIMESTAMP)
    AvgLen: 11 NDV: 1 Nulls: 0 Density: 1.000000 Min: 2459347.407141 Max: 2459347.407141
  Column (#2): VERSION(NUMBER)
    AvgLen: 3 NDV: 1 Nulls: 0 Density: 1.000000 Min: 1.000000 Max: 1.000000
  Column (#1): ID(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000

Table Stats::
  Table:  X  Alias:  X  (NOT ANALYZED)
  #Rows: 2000  SSZ: 0  LGR: 0  #Blks:  2012  AvgRowLen:  40.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
  Column (#8): BUSINESS_DATE(DATE)  NO STATISTICS (using defaults)
    AvgLen: 7 NDV: 0 Nulls: 0 Density: 0.000000
  Column (#7): YN(VARCHAR2)
    AvgLen: 8 NDV: 10 Nulls: 0 Density: 0.100000 Min: 2459336.000000 Max: 2459345.000000
  Column (#6): ID2(NUMBER)
    AvgLen: 2 NDV: 2 Nulls: 0 Density: 0.500000
  Column (#5): STATUS(VARCHAR2)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000
  Column (#4): MODIFY_TS(TIMESTAMP)
    AvgLen: 1 NDV: 0 Nulls: 2000 Density: 0.000000 Min: 0.000000 Max: 0.000000
  Column (#3): CREATE_TS(TIMESTAMP)
    AvgLen: 11 NDV: 1 Nulls: 0 Density: 1.000000 Min: 2459347.407141 Max: 2459347.407141
  Column (#2): VERSION(NUMBER)
    AvgLen: 3 NDV: 1 Nulls: 0 Density: 1.000000 Min: 1.000000 Max: 1.000000
  Column (#1): ID(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000

For inline view X, column STATUS inherits the stats for ID2, ID2 gets those of YN, YN those of BUSINESS_DATE and BUSINESS_DATE gets NO STATISTICS.
So for our predicate on YN we get completely the wrong base statistics.
Then in addition, because the supplied value is out of range from the wrong statistics, estimates get further pro-rated.

Potential workaround is not risk-free as it moves the implicit order of the selected columns which could affect certain code.

alter table t1 modify status invisible;
alter table t1 modify status visible;

explain plan for
with x as
(select * from t1
 union all
 select * from t1)
select *
from x
where yn = 'Y';

select * from table(dbms_xplan.display);

Plan hash value: 3505968351
------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |  1000 | 90000 |   548   (0)| 00:00:01 |
|   1 |  VIEW                       |      |  1000 | 90000 |   548   (0)| 00:00:01 |
|   2 |   UNION-ALL                 |      |       |       |            |          |
|*  3 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Reason it works is that the virtual column is now logically at the end of the column definitions.

If we compare the trace once “fixed”, then we see the following in the X section:

 Table:  X  Alias:  X  (NOT ANALYZED)
...
  Column (#8): STATUS(VARCHAR2)  NO STATISTICS (using defaults)

Leave a comment