Performance Issue with inline view, UNION ALL and Virtual Column

This is the distillation of a performance problem in some regulatory reporting where a mistake in the optimizer cardinality estimates causes all sorts of knock-on performance issues. This post does not look at those knock-on effects but examines what is happening with the optimizer in this case. Most of the digging on this issue was done by Valerii Satybaldyev, and thanks also to Sayan Malakshinov, notably for his contributions on the oracle-l thread.

Script to reproduce (currently on 19.6):

drop table t1;


create table t1
(id            number not null
,version       number not null
,create_ts     timestamp not null
,modify_ts     timestamp
,status        varchar2(24) generated always as (NVL2("MODIFY_TS",'SUPERSEDED','LATEST'))
,id2           number not null
,yn            varchar2(1) not null
,business_date date not null);

insert into t1
(id, version, create_ts, id2, yn, business_date)
select rownum
,      1
,      systimestamp
,      rownum
,      case when mod(rownum,2) = 1 then 'Y' else 'N' end
,      trunc(sysdate,'MON') + mod(rownum,10)
from   dual
connect by level <= 1000;

exec dbms_stats.gather_table_stats(USER,'T1');
 

explain plan for
with x as
(select * from t1
 union all
 select * from t1)
select *
from x
where yn = 'Y';

select * from table(dbms_xplan.display);

The problem is the estimate of 1 row in the outermost SELECT; in particular, for subsequent joins this cardinality estimate causes significant performance problems.

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    42 |   547   (0)| 00:00:01 |
|   1 |  VIEW                       |      |  1000 | 40000 |   548   (0)| 00:00:01 |
|   2 |   UNION-ALL                 |      |       |       |            |          |
|*  3 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - storage("T1"."YN"='Y')
       filter("T1"."YN"='Y')
   4 - storage("T1"."YN"='Y')
       filter("T1"."YN"='Y')

If we look at the 10053 trace, we can see the root cause of the problem.
Compare the BASE STATISTICAL INFORMATION snippets for T1 and for the inline view X.
Note in particular the numbers for column YN, the predicate column, as well as those for ID2 and BUSINESS_DATE.
Note that these are columns which appear after the virtual column STATUS.
(In the original trace file, the entry under T1 for column #8 comes before column #7; I have reordered it below just to make the correlation between T1 and X more obvious.)

Table Stats::
  Table: T1  Alias: T1
  #Rows: 1000  SSZ: 0  LGR: 0  #Blks:  1006  AvgRowLen:  40.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
  Column (#8): BUSINESS_DATE(DATE)
    AvgLen: 8 NDV: 10 Nulls: 0 Density: 0.100000 Min: 2459336.000000 Max: 2459345.000000
  Column (#7): YN(VARCHAR2)
    AvgLen: 2 NDV: 2 Nulls: 0 Density: 0.500000
  Column (#6): ID2(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000
  Column (#4): MODIFY_TS(TIMESTAMP)
    AvgLen: 1 NDV: 0 Nulls: 1000 Density: 0.000000
  Column (#3): CREATE_TS(TIMESTAMP)
    AvgLen: 11 NDV: 1 Nulls: 0 Density: 1.000000 Min: 2459347.407141 Max: 2459347.407141
  Column (#2): VERSION(NUMBER)
    AvgLen: 3 NDV: 1 Nulls: 0 Density: 1.000000 Min: 1.000000 Max: 1.000000
  Column (#1): ID(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000

Table Stats::
  Table:  X  Alias:  X  (NOT ANALYZED)
  #Rows: 2000  SSZ: 0  LGR: 0  #Blks:  2012  AvgRowLen:  40.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 1
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
  Column (#8): BUSINESS_DATE(DATE)  NO STATISTICS (using defaults)
    AvgLen: 7 NDV: 0 Nulls: 0 Density: 0.000000
  Column (#7): YN(VARCHAR2)
    AvgLen: 8 NDV: 10 Nulls: 0 Density: 0.100000 Min: 2459336.000000 Max: 2459345.000000
  Column (#6): ID2(NUMBER)
    AvgLen: 2 NDV: 2 Nulls: 0 Density: 0.500000
  Column (#5): STATUS(VARCHAR2)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000
  Column (#4): MODIFY_TS(TIMESTAMP)
    AvgLen: 1 NDV: 0 Nulls: 2000 Density: 0.000000 Min: 0.000000 Max: 0.000000
  Column (#3): CREATE_TS(TIMESTAMP)
    AvgLen: 11 NDV: 1 Nulls: 0 Density: 1.000000 Min: 2459347.407141 Max: 2459347.407141
  Column (#2): VERSION(NUMBER)
    AvgLen: 3 NDV: 1 Nulls: 0 Density: 1.000000 Min: 1.000000 Max: 1.000000
  Column (#1): ID(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 1.000000 Max: 1000.000000

For inline view X, column STATUS inherits the stats for ID2, ID2 gets those of YN, YN those of BUSINESS_DATE and BUSINESS_DATE gets NO STATISTICS.
So for our predicate on YN we get completely the wrong base statistics.
Then, in addition, because the supplied value is out of range of those wrong statistics, the estimate is pro-rated even further.

A potential workaround is shown below, but it is not risk-free as it changes the implicit order of the selected columns, which could affect certain code.

alter table t1 modify status invisible;
alter table t1 modify status visible;

explain plan for
with x as
(select * from t1
 union all
 select * from t1)
select *
from x
where yn = 'Y';

select * from table(dbms_xplan.display);

Plan hash value: 3505968351
------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |  1000 | 90000 |   548   (0)| 00:00:01 |
|   1 |  VIEW                       |      |  1000 | 90000 |   548   (0)| 00:00:01 |
|   2 |   UNION-ALL                 |      |       |       |            |          |
|*  3 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS STORAGE FULL| T1   |   500 | 20000 |   274   (0)| 00:00:01 |
------------------------------------------------------------------------------------

The reason it works is that the virtual column is now logically at the end of the column definitions.

If we compare the trace once “fixed”, then we see the following in the X section:

 Table:  X  Alias:  X  (NOT ANALYZED)
...
  Column (#8): STATUS(VARCHAR2)  NO STATISTICS (using defaults)
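As a quick sanity check of the column ordering before and after the workaround, the dictionary can be queried as follows (a minimal sketch; the interpretation of COLUMN_ID versus SEGMENT_COLUMN_ID / INTERNAL_COLUMN_ID here is mine):

-- Logical vs stored column positions; after the invisible/visible dance,
-- STATUS should appear last in the logical order.
select column_name, column_id, segment_column_id, internal_column_id, virtual_column
from   user_tab_cols
where  table_name = 'T1'
order  by internal_column_id;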

Online Statistics Gathering

Online Statistics Gathering, https://sqlmaria.com/2017/01/03/online-statistics-gathering/, is a pretty cool feature which has been around since the now ages-old Oracle 12.1, especially for data warehouses.

As a recap, when you do a direct path load into an empty segment, Oracle takes the opportunity to gather statistics on the data whilst loading it, i.e. there is no need to go back and re-read the data in an explicit stats gathering call.

Note that, per Maria’s note, only base table and column statistics are gathered, not histograms, but those can be gathered later without re-gathering the base column statistics via the GATHER AUTO option.
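For example, something along these lines (a sketch; the exact method_opt preferences you want for histograms may differ) adds the histograms afterwards without redoing the base column statistics:

-- GATHER AUTO gathers only what is missing/stale, so base stats from the load are kept.
exec dbms_stats.gather_table_stats(user, 'T1', options => 'GATHER AUTO');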

Old news.

We have only recently upgraded a complex application from 11.2.0.4 and hit an issue which is really obvious in hindsight but is worth pointing out, particularly in combination with another feature.

I’m sure the behaviour below will come as no surprise in 11.2.0.4:

drop table t1;
create global temporary table t1
on commit preserve rows
as
select * from user_objects where rownum <= 100;

explain plan for
select * from t1;

select * from table(dbms_xplan.display);
select num_rows from user_tables where table_name = 'T1';
select column_name, num_nulls, num_distinct from user_tab_columns where table_name = 'T1';

The output of note:

Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation                      | Name | Rows | Bytes | Cost (%CPU)| Time  |
----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |      |  100 | 19000 |   2  (0)| 00:00:01 |
|  1 | TABLE ACCESS STORAGE FULL      | T1   |  100 | 19000 |   2  (0)| 00:00:01 |
----------------------------------------------------------------------------------

Note
-----
  - dynamic sampling used for this statement (level=2)


 NUM_ROWS
----------
      

COLUMN_NAME           NUM_NULLS NUM_DISTINCT
-------------------- ---------- ------------
GENERATED                       
SECONDARY                       
NAMESPACE                       
EDITION_NAME                      
OBJECT_NAME                      
SUBOBJECT_NAME                     
OBJECT_ID                       
DATA_OBJECT_ID                     
OBJECT_TYPE                      
CREATED                        
LAST_DDL_TIME                     
TIMESTAMP                       
STATUS                         
TEMPORARY                       
                 

We load 100 rows, there are no stats, dynamic sampling kicks in and, bingo, the optimizer expects 100 rows.

Then for versions > 11.2.0.4 (in this case 19.6), with the addition of the SCOPE column to the script for 19c, we see that we now have table and column statistics after the load despite no explicit stats gathering call. Which, given the feature we’re discussing, should be completely obvious and expected.

I’ve used a temporary table here purely to facilitate a comment at the end. Clearly this could be a normal heap table – but the SCOPE in the output is specific to the newer feature of global temporary table session-private statistics.
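The dictionary queries amended for the SCOPE column might look like this (a sketch; I’ve used USER_TAB_STATISTICS / USER_TAB_COL_STATISTICS here, which expose SCOPE for global temporary tables):

select num_rows, scope from user_tab_statistics where table_name = 'T1';
select column_name, num_nulls, num_distinct, scope from user_tab_col_statistics where table_name = 'T1';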

Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost (%CPU)|     Time |
----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |      |  100 | 11300 |      2  (0)| 00:00:01 |
|  1 | TABLE ACCESS STORAGE FULL   | T1   |  100 | 11300 |      2  (0)| 00:00:01 |
----------------------------------------------------------------------------------

Note
-----
  - Global temporary table session private statistics used


  NUM_ROWS SCOPE  
---------- -------
           SHARED 
       100 SESSION

COLUMN_NAME            NUM_NULLS NUM_DISTINCT SCOPE  
--------------------- ---------- ------------ -------
OBJECT_NAME                    0            7 SESSION
SUBOBJECT_NAME                 7           93 SESSION
OBJECT_ID                      0          100 SESSION
DATA_OBJECT_ID                 3           97 SESSION
OBJECT_TYPE                    0            3 SESSION
CREATED                        0           95 SESSION
LAST_DDL_TIME                  0           97 SESSION
TIMESTAMP                      0           96 SESSION
STATUS                         0            1 SESSION
TEMPORARY                      0            1 SESSION
GENERATED                      0            2 SESSION
SECONDARY                      0            1 SESSION
NAMESPACE                      0            1 SESSION
EDITION_NAME                 100            0 SESSION
SHARING                        0            1 SESSION
EDITIONABLE                  100            0 SESSION
ORACLE_MAINTAINED              0            1 SESSION
APPLICATION                    0            1 SESSION
DEFAULT_COLLATION             94            1 SESSION
DUPLICATED                     0            1 SESSION
SHARDED                        0            1 SESSION
CREATED_APPID                100            0 SESSION
CREATED_VSNID                100            0 SESSION
MODIFIED_APPID               100            0 SESSION
MODIFIED_VSNID               100            0 SESSION

Again, we’ve loaded 100 rows and, as part of the direct path load, we have some base table & column statistics, so the optimizer expects 100 rows. No dynamic sampling necessary.

But what if our data is loaded in multiple steps, might this new online statistics gathering cause us a bit of a problem?

It only gathers the stats when the segment is empty.

I’m not going to repeat the output above for the table & column statistics, just the execution plan detail.

drop table t1;
create global temporary table t1
on commit preserve rows
as
select * from user_objects where rownum <= 100;
insert /*+ append */ into t1
select * from user_objects;
commit;

insert /*+ append */ into t1
select * from user_objects;
commit;

explain plan for
select * from t1;

Which in 11.2.0.4 gives:

Plan hash value: 3617692013
---------------------------------------------------------------------------------------
| Id | Operation                      | Name |   Rows | Bytes | Cost (%CPU)|    Time  |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |      |  32376 | 6007K |    93   (0)| 00:00:01 |
|  1 | TABLE ACCESS STORAGE FULL      | T1   |  32376 | 6007K |    93   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Note
-----
  - dynamic sampling used for this statement (level=2)

select count(*) from t1;

  COUNT(*)
----------
     34392

And in 19.6:

Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation                   | Name | Rows | Bytes | Cost (%CPU)|     Time |
----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |      |  100 | 11300 |      2  (0)| 00:00:01 |
|  1 | TABLE ACCESS STORAGE FULL   | T1   |  100 | 11300 |      2  (0)| 00:00:01 |
----------------------------------------------------------------------------------

Note
-----
  - Global temporary table session private statistics used

select count(*) from t1;

  COUNT(*)
----------
     59712*

*With an obvious difference in actual rows, because these are different environments on different versions.

The problem now is that our subsequent processing code, which drives off a table populated with different volumes of data in multiple stages, used to work fine in 11.2.0.4 but is now at risk of some serious consequences from cardinality mis-estimates, thanks to statistics which reflect only the first load, courtesy of a new feature which is behaving exactly as it should.

In our real-world example, it was not always the case that there was a big skew between the first load and the subsequent loads, and this consequence, obvious in hindsight, took a few weeks to reveal itself, until a first load into an empty segment happened to be very small and was followed by other loads which were much larger.

Furthermore, this also involved the session-private statistics aspect of global temporary tables, so when you’re looking at solutions you might consider whether some explicit stats gathering at the end of the load is the answer, and if so whether it should be private stats or shared stats, or whether it even matters, when quite possibly the easiest solution is to just use dynamic sampling in the subsequent select(s).
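For illustration, the kinds of mitigation meant above look something like this (sketches only; which is appropriate depends on the load pattern and on the GLOBAL_TEMP_TABLE_STATS preference):

-- Option 1: explicit re-gather at the end of the multi-step load
-- (for a GTT this is session-private by default).
exec dbms_stats.gather_table_stats(user, 'T1');

-- Option 2: let the consuming SQL sample the data at parse time instead.
select /*+ dynamic_sampling(t1 2) */ count(*) from t1;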

I was reminded of this issue when writing the article below on a more serious problem that the related real-time statistics feature was causing:

Real-time Statistics ORA-00600s / Integer Overflow

Unfortunately this is just an anecdotal, long after-the-fact note on issues observed on upgrade from 11.2.0.4 to 19.6 with the new real-time statistics feature. I thought I had jotted this down previously, but there doesn’t appear to be another article published or in draft. The issue was not reproducible outside of the application, unfortunately, and was also intermittent.

The feature was causing several issues in one particular area of the application.

First off were ORA-00600s in the processing, with immediate disconnection of the processing session (apart from trace file generation).

Second, it sometimes left evidence of integer overflow issues in the column statistics.

Third, those “corrupt” statistics then led to bad execution plans for complicated queries, evidenced by MERGE JOIN / MERGE JOIN CARTESIAN operations and row estimates of up to 18E.

Examples of the latter – complex queries with cardinality issues leading to bad plans and severe performance problems – could, when distilled to their simplest form, be seen as follows:

explain plan for
SELECT *
FROM   XXXXX
WHERE  business_date     = :busines_date
AND    id                = :id
AND    processed_flag    = 'Y';

select * from table(dbms_xplan.display);

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |    18E|    15E|   647K  (1)| 00:00:26 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| XXXXXXXXXXXX |    18E|    15E|   647K  (1)| 00:00:26 |
|*  2 |   INDEX RANGE SCAN                  | XXXXXXXXXXXX |   733K|       |  1538   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: statistics for conventional DML

explain plan for
SELECT /*+ opt_param('optimizer_features_enable','18.1.0') */ *
FROM   XXXXX
WHERE  business_date     = :busines_date
AND    id                = :id
AND    processed_flag    = 'Y';

select * from table(dbms_xplan.display);

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |     5 |  1405 |    13   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| XXXXXXXXXXXX |     5 |  1405 |    13   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | XXXXXXXXXXXX |    10 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

The root cause corruption was evidenced as follows:

select num_nulls, sample_size, notes from dba_tab_col_statistics where....

NUM_NULLS                SAMPLE_SIZE NOTES
----------------------   ----------- ------------------
-917567866115141580119           267 STATS_ON_CONVENTIONAL_DML
                   500           267 STATS_ON_CONVENTIONAL_DML
                   500           267 STATS_ON_CONVENTIONAL_DML
    -10558214104677900           267 STATS_ON_CONVENTIONAL_DML
-917567866114408000000           267 STATS_ON_CONVENTIONAL_DML
-917567866114408000000           267 STATS_ON_CONVENTIONAL_DML
         -115153305100           267 STATS_ON_CONVENTIONAL_DML

It should be obvious that NUM_NULLS can never be negative, and there were also some strange, unsuitable values in parts of LOW_VALUE which I’ve omitted from the output above.

Traction with Oracle Support was disappointing and, unfortunately, once they knew we didn’t have a problem if we just turned the feature off, that was good enough for them; to be honest, we just needed to move on with our testing – this was back in August 2020.

So, the real-time statistics feature was turned off across the board via the parameters below which, being underscore parameters, we had Oracle Support’s blessing to set.

_optimizer_use_stats_on_conventional_dml

_optimizer_gather_stats_on_conventional_dml
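For reference, setting them looks something like this (a sketch; underscore parameters should only be set with Oracle Support’s agreement, and note the double quotes they require):

alter system set "_optimizer_gather_stats_on_conventional_dml" = false scope = both;
alter system set "_optimizer_use_stats_on_conventional_dml" = false scope = both;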

Oracle 19c Upgrade: Query hanging on PX Deq: Join Ack Part 2

Previously I blogged about an intermittent problem affecting one of our feeds, where a parallel query would just hang because the QC would wait forever on a parallel slave which never joined the party.

The theory was that the parallel slave was affected by another issue – possibly the slave process died and, until PMON cleans it up, the QC assumes it is still alive (ref Doc Id 250960.1).

Intermittent problems can be a right pain. This issue mainly affected one particular process most days, kicking off between 1 and 2 am – the same process runs 30-40 times per day but only tended to hit the issue once, so it seemed somewhat data dependent – but it would also affect some other processes in different schemas very occasionally.

Over time, evidence built up from trace files suggested that the issue was mostly related to:

ORA-00600: internal error code, arguments: [17182]

or

ORA-00600: internal error code, arguments: [17114]

or both in the same error stack

If the QC received the error, the connection would terminate instantly (apart from trace file generation); if a slave received the error, then the QC would hang.

One patch – 31045929 – suggested by Oracle Support was applied but was not effective.

A second patch – 28907129 – was similarly ineffective.

Ultimately, a workaround, which was tested to validate that the second patch was in the right ballpark, was effective, and that was to disable _optimizer_gather_stats_on_load (https://sqlmaria.com/2017/01/03/online-statistics-gathering/).
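i.e. something along these lines (again a sketch, and again only under Oracle Support’s guidance):

alter system set "_optimizer_gather_stats_on_load" = false scope = both;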

It’s disappointing to have to turn off yet another “new” feature – this one related to stats gathering on direct path loads into an empty segment – when we have already had to turn off real-time statistics for conventional DML previously (link to future post – I thought I had blogged about this but it seems not), but hopefully we will still be able to pursue the patch fix to a successful conclusion.

Count Rows in Partition Where…

Just an illustration of a SQL / XML utility to count rows in a partition.

In my case, the requirement was to count rows in all subpartitions where there is data which might not belong in that subpartition (due to a NOVALIDATE partition exchange). My thought was that I wanted the equivalent of a NESTED LOOP – for each partition, do XYZ – and I was happy for this to run for a while in the background without consuming significant parallel resources.

Partition keys here are not nullable.

with s as
(select /*+ */ 
        sp.table_owner, sp.table_name, sp.subpartition_name
 ,      sn.snapshot_id, sn.version
 from   dba_tab_subpartitions sp 
 join   my_snapshot_table sn on (sn.subpartition_name = sp.subpartition_name)
 where  sp.table_owner = :owner 
 and    sp.table_name  = :table_name)
select s.*, x.cnt
from   s
 ,      xmltable('for $i in /ROWSET/ROW/CNT
                  return $i'
                 passing xmltype(
                          dbms_xmlgen.getxml
                          ('select /*+ parallel(2) */ count(*) cnt '
                         ||'from '||s.table_owner||'.'||s.table_name||' SUBPARTITION ('||s.subpartition_name||') '
                         ||'where (snapshot_id, snapshot_version) != (('''||s.snapshot_id||''', '||s.version||'))'
                           ))
                 columns cnt number path '/') x
where x.cnt > 0; 
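For a single suspect subpartition, the generated SQL can be sanity-checked directly with something like the following (names and binds are illustrative, not the real objects):

select /*+ parallel(2) */ count(*) cnt
from   my_table subpartition (my_subpartition)
where  (snapshot_id, snapshot_version) != ((:snapshot_id, :version));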

Oracle 19c Upgrade: Bug 31602782 – Same Instance Slave Parse Failure Flood Control

It’s almost a universal truth that for any non-trivial application you can expect to hit one or two serious incidents in production post-upgrade, most often due to some combination of workload and/or timing which just didn’t happen in testing beforehand.

In this instance (/these instances), it was bug 31602782, the patch for which has the above title, the Doc Id description being “Contention on CURSOR: Pin S wait on X when PQ slave’s execution plan does not match with QC”.

This bug has been around since 12.2 and the fix is first included in 20.1. The doc describes it as:

When a parallel query is executed, QC sends parse message to all slaves in all instances…. if the slave could not build the cursor with the same plan as QC and could not join, another slave tries the same and so on until all slaves get exhausted

In practice, what this meant for us was very large and significant contention on library cache lock and cursor: pin S wait on X, affecting all data loads, large parsing backlogs, and issues with anything else then wanting a library cache lock on the same underlying objects – e.g. partition maintenance operations, etc.

There were two specific processes involved and, had we not got a bug diagnosis when we did, I would have started to rewrite those processes. It is not uncommon that the flows which end up hitting bugs and performance issues have mileage left in them in terms of optimisation. In hindsight, I believe we did hit this problem in UAT, but not in this area, and in UAT I had rewritten the process in question as it was problematic anyway.

Certainly once this patch was applied, our most urgent post-upgrade issues were largely resolved.

As an aside, I would urge anyone looking at an upgrade to review beforehand the list of Important Recommended One-off Patches, which for 19c is Doc Id 2720807.1.

Oracle 19c Upgrade: Query hanging on PX Deq: Join Ack

The next installment of issues observed post Oracle upgrade (11.2.0.4 to 19.6).

An intermittent issue affecting a number of SQL statements, but one particular SQL more than any other.

Event PX Deq: Join Ack is usually seen briefly when the QC (Query Coordinator) has to build the slave sets in order to execute a SQL statement in parallel. A message is sent to each slave and then the QC waits for acknowledgement (reference Doc Id 250960.1).

In my application, when this problem occurs, the QC will hang forever waiting for the acknowledgement from one slave or from a slow sequence of slaves. The longest I saw before intervention was 3 days.
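When it happens, the hung QC is easy enough to spot, e.g. with a diagnostic along these lines (a sketch; adjust to the GV$ views as needed on RAC):

-- Sessions stuck on the event; the left join shows whether the waiter is a PX slave or the QC.
select s.sid, s.serial#, s.event, s.seconds_in_wait, p.server_name
from   v$session s
left   join v$px_process p on p.sid = s.sid
where  s.event = 'PX Deq: Join Ack';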

Per that note, a possible reason for prolonged wait times is that something happened to the PQ slave process. Unless/until PMON cleans up the internal structures, any new QC will assume the process is alive. Another possible cause is depletion of SGA memory such that the slave cannot establish its response channel.

With us, this issue is currently under investigation with Oracle Support but is believed to correlate to memory corruption errors:

ORA-00600: internal error code, arguments: [17182]

which ties back to bug 31045929: PGA Memory Corruption Caused By Cursor Frame Overrun

Oracle 19c Upgrade: Query reading way more partitions than it should

Still a work in progress… but despite a query plan which shows that partition pruning should be happening, evidence (both physical reads and obvious timing) suggests that an OLTP-style query is reading all table partitions.

There is no evidence to suggest that this was happening before the upgrade (11.2.0.4 to 19.6) and plenty of evidence of it afterwards. It is not consistent, which suggests some significant triggering event.
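While it is under investigation, one way to compare what the plan claims with what actually happened is to pull the runtime plan with partition information (a sketch; :sql_id is whatever the affected statement is, and ALLSTATS LAST needs statistics_level=ALL or the gather_plan_statistics hint):

select * from table(dbms_xplan.display_cursor(:sql_id, null, 'ALLSTATS LAST +PARTITION'));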

The Oracle Groundbreakers Developer Community thread at https://community.oracle.com/tech/developers/discussion/4480886/partition-list-iterator-reading-too-many-partitions/ currently has all the latest information. If I ever get to a conclusion, then I might collate & summarise it here.

COMPATIBLE Identifier Intrigue

An observation about a crucial difference noticed whilst testing an upgrade with and without a change of COMPATIBLE.

Upgrade from 11.2.0.4 to 19.6
Initial upgrade testing done with default COMPATIBLE of 19.0.0
Another environment was upgraded with COMPATIBLE set to 11.2.0.4.0, the rationale being that it allowed for an “easy” downgrade should we run into unresolvable issues.

For background, please see Mike Dietrich’s blog on COMPATIBLE: https://mikedietrichde.com/2019/04/17/when-and-how-should-you-change-compatible/
I am not criticising that article at all.
There is an example in the blog about identifier length being one such feature impacted by COMPATIBLE, and there are some interesting points in the discussions about the impact of testing.

We had an issue with a view on top of a PIVOT query.
PIVOT is slightly unusual in that the resulting column names are dynamic.

This example is a bit ridiculous… however… Let’s say we had a pivot a bit like this:

select owner, created_year, "January can be a cold month _T"
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from dba_objects
           where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type, count(*) as cnt for created_month in
         ('JAN' as "January can be a cold month "
         ,'FEB' as "February can be a bit miser"
         ,'MAR' as "Mad as a Hare"
         ,'APR' as "Showers"
         ,'MAY' as "Can be magnificent"
         ,'JUN' as "Flamin'"
         ,'JUL' as "Peak Summer"
         ,'AUG' as "Often disappointing"
         ,'SEP' as "Always a bonus"
         ,'OCT' AS "Neither here nor there"
         ,'NOV' as "All the best babies born he"
         ,'DEC' as "Christmas? Already?")
       )
order
    by owner
     , created_year;
OWNER                          CREATED_YEAR January can be a co
------------------------------ ------------ -------------------
SYS                                    2014
SYS                                    2015
SYS                                    2016 DIRECTORY
SYS                                    2017
SYS                                    2018 TABLE SUBPARTITION
SYS                                    2019
SYS                                    2020
SYS                                    2021 TABLE PARTITION
SYSTEM                                 2014                    

9 rows selected.

You can see that our final column selection (which might be a view definition in the real world ?!?!??!) is relying on the 11g implicit truncation of the identifier to the 30 characters of “January can be a cold month _T”.
Unwise… in hindsight.

So we fix this and release it to our default 19.6 COMPATIBLE database.

select owner, created_year, "January can be a cold month _TYPE"
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from dba_objects
           where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type, count(*) as cnt for created_month in
         ('JAN' as "January can be a cold month "
         ,'FEB' as "February can be a bit miser"
         ,'MAR' as "Mad as a Hare"
         ,'APR' as "Showers"
         ,'MAY' as "Can be magnificent"
         ,'JUN' as "Flamin'"
         ,'JUL' as "Peak Summer"
         ,'AUG' as "Often disappointing"
         ,'SEP' as "Always a bonus"
         ,'OCT' AS "Neither here nor there"
         ,'NOV' as "All the best babies born he"
         ,'DEC' as "Christmas? Already?")
       )
order
    by owner
     , created_year;
OWNER      CREATED_YEAR January can be a cold m
---------- ------------ -----------------------
SYS                2015
SYS                2016 DIRECTORY
SYS                2017
SYS                2018
SYS                2019
SYS                2020
SYS                2021 VIEW
SYSTEM             2015
SYSTEM             2021 VIEW                   

9 rows selected.

And then we release it to the other 19c database with COMPATIBLE = 11.2.0.4. Well, I’m sure it will come as no surprise that it doesn’t bloody work:

select owner, created_year, "January can be a cold month _TYPE"
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from dba_objects
           where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type, count(*) as cnt for created_month in
         ('JAN' as "January can be a cold month "
         ,'FEB' as "February can be a bit miser"
         ,'MAR' as "Mad as a Hare"
         ,'APR' as "Showers"
         ,'MAY' as "Can be magnificent"
         ,'JUN' as "Flamin'"
         ,'JUL' as "Peak Summer"
         ,'AUG' as "Often disappointing"
         ,'SEP' as "Always a bonus"
         ,'OCT' AS "Neither here nor there"
         ,'NOV' as "All the best babies born he"
         ,'DEC' as "Christmas? Already?")
       )
order
    by owner
     , created_year;
ORA-00972: identifier is too long
00972. 00000 -  "identifier is too long"
*Cause:    An identifier with more than 30 characters was specified.
*Action:   Specify at most 30 characters.
Error at Line: 128 Column: 29

So what?
We just use the old one right?
No – it doesn’t bloody work either:

select owner, created_year, "January can be a cold month _T"
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from dba_objects
           where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type, count(*) as cnt for created_month in
         ('JAN' as "January can be a cold month "
         ,'FEB' as "February can be a bit miser"
         ,'MAR' as "Mad as a Hare"
         ,'APR' as "Showers"
         ,'MAY' as "Can be magnificent"
         ,'JUN' as "Flamin'"
         ,'JUL' as "Peak Summer"
         ,'AUG' as "Often disappointing"
         ,'SEP' as "Always a bonus"
         ,'OCT' AS "Neither here nor there"
         ,'NOV' as "All the best babies born he"
         ,'DEC' as "Christmas? Already?")
       )
order
    by owner
     , created_year;
ORA-00904: "January can be a cold month _T": invalid identifier
00904. 00000 -  "%s: invalid identifier"
*Cause:
*Action:
Error at Line: 100 Column: 29

Fair enough… What should it be then Mr Compatible?
Give us a clue:

select *
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from dba_objects
           where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type, count(*) as cnt for created_month in
         ('JAN' as "January can be a cold month "
         ,'FEB' as "February can be a bit miser"
         ,'MAR' as "Mad as a Hare"
         ,'APR' as "Showers"
         ,'MAY' as "Can be magnificent"
         ,'JUN' as "Flamin'"
         ,'JUL' as "Peak Summer"
         ,'AUG' as "Often disappointing"
         ,'SEP' as "Always a bonus"
         ,'OCT' AS "Neither here nor there"
         ,'NOV' as "All the best babies born he"
         ,'DEC' as "Christmas? Already?")
       )
order
    by owner
     , created_year;
OWNER      CREATED_YEAR January can be a cold m
---------- ------------ -----------------------
SYS                2015
SYS                2016 DIRECTORY
SYS                2017
SYS                2018
SYS                2019
SYS                2020 TABLE PARTITION
SYS                2021 VIEW
SYSTEM             2015
SYSTEM             2021 VIEW

Ok. Give me a bit more of a clue please:

create table dom1 as
select *
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from all_objects
           where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type for created_month in
         ('JAN' as "January can be a cold month ")
       )
order
    by owner
     , created_year;

Table DOM1 created.
select * from dom1;
OWNER      CREATED_YEAR January can be a cold m
---------- ------------ -----------------------
SYS                2015
SYS                2016
SYS                2017
SYS                2018
SYS                2019
SYS                2021 VIEW
SYSTEM             2015
SYSTEM             2021 VIEW

You’re playing hard to get…

select listagg('"'||column_name||'"',',') within group(order by column_id) from user_tab_columns where table_name = 'DOM1';
LISTAGG('"'||COLUMN_NAME||'"',',')WITHINGROUP(ORDERBYCOLUMN_ID)
-----------------------------------------------------------------
"OWNER","CREATED_YEAR","January can be a cold month _TYPE"
select "OWNER","CREATED_YEAR","January can be a cold month _TYPE" from dom1;
ORA-00972: identifier is too long
00972. 00000 -  "identifier is too long"
*Cause:    An identifier with more than 30 characters was specified.
*Action:   Specify at most 30 characters.
Error at Line: 225 Column: 36
select "OWNER","CREATED_YEAR","January can be a cold month _TYPE"
from (select *
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from all_objects
           where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type for created_month in
         ('JAN' as "January can be a cold month ")
       )
order
    by owner);
ORA-00972: identifier is too long
00972. 00000 -  "identifier is too long"
*Cause:    An identifier with more than 30 characters was specified.
*Action:   Specify at most 30 characters.
Error at Line: 227 Column: 31

OK… enough already… I’m not playing these games 🙂

The obvious solution is not to go anywhere near the limit in the first place. This was just a bit of messing around.
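For example, keeping the aliases short means the generated column name (JAN_TYPE) is nowhere near the limit under either COMPATIBLE setting (a sketch of the earlier query with sensible aliases):

select owner, created_year, jan_type
  from (
         select owner
              , to_number(to_char(created,'YYYY')) created_year
              , to_char(created,'MON') created_month
              , object_type
           from all_objects
          where owner in ('SYSTEM','SYS')
       )
pivot (
         max(object_type) as type for created_month in ('JAN' as jan)
       )
order
    by owner
     , created_year;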

Exchange Partition Validation

Testing an upgrade from 11.2.0.4 to 19.6, we observed a very significant degradation in partition exchange performance.

Is it a case of expected behaviour or not?

The documentation is quite clear in both 11.2 and 19c:

https://docs.oracle.com/cd/E11882_01/server.112/e25523/part_admin002.htm#i1107555

https://docs.oracle.com/en/database/oracle/oracle-database/19/vldbg/maintenance-partition-tables-indexes.html#GUID-E08650B4-06B1-43F9-91B0-FBF685A3B848

When you specify WITHOUT VALIDATION for the exchange partition operation, this is normally a fast operation because it involves only data dictionary updates. However, if the table or partitioned table involved in the exchange operation has a primary key or unique constraint enabled, then the exchange operation is performed as if WITH VALIDATION were specified to maintain the integrity of the constraints.

Personally, in a data warehouse using partition exchange I would tend to have defined such a constraint as ENABLE NOVALIDATE, if it was even required.

The constraint in question is a PK constraint backed by a local index, with the partition keys as the leading columns.

The trace file points to a clear culprit:

 select /*+ first_rows(1) ordered */ 1 
 from "X"."FB_P_161261_SP_9" A ,"X"."FACILITY_BALANCE" B 
 where "A"."SNAPSHOT_ID" = "B"."SNAPSHOT_ID" and "A"."SNAPSHOT_VERSION" = "B"."SNAPSHOT_VERSION" and "A"."FACILITY_BALANCE_ID" = "B"."FACILITY_BALANCE_ID" and "A"."VERSION" = "B"."VERSION" 
 and ( tbl$or$idx$part$num("X"."FACILITY_BALANCE",0,0,0 ,B.rowid )  <  62358 or  tbl$or$idx$part$num("X"."FACILITY_BALANCE",0,0,0 ,B.rowid )  >  62358 ) 
 and tbl$or$idx$part$num("X"."FACILITY_BALANCE",0,0,0 ,A."SNAPSHOT_ID" ,A."SNAPSHOT_VERSION" )  <>  62358 and rownum < 2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      8.40       8.46          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1   4229.27    5278.73   27688674   27783294          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3   4237.67    5287.20   27688674   27783294          0           0 

We know from Jonathan Lewis’s work at http://www.nocoug.org/download/2008-11/partitioning_NOCOUG.pdf that this is the expected internal SQL from an ENABLE VALIDATE constraint in an EXCHANGE WITHOUT VALIDATION scenario.

So why the big change in performance?

Probably this is an indication of just a simple execution plan issue. I would need to trace the same operation in 11g to confirm, which is easier said than done under the circumstances.

This is the info from the 19c tkprof at least:

Rows (1st) Rows (avg) Rows (max) Row Source Operation

---------- ---------- ---------- ---------------------------------------------------
         0          0          0 COUNT STOPKEY (cr=27783294 pr=27688674 pw=0 time=287636041 us starts=1)
         0          0          0  NESTED LOOPS (cr=27783294 pr=27688674 pw=0 time=287636035 us starts=1 cost=4 size=1061 card=1)
 238651024  238651024  238651024   VIEW VW_JF_SET$91EDBF25 (cr=27783288 pr=27688673 pw=0 time=3245932610 us starts=1 cost=4 size=62 card=2)
 238651024  238651024  238651024    UNION-ALL (cr=27783288 pr=27688673 pw=0 time=2708356160 us starts=1)
  11351302   11351302   11351302     PARTITION COMBINED ITERATOR PARTITION: KEY KEY (cr=69570 pr=69469 pw=0 time=5491712 us starts=1 cost=2 size=43 card=1)
  11351302   11351302   11351302      INDEX STORAGE FAST FULL SCAN FIRST ROWS PK_FACILITY_BALANCE PARTITION: KEY KEY (cr=69570 pr=69469 pw=0 time=4380693 us starts=20 cost=2 size=43 card=1)(object id 42709299)
 227299722  227299722  227299722     PARTITION COMBINED ITERATOR PARTITION: KEY KEY (cr=27713718 pr=27619204 pw=0 time=2376271490 us starts=1 cost=2 size=43 card=1)
 227299722  227299722  227299722      INDEX STORAGE FAST FULL SCAN FIRST ROWS PK_FACILITY_BALANCE PARTITION: KEY KEY (cr=27713718 pr=27619204 pw=0 time=1916674628 us starts=62357 cost=2 size=43 card=1)(object id 42709299)
         0          0          0   INDEX UNIQUE SCAN FB_P_161261_SP_9$C3 (cr=6 pr=1 pw=0 time=1734545604 us starts=238651024 cost=0 size=1030 card=1)(object id 72993779)  
		

What is clear is that if the constraint definition is changed to ENABLE NOVALIDATE then the exchange partition takes about 1 minute, which still feels a bit slow but is a heck of a lot better than 90 minutes.
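For completeness, the constraint change tested was of this form (a sketch; the owner, table and constraint names are taken from the trace above, but the exact DDL used is an assumption):

-- Switch the PK to not validate existing rows; the exchange then avoids the validating internal SQL.
alter table x.facility_balance modify constraint pk_facility_balance enable novalidate;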

As you can see from Mohamed Houri’s comment below, this experience is not isolated and, in his case, they went for EXCHANGE PARTITION (WITH VALIDATION). After the Christmas break, I will re-evaluate the choice I made.