Materialize cardinality

Blowing the dust off the blog with a couple of trivial observations.

Firstly, this little one about one implication of materialize.

I’ve always been a fan of the simplicity of the materialize hint as a quick fix for certain plan stability/performance issues but it comes at a clear cost of actually materialising to temp that subquery and is often used when actually a combination of no_merge, no_unnest and/or no_push_pred might be better choice.

Having been recently working on a platform with a problematic systemic temp addiction, I rarely use it unless I know the cost of materialising that resultset once is cheaper than querying the data the requisite number of times via any alternative method particularly on Exadata where the overhead of querying the data twice might be less than you think (note to self: might be helpful to demo this in a future post).

Here is another implication of materialize on the cardinality of a set of data.

This simulates a real world problem observation where the view contained a materialize hint.

Starting with some data – one day for each of April, five versions per day, between 0 and 5 versions potentially approved each day:

drop table  ref_data_versions;

create table ref_data_versions
(business_date   DATE
,version         NUMBER
,status          VARCHAR2(10));


insert into ref_data_versions
select to_date(20200401,'YYYYMMDD') + days.rn-1
,      versions.rn
,      CASE when versions.rn = round(dbms_random.value(1,5)) then 'APPROVED' ELSE 'UNAPPROVED' END
from   dual
cross join
       (select rownum rn from xmltable('1 to 30')) days
cross join
       (select rownum rn from xmltable('1 to 5')) versions;
       
commit;

select count(*) from ref_data_versions;

The following query represents our view and happens to show the tangential observation that the optimizer does not recognize that the row_number analytic will filter any rows.

explain plan for 
with x as
(select /*+ */ *
 from   (select rdv.*
         ,      row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk
         from   ref_data_versions rdv)
 where  rnk = 1)
select * 
from   x;
 
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 2125428461
 
----------------------------------------------------------------------------------------------
| Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                   |   150 |  6300 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |                   |   150 |  6300 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|                   |   150 |  4350 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | REF_DATA_VERSIONS |   150 |  4350 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RNK"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY 
              DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1)
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

If we add in a predicate on business date, we get:

explain plan for 
with x as
(select /*+ */ *
 from   (select rdv.*
         ,      row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk
         from   ref_data_versions rdv)
 where  rnk = 1)
select * 
from   x
where  business_date = to_date(20200429,'YYYYMMDD');
 
select * from table(dbms_xplan.display);
Plan hash value: 2125428461
 
----------------------------------------------------------------------------------------------
| Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                   |     5 |   210 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |                   |     5 |   210 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|                   |     5 |   145 |     4  (25)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL     | REF_DATA_VERSIONS |     5 |   145 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RNK"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY 
              DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1)
   3 - filter("RDV"."BUSINESS_DATE"=TO_DATE(' 2020-04-29 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss'))
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Note that the query expects to get 5 rows rather than 1, but that’s consistent with what we saw before.

What happens if the subquery uses materialize:

explain plan for 
with x as
(select /*+ materialize */ *
 from   (select rdv.*
         ,      row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk
         from   ref_data_versions rdv)
 where  rnk = 1)
select * 
from   x
where  business_date = to_date(20200429,'YYYYMMDD');
 
select * from table(dbms_xplan.display);
Plan hash value: 1377080515
 
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |   150 |  6300 |     6  (17)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D787C_3AB51228 |       |       |            |          |
|*  3 |    VIEW                                  |                             |   150 |  6300 |     4  (25)| 00:00:01 |
|*  4 |     WINDOW SORT PUSHED RANK              |                             |   150 |  4350 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL                   | REF_DATA_VERSIONS           |   150 |  4350 |     3   (0)| 00:00:01 |
|*  6 |   VIEW                                   |                             |   150 |  6300 |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D787C_3AB51228 |   150 |  6300 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("RNK"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY 
              DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1)
   6 - filter("BUSINESS_DATE"=TO_DATE(' 2020-04-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The filter at step 6 is now no longer having any effect on the overall cardinality.

There are approaches using cardinality and opt_estimate which you might use to address some of the underlying issues.

However, just another example of why you should think twice about the liberal application of materialize hints (or any hints!).

The system with the problem was 11.2.0.4. Examples above are run on LiveSQL which is currently 19.

Subquery


with x as (select sysdate from dual)
select * from utter_bollocks.x;

Materialize

Summary – Note the recursive SQL, the association of an in-memory temporary table with a child cursor, and possible side-effects for distributed transactions.

Prompted by a recent thread on the OTN forums, if you /*+ materialize */ a subquery will you always get IO associated with that materialisation?

Short answer: Yes. For that is what materialisation is all about.

A longer answer is perhaps slightly more interesting.

In terms of materialisation, you can force it with the hint above or it will automatically kick in if you reference it at least twice. I’m not aware of this threshold being documented but Jonathan Lewis mentioned this observation here and it ties in with what I’ve seen.

And it doesn’t seem to matter how small the result set is, it will always be materialised if those materialisation criteria are met.

If we trace a new query, we can see some of the recursive sql involved.

SQL> alter session set events '10046 trace name context forever, level 12';
SQL>
SQL> with x as
  2  (select /*+ materialize */
  3          1 col1
  4   from   dual)
  5  select * from x;

      COL1
----------
         1

SQL> alter session set events '10046 trace name context off';
SQL>

Tracing older versions of Oracle can be more revealing because in 9iR2 for example the trace file explicitly lists the recursive INSERT into the temp table, whereas by the time you get to 11.2 the INSERT has disappeared and the associated waits incorporated into the SELECT.

All versions list the creation of the temporary table (if indeed it needs to be created – see below), the DDL for which includes the specifications IN_MEMORY_METADATA and CURSOR_SPECIFIC_SEGMENT.

CREATE GLOBAL TEMPORARY TABLE "SYS"."SYS_TEMP_0FD9D6610_8A97FC" ("C0" NUMBER )
   IN_MEMORY_METADATA CURSOR_SPECIFIC_SEGMENT STORAGE (OBJNO 4254950928 )
  NOPARALLEL

Note in the creation of table SYS_TEMP_0FD9D6610_8A97FC that 0FD9D6610 is the hex of 4254950928, which is just the sequence-based objno. Not sure of the significance of the last part, e.g.8A97FC.

We can also see that the data is written to temp using a direct path write/direct path write temp depending on version … and selected back via the buffer cache (for efficient use of the data) using a db file sequential read or db file scattered read.

In older versions as mentioned, you should find the recursive INSERT listed separately, e.g. (different database, different version, different temp table name and if you’re interested in the control file sequential read see this post by Timur Akhmadeev):

INSERT /*+ APPEND BYPASS_RECURSIVE_CHECK */ INTO 
  "SYS"."SYS_TEMP_0FD9D662B_671BC5CD" SELECT /*+ */ 1 FROM "SYS"."DUAL" 
  "DUAL"

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  control file sequential read                    3        0.00          0.00
  direct path write                               1        0.00          0.00
********************************************************************************

Otherwise in newer versions, no insert but the waits for the recursive statement listed as part of the main select:

with x as
(select /*+ materialize */
        1 col1
 from   dual)
select * from x 

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  Disk file operations I/O                        2        0.00          0.00
  direct path write temp                          1        0.00          0.00
  direct path sync                                1        0.02          0.02
  SQL*Net message to client                       2        0.00          0.00
  db file sequential read                         1        0.00          0.00
  SQL*Net message from client                     2       14.45         14.45
********************************************************************************

The temp table exists in memory and our session and other sessions cannot describe it but can select from it:

SQL> desc  sys.SYS_TEMP_0FD9D6610_8A97FC;
ERROR:
ORA-04043: object sys.SYS_TEMP_0FD9D6610_8A97FC does not exist


SQL> select * from sys.SYS_TEMP_0FD9D6610_8A97FC;

no rows selected

SQL> 

Note that the temp table is associated with the child cursor. This can be observed by using multiple sessions and forcing the creation of multiple child cursors – for example by using different optimizer settings – and tracing those sessions.

Subsequent executions of this cursor – by this session or another – can reuse this existing in-memory temporary table with no need to recreate it.

So, if we ran into one of the numerous situations that exist – often caused by bugs – where there are excessive child cursors for sql statements, if these use materialised subqueries then this is something else to be slightly concerned about.

If the cursor ages out or we flush the shared pool, the table will be cleaned up along with the cursor.

SQL> alter system flush shared_pool;
SQL> select * from sys.SYS_TEMP_0FD9D6610_8A97FC;
select * from sys.SYS_TEMP_0FD9D6610_8A97FC
                  *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> 

This recursive creation of the temp table might raise some interesting questions. For example, how this (recursive DDL) might affect / be affected by transactions?

Short answer: It does and it doesn’t

The longer answer is that it only seems to affect distributed transactions and this effect is apparently a bug or bugs, separately listed in both 10.2 – bug 9399589 – and 11.1/11.2 – bug 9706532.

I’ve not tested the proposed patches to the issue, but certainly what happens in 11.2.0.3 is that if you hard-parse the statement as part of a distributed transaction, then the materialisation is silently bypassed (fixed in 12c)

SQL> alter system flush shared_pool;
SQL> -- distributed transaction
SQL> insert into t1@test values(1);
SQL> with x as
  2  (select /*+ materialize */
  3          1
  4   from   dual)
  5  select * from x;

         1
----------
         1
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
SQL_ID  0x0a8cyg22wz7, child number 0
-------------------------------------
with x as (select /*+ materialize */         1  from   dual) select *
from x

Plan hash value: 1388734953

-----------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |     2 (100)|          |
|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------

SQL> commit;
SQL> -- no distributed transaction
SQL> with x as
  2  (select /*+ materialize */
  3          1
  4   from   dual)
  5  select * from x;

         1
----------
         1
SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
SQL_ID  0x0a8cyg22wz7, child number 0
-------------------------------------
with x as (select /*+ materialize */         1  from   dual) select *
from x

Plan hash value: 1388734953

-----------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |     2 (100)|          |
|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------

SQL> 


Whereas if it’s a local transaction that does the hard-parse then materialisation can be used and subsequent executions of that cursor in a distributed transaction can make use of that plan and the existing temp table.

SQL> alter system flush shared_pool;

System altered.

SQL> -- no distributed transaction
SQL> with x as
  2  (select /*+ materialize */
  3          1
  4   from   dual)
  5  select * from x;

         1
----------
         1

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
SQL_ID  0x0a8cyg22wz7, child number 0
-------------------------------------
with x as (select /*+ materialize */         1  from   dual) select *
from x

Plan hash value: 3267439756

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |       |       |     4 (100)|
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |
|   2 |   LOAD AS SELECT           |                           |       |       |            |
|   3 |    FAST DUAL               |                           |     1 |       |     2   (0)|
|   4 |   VIEW                     |                           |     1 |     3 |     2   (0)|
|   5 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6604_8B16D2 |     1 |    13 |     2   (0)|
---------------------------------------------------------------------------------------------


18 rows selected.

SQL> -- distributed transaction
SQL> insert into t1@test values(1);

1 row created.

SQL> with x as
  2  (select /*+ materialize */
  3          1
  4   from   dual)
  5  select * from x;

         1
----------
         1

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------
SQL_ID  0x0a8cyg22wz7, child number 0
-------------------------------------
with x as (select /*+ materialize */         1  from   dual) select *
from x

Plan hash value: 3267439756

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                      | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                           |       |       |     4 (100)|
|   1 |  TEMP TABLE TRANSFORMATION |                           |       |       |            |
|   2 |   LOAD AS SELECT           |                           |       |       |            |
|   3 |    FAST DUAL               |                           |     1 |       |     2   (0)|
|   4 |   VIEW                     |                           |     1 |     3 |     2   (0)|
|   5 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6604_8B16D2 |     1 |    13 |     2   (0)|
---------------------------------------------------------------------------------------------


18 rows selected.

SQL> 

I saw on Twitter last week and this week that @Boneist had an interesting experience with this sort of thing.

Finally, as a quick related distraction, note that if you try to get a real time sql monitoring report within a distributed transaction – I mean, why would you? but anyway I found this whilst investing the distributed behaviour above – then it will bomb out with ORA-32036: unsupported case for inlining of query name in WITH clause.

ORA-32035 considered a good thing?

I used to think that it was a good thing that this error was no longer raised in 11.2.

Now I’m not so sure.

SQL> select * from v$version;

BANNER
-------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> with t as
  2  (select 1 col2 from dual)
  3  select * from t1;

COL1
-----
XXXXX

SQL> 

Doh!

Note to self: Clear up old Ts, T1s, T2s, etc straight afterwards.

I used to ALWAYS prefix my factored subqueries as subq_*.

This provides a compelling reason to continue that – not that it changes anything of course, just means I’m less likely to make such a typo.