Materialize cardinality

Blowing the dust off the blog with a couple of trivial observations.

Firstly, this little one about one implication of materialize.

I’ve always been a fan of the simplicity of the materialize hint as a quick fix for certain plan stability/performance issues but it comes at a clear cost of actually materialising to temp that subquery and is often used when actually a combination of no_merge, no_unnest and/or no_push_pred might be better choice.

Having been recently working on a platform with a problematic systemic temp addiction, I rarely use it unless I know the cost of materialising that resultset once is cheaper than querying the data the requisite number of times via any alternative method particularly on Exadata where the overhead of querying the data twice might be less than you think (note to self: might be helpful to demo this in a future post).

Here is another implication of materialize on the cardinality of a set of data.

This simulates a real world problem observation where the view contained a materialize hint.

Starting with some data – one day for each of April, five versions per day, between 0 and 5 versions potentially approved each day:

drop table  ref_data_versions;

create table ref_data_versions
(business_date   DATE
,version         NUMBER
,status          VARCHAR2(10));


insert into ref_data_versions
select to_date(20200401,'YYYYMMDD') + days.rn-1
,      versions.rn
,      CASE when versions.rn = round(dbms_random.value(1,5)) then 'APPROVED' ELSE 'UNAPPROVED' END
from   dual
cross join
       (select rownum rn from xmltable('1 to 30')) days
cross join
       (select rownum rn from xmltable('1 to 5')) versions;
       
commit;

select count(*) from ref_data_versions;

The following query represents our view and happens to show the tangential observation that the optimizer does not recognize that the row_number analytic will filter any rows.

explain plan for 
with x as
(select /*+ */ *
 from   (select rdv.*
         ,      row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk
         from   ref_data_versions rdv)
 where  rnk = 1)
select * 
from   x;
 
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
Plan hash value: 2125428461
 
----------------------------------------------------------------------------------------------
| Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                   |   150 |  6300 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |                   |   150 |  6300 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|                   |   150 |  4350 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | REF_DATA_VERSIONS |   150 |  4350 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RNK"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY 
              DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1)
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

If we add in a predicate on business date, we get:

explain plan for 
with x as
(select /*+ */ *
 from   (select rdv.*
         ,      row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk
         from   ref_data_versions rdv)
 where  rnk = 1)
select * 
from   x
where  business_date = to_date(20200429,'YYYYMMDD');
 
select * from table(dbms_xplan.display);
Plan hash value: 2125428461
 
----------------------------------------------------------------------------------------------
| Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                   |     5 |   210 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |                   |     5 |   210 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|                   |     5 |   145 |     4  (25)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL     | REF_DATA_VERSIONS |     5 |   145 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("RNK"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY 
              DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1)
   3 - filter("RDV"."BUSINESS_DATE"=TO_DATE(' 2020-04-29 00:00:00', 'syyyy-mm-dd 
              hh24:mi:ss'))
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Note that the query expects to get 5 rows rather than 1, but that’s consistent with what we saw before.

What happens if the subquery uses materialize:

explain plan for 
with x as
(select /*+ materialize */ *
 from   (select rdv.*
         ,      row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk
         from   ref_data_versions rdv)
 where  rnk = 1)
select * 
from   x
where  business_date = to_date(20200429,'YYYYMMDD');
 
select * from table(dbms_xplan.display);
Plan hash value: 1377080515
 
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                             |   150 |  6300 |     6  (17)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION               |                             |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D787C_3AB51228 |       |       |            |          |
|*  3 |    VIEW                                  |                             |   150 |  6300 |     4  (25)| 00:00:01 |
|*  4 |     WINDOW SORT PUSHED RANK              |                             |   150 |  4350 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL                   | REF_DATA_VERSIONS           |   150 |  4350 |     3   (0)| 00:00:01 |
|*  6 |   VIEW                                   |                             |   150 |  6300 |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS FULL                     | SYS_TEMP_0FD9D787C_3AB51228 |   150 |  6300 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter("RNK"=1)
   4 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY 
              DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1)
   6 - filter("BUSINESS_DATE"=TO_DATE(' 2020-04-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The filter at step 6 is now no longer having any effect on the overall cardinality.

There are approaches using cardinality and opt_estimate which you might use to address some of the underlying issues.

However, just another example of why you should think twice about the liberal application of materialize hints (or any hints!).

The system with the problem was 11.2.0.4. Examples above are run on LiveSQL which is currently 19.

6 Responses to Materialize cardinality

  1. Mikhail Velikikh says:

    Hi Dominic,

    > it comes at a clear cost of actually materialising to temp that subquery

    Oracle introduced Cursor-Duration Temporary Tables (CDT) in 12.2:
    https://docs.oracle.com/en/database/oracle/oracle-database/12.2/newft/new-features.html#GUID-DA942A86-7920-4D63-8E22-0756444D5B52

    One of your plans also has it: ‘LOAD AS SELECT (CURSOR DURATION MEMORY)’
    I believe it can be used only by serial execution plans at the moment.

    Regards,
    Mikhail.

    • Dom Brooks says:

      Hi Mikhail, yes, true. You can see this in 11gR2 pre CDT though. It just so happens that I was using LiveSQL (19c) to simulate the issue from an 11gR2 system.

    • Dom Brooks says:

      Yes, sorry, originally I missed your point which is that this “cost” of writing to/reading from temp can be different once you get CDT where this can be completely in-memory.
      Yes, interesting feature and admittedly not one that I’ve managed to get access to and test the limits of in the real world.

  2. Boneist says:

    Is the materialize hint now documented? I was told it and the corresponding inline hint would never be documented, although I would very much like the latter to be documented!

    • Dom Brooks says:

      Nope. Doesn’t stop it being used of course with obvious caveats. Yes I find myself doing quick fixes on other people’s stuff using inline.

      • Boneist says:

        It frustrates me a lot that these two hints are not documented, particularly as it feels like you get penalised for writing your query in an easy-to-read way using building blocks (aka subfactored queries) without being able to control how those blocks are used unless you use undocumented hints.

        I want to be able to say “this is the same subquery, but the table has such good indexes on it that materialising it makes performance worse, so inline it” and not risk my production code.

        Sure, you could rewrite the query to manually inline the subqueries, but how is writing out the same subquery multiple times good coding practice?

Leave a comment