Materialize cardinality
April 29, 2020 6 Comments
Blowing the dust off the blog with a couple of trivial observations.
Firstly, this little one about one implication of materialize.
I’ve always been a fan of the simplicity of the materialize hint as a quick fix for certain plan stability/performance issues but it comes at a clear cost of actually materialising to temp that subquery and is often used when actually a combination of no_merge, no_unnest and/or no_push_pred might be better choice.
Having been recently working on a platform with a problematic systemic temp addiction, I rarely use it unless I know the cost of materialising that resultset once is cheaper than querying the data the requisite number of times via any alternative method particularly on Exadata where the overhead of querying the data twice might be less than you think (note to self: might be helpful to demo this in a future post).
Here is another implication of materialize on the cardinality of a set of data.
This simulates a real world problem observation where the view contained a materialize hint.
Starting with some data – one day for each of April, five versions per day, between 0 and 5 versions potentially approved each day:
drop table ref_data_versions; create table ref_data_versions (business_date DATE ,version NUMBER ,status VARCHAR2(10)); insert into ref_data_versions select to_date(20200401,'YYYYMMDD') + days.rn-1 , versions.rn , CASE when versions.rn = round(dbms_random.value(1,5)) then 'APPROVED' ELSE 'UNAPPROVED' END from dual cross join (select rownum rn from xmltable('1 to 30')) days cross join (select rownum rn from xmltable('1 to 5')) versions; commit; select count(*) from ref_data_versions;
The following query represents our view and happens to show the tangential observation that the optimizer does not recognize that the row_number analytic will filter any rows.
explain plan for with x as (select /*+ */ * from (select rdv.* , row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk from ref_data_versions rdv) where rnk = 1) select * from x; select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT Plan hash value: 2125428461 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 150 | 6300 | 4 (25)| 00:00:01 | |* 1 | VIEW | | 150 | 6300 | 4 (25)| 00:00:01 | |* 2 | WINDOW SORT PUSHED RANK| | 150 | 4350 | 4 (25)| 00:00:01 | | 3 | TABLE ACCESS FULL | REF_DATA_VERSIONS | 150 | 4350 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNK"=1) 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1) Note ----- - dynamic statistics used: dynamic sampling (level=2)
If we add in a predicate on business date, we get:
explain plan for with x as (select /*+ */ * from (select rdv.* , row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk from ref_data_versions rdv) where rnk = 1) select * from x where business_date = to_date(20200429,'YYYYMMDD'); select * from table(dbms_xplan.display);
Plan hash value: 2125428461 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 210 | 4 (25)| 00:00:01 | |* 1 | VIEW | | 5 | 210 | 4 (25)| 00:00:01 | |* 2 | WINDOW SORT PUSHED RANK| | 5 | 145 | 4 (25)| 00:00:01 | |* 3 | TABLE ACCESS FULL | REF_DATA_VERSIONS | 5 | 145 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("RNK"=1) 2 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1) 3 - filter("RDV"."BUSINESS_DATE"=TO_DATE(' 2020-04-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) Note ----- - dynamic statistics used: dynamic sampling (level=2)
Note that the query expects to get 5 rows rather than 1, but that’s consistent with what we saw before.
What happens if the subquery uses materialize:
explain plan for with x as (select /*+ materialize */ * from (select rdv.* , row_number() over (partition by business_date order by decode(status,'APPROVED',1,2), version DESC) rnk from ref_data_versions rdv) where rnk = 1) select * from x where business_date = to_date(20200429,'YYYYMMDD'); select * from table(dbms_xplan.display);
Plan hash value: 1377080515 ------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 150 | 6300 | 6 (17)| 00:00:01 | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D787C_3AB51228 | | | | | |* 3 | VIEW | | 150 | 6300 | 4 (25)| 00:00:01 | |* 4 | WINDOW SORT PUSHED RANK | | 150 | 4350 | 4 (25)| 00:00:01 | | 5 | TABLE ACCESS FULL | REF_DATA_VERSIONS | 150 | 4350 | 3 (0)| 00:00:01 | |* 6 | VIEW | | 150 | 6300 | 2 (0)| 00:00:01 | | 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D787C_3AB51228 | 150 | 6300 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("RNK"=1) 4 - filter(ROW_NUMBER() OVER ( PARTITION BY "BUSINESS_DATE" ORDER BY DECODE("STATUS",'APPROVED',1,2),INTERNAL_FUNCTION("VERSION") DESC )<=1) 6 - filter("BUSINESS_DATE"=TO_DATE(' 2020-04-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
The filter at step 6 is now no longer having any effect on the overall cardinality.
There are approaches using cardinality and opt_estimate which you might use to address some of the underlying issues.
However, just another example of why you should think twice about the liberal application of materialize hints (or any hints!).
The system with the problem was 11.2.0.4. Examples above are run on LiveSQL which is currently 19.
Recent Comments