Flippin’ Witch Hunt – adaptive direct path read

Plan flips… There seems to be a general impression that most application performance issues are caused by some sort of “plan flip” and that “plan flips” are bad.

  • If you don’t use plan stability features, should you really expect plan stability?
  • Even outside your stats job window whenever that is?
  • What if you’re using bind variables with histograms?
  • And you’ve got features like dynamic sampling which will probably get a slightly different sample for each hard parse?
  • And if you’ve also got cardinality feedback kicking in?

The old battle between stability and optimisation, I suppose.

I made a throwaway comment on a forum thread about having seen a performance problem caused by a sql statement which changed to doing direct path IO.

I thought it might be vaguely interesting to do a quick blog post and back that up with a tiny bit of “evidence”.

This was getting on for three weeks ago now (29th May) and there were bigger fish to fry at the time, so some of the evidence, and some of its accuracy, might be found wanting a little. But I love having a long AWR retention, and the outputs below are what ASH & AWR can tell us now about what happened back then.

Problem:
A particular sql statement was running slowly.
Initial analysis had identified a change in the plan, a change brought on by cardinality feedback, and changes in plans are bad of course 😉

Looking at DBA_HIST_SQLSTAT, I could see that the SQL regularly flip-flopped between two plans, one of which was the plan normally received after a hard parse, the other a change brought on by cardinality feedback.

But there really wasn’t much to choose between the plans in terms of statistics and averages.
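
Something along these lines against DBA_HIST_SQLSTAT shows the flip-flopping and the per-plan averages (a minimal sketch, not the exact query used at the time):

-- executions and average elapsed seconds per execution, per plan, per AWR snapshot
select sn.begin_interval_time
,      st.plan_hash_value
,      st.executions_delta                                 execs
,      round(st.elapsed_time_delta
             / nullif(st.executions_delta,0) /1000/1000)   avg_elp
from   dba_hist_sqlstat  st
,      dba_hist_snapshot sn
where  st.sql_id          = 'drbkfkfxg6a6a'
and    sn.snap_id         = st.snap_id
and    sn.dbid            = st.dbid
and    sn.instance_number = st.instance_number
order by sn.begin_interval_time, st.plan_hash_value;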

We can skip the specifics of the execution plan; the only relevant detail is that there was a FULL TABLE SCAN as the inner part (i.e. executed multiple times, once for each row in the driving / outer rowsource) of a NESTED LOOP – something which is rarely going to be a good thing, i.e.

NESTED LOOP
  some row source
  FULL TABLE SCAN

However, if we cross-reference DBA_HIST_ACTIVE_SESS_HISTORY, then we see the critical differences between the executions of this FULL TABLE SCAN on the 28th / 29th May and other days.

If you don’t like averages then I don’t blame you but you won’t like this query:

with ash as
(select trunc(ash.sample_time) snap_day
 ,      min(ash.snap_id) min_snap_id
 ,      max(ash.snap_id) max_snap_id
 ,      ash.current_obj# obj_id
 ,      ash.sql_id
 ,      ash.sql_plan_hash_value
 ,      ash.sql_plan_operation
 ,      ash.sql_plan_options
 ,      ash.current_obj#
 ,      ash.event
 ,      count(distinct ash.sql_exec_id) ash_cnt
 from   dba_hist_active_sess_history ash
 where  ash.sql_id              = 'drbkfkfxg6a6a'
 and    ash.sql_plan_operation  = 'TABLE ACCESS'
 and    ash.sql_plan_options    = 'FULL'
 and    ash.current_obj#        = 169748
 and    ash.sample_time        >= to_date('01/05/2012','DD/MM/YYYY')
 and    ash.event              is not null
 group by
        trunc(ash.sample_time)
 ,      ash.sql_plan_hash_value
 ,      ash.sql_id
 ,      ash.sql_plan_operation
 ,      ash.sql_plan_options
 ,      ash.current_obj#
 ,      ash.event)
select ash.snap_day
,      ash.event
,      ash.ash_cnt
,      (select round(sum(elapsed_time_delta) / sum(executions_delta) /1000/1000)
        from   dba_hist_sqlstat   st
        where  st.sql_id        = ash.sql_id
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) avg_elp
,      (select round(sum(rows_processed_delta) / sum(executions_delta))
        from   dba_hist_sqlstat   st
        where  st.sql_id        = ash.sql_id
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) avg_rws
,      round((1275068416 -
       (select sum(space_allocated_delta)
        from   dba_hist_seg_stat st
        where  st.obj#          = ash.obj_id
        and    st.snap_id      >= ash.min_snap_id))
      / 1024/1024) "SPACE_USED (ISH)"
,      (select min(bytes)/1024/1024
        from   dba_hist_sgastat st
        where  name             = 'buffer_cache'
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) min_bc
from   ash
order by snap_day;

What am I trying to see:

  • Historic running of a particular sql statement
  • Focusing on this FULL TABLE SCAN operation which was a problem on the 28th / 29th
  • Occurrences in ASH (be very careful of reading too much into samples of samples and aggregates and averages)
  • What the statement tended to wait on and a feel for how much waiting (beware of sampled samples)
  • And finally a rough approximation of how big the object was relative to the buffer cache. We know what size it is now (which I’ve plugged into the query), so we can work back through the growth recorded in segment stats. It uses space allocated because I don’t have access to dbms_space to get space used; space allocated is going to be a high estimate of actual space used – it certainly shouldn’t be less than the space actually used.

SNAP_DAY  EVENT                               ASH_CNT    AVG_ELP SPACE_USED (ISH)     MIN_BC
--------- -------------------------------- ---------- ---------- ---------------- -------
01-MAY-12 db file scattered read                    1         19             1152      61696
01-MAY-12 db file sequential read                   1         19             1152      61696
01-MAY-12 db file sequential read                   1         25             1152      61696
02-MAY-12 db file sequential read                   1          8             1152      61696
02-MAY-12 db file scattered read                    1          8             1152      61696
07-MAY-12 db file scattered read                    1          7             1152      61696
08-MAY-12 db file scattered read                    1         24             1152      61696
08-MAY-12 db file sequential read                   1         24             1152      61696
09-MAY-12 db file sequential read                   1        100             1152      61696
09-MAY-12 db file scattered read                    1        100             1152      61696
10-MAY-12 db file sequential read                   1          6             1152      61696
10-MAY-12 db file scattered read                    1          6             1152      61696
11-MAY-12 db file sequential read                   1          7             1152      61696
11-MAY-12 db file scattered read                    1         13             1152      61696
11-MAY-12 db file sequential read                   1         13             1152      61696
15-MAY-12 db file scattered read                    1        148             1152      62208
16-MAY-12 db file scattered read                    1          8             1152      62464
16-MAY-12 db file sequential read                   2          5             1152      62464
17-MAY-12 db file scattered read                    1         17             1152      62208
17-MAY-12 db file sequential read                   1         17             1152      62208
18-MAY-12 db file sequential read                   2         17             1152      62208
18-MAY-12 db file scattered read                    2         17             1152      62208
21-MAY-12 db file scattered read                    1         16             1216      62208
21-MAY-12 db file scattered read                    1          6             1216      62208
22-MAY-12 db file scattered read                    1         19             1216      62208
23-MAY-12 db file scattered read                    3         17             1216      62208
23-MAY-12 read by other session                     1         15             1216      62208
24-MAY-12 db file scattered read                    1         28             1216      61696
25-MAY-12 db file scattered read                    1         67             1216      62208
25-MAY-12 db file sequential read                   1         67             1216      62208
28-MAY-12 direct path read                          3         45             1216      62464
28-MAY-12 direct path read                         96         70             1216      62464
29-MAY-12 direct path read                         10        484             1216      62464
29-MAY-12 direct path read                        137        643             1216      62464
29-MAY-12 enq: KO - fast object checkpoint          1        545             1216      62464

35 rows selected.

SQL> 

The FULL TABLE SCAN as the inner rowsource of a NESTED LOOP is a performance threat anyway, but in summary: a performance problem, plan stability largely irrelevant, and a key switch to direct path IO not obviously caused by any significant growth in object size.

5 Responses to Flippin’ Witch Hunt – adaptive direct path read

  1. hourim says:

    If I’ve understood the bottom line of this post correctly, you are saying that without any change in the execution plan, and without any significant change in the object’s segment size or in the available buffer cache space, a flip of a FULL segment access from db file scattered reads to direct path reads might be the cause of performance pain.
    Is that right?

    If so, then the explanation I have for the case you’ve posted here is that for each FULL segment operation decided by the OPTIMIZER (either FULL TABLE SCAN or INDEX FAST FULL SCAN), a db file sequential read is done on the segment header of the corresponding object to be full scanned. The purpose of this db file sequential read is to get the number of blocks to be read, i.e. the number of blocks below the high water mark. Based on this segment header read, a decision (not a CBO decision) is made either to use scattered reads or direct path reads to FULL SCAN this segment.

    However, when this segment header information is already in the buffer cache, there will be no db file sequential read to decide on the manner in which the segment will be FULL SCANNED. The existing information will be re-used for that purpose.

    What do you think of this: if, for a subsequent select, this segment header information is not found in the buffer cache, forcing a new segment header db file sequential read which, this time, presents different information about the number of blocks below the high water mark? Would this explain the change from scattered read to direct path read?

    By the way, it’s strange to see a db file sequential read in the list of events for a FULL TABLE SCAN.
    Wasn’t that db file sequential read in your list referring to the segment header db file sequential read I am speaking about?

    It comes just before the direct path read.

    Best Regards

  2. Dom Brooks says:

    Hi Mohamed,

    Thanks for the comment.

    Your summary of the bottom line is correct.

    I think we have to be very careful about reading in too much here because this is DBA_HIST_ACTIVE_SESS_HISTORY, a sample of a sample AND aggregations of multiple executions.

    An extended SQL Trace of both cases would have been much more informative.
    So, I think this is something definitely worthy of further investigation in an isolated test case.

    I appreciate that the information posted above is a little incomplete. I thought it was interesting so it’s a bit of a dump and run in the absence of any time to reproduce the issue.

    One of the reasons that I didn’t look at this particular issue deeper at the time was that there were much bigger performance issues as a result of which it’s highly likely that significantly less of the table was cached when we switched to direct path.

    So I believe that there is an adaptive element to the adaptive serial direct path read which takes into account how much of the object is cached which is along the lines of what you’re suggesting.
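
    For what it’s worth, a rough way of eyeballing how much of a segment is in the buffer cache is to count its buffers in V$BH against the segment’s total blocks. Just a sketch (it counts buffers rather than distinct blocks, and assumes a simple non-partitioned segment), using the object id from the query above:

    select (select sum(blocks)
            from   dba_segments
            where  owner        = obj.owner
            and    segment_name = obj.object_name) total_blocks
    ,      (select count(*)
            from   v$bh
            where  objd    = obj.data_object_id
            and    status != 'free')               cached_buffers
    from   dba_objects obj
    where  obj.object_id = 169748;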

    Cheers,
    Dominic

    • Graham says:

      Hi Dom,

      I took something completely different from this (probably because I was doing my usual thing of scanning instead of really reading the details!). No argument with your statement that a FULL TABLE SCAN within a NESTED LOOP is rarely a good thing, but it got me thinking: if we know that certain combinations of operations in a plan are likely to be poor performers, would it be interesting to write something that goes looking for these ‘possible issues’ proactively?

      It could be a massive waste of time of course, and there are loads of other ways to find the ‘worst’ performing SQL etc, but just because a particular piece of SQL runs longer than others doesn’t necessarily mean that it’s less performant. Maybe it’s doing a hell of a lot more. So then that ‘Top SQL’ list isn’t quite as useful as it first appears.

      Just a random thought. Maybe I’ll try it myself.

      Cheers
      Graham

      • Dom Brooks says:

        Graham – I don’t think it would ever be a waste of time; write once and use again and again. It doesn’t sound too tricky, and it only needs to identify one statement once for the effort to have paid off. You just need to identify a plan where the parent operation is a nested loop and the second child operation is some sort of full segment scan (see the sketch below).

        As you say, regarding top SQL, just because there is a statement at #1 doesn’t mean it’s a problem. As long as #1 is normally #1 it’s probably ok?

        In the same way, just because a SQL statement isn’t in the top N (the statement in the article above was not), doesn’t mean it’s not a problem.
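
        Something like this against DBA_HIST_SQL_PLAN would be a starting point. Just a sketch of the idea (it only looks for a full table scan or index fast full scan as the second child of a NESTED LOOPS operation, so there will be variations it misses and false positives it flags):

        -- plans where a NESTED LOOPS parent has a full segment scan as its second child
        select distinct p.sql_id
        ,      p.plan_hash_value
        ,      c.object_owner
        ,      c.object_name
        from   dba_hist_sql_plan p
        ,      dba_hist_sql_plan c
        where  c.sql_id          = p.sql_id
        and    c.plan_hash_value = p.plan_hash_value
        and    c.dbid            = p.dbid
        and    c.parent_id       = p.id
        and    c.position        = 2
        and    p.operation       = 'NESTED LOOPS'
        and    (   (c.operation = 'TABLE ACCESS' and c.options = 'FULL')
                or (c.operation = 'INDEX'        and c.options = 'FAST FULL SCAN'))
        order by p.sql_id;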

  3. oakesgr says:

    Yeah, I was thinking of almost an execution plan ‘anti-pattern’ search (for want of a better phrase). Not necessarily limited to just this one example, but probably throw in a few more that you’d want to be aware of. Cartesian joins seem like an obvious one.
