June 19, 2012 5 Comments
Plan flips… here seems to be general impression that most application performance issues are going to be caused by some sort of “plan flip” and that “plan flips” are bad.
- If you don’t use plan stability features, should you really expect plan stability?
- Even outside your stats job window whenever that is?
- What if you’re using bind variables with histograms?
- And you’ve got features like dynamic sampling which will probably get a slightly different sample for each hard parse?
- And if you’ve also got cardinality feedback kicking in?
The old battle between stability and optimisation, I suppose.
I made a throwaway comment on a forum thread about having seen a performance problem caused by a sql statement which changed to doing direct path IO.
I thought it might be vaguely interesting to do a quick blog post and back that up with a tiny bit of “evidence”.
This was getting on for three weeks ago now (29th May) and there were bigger fish to fry at the time so some of the evidence and some of the accuracy of the evidence might wont a little but I love having a long AWR retention and the outputs below are what ASH & AWR can tell us now about what happened back then.
A particular sql statment was running slowly.
Initial analysis had identified a change in the plan and that change was caused by cardinality feedback and changes in plans are bad of course
Looking at DBA_HIST_SQLSTAT, I could see that the SQL regularly flip-flopped between two plans, one of which was the plan normally received after a hard parse, the other a change brought on by cardinality feedback.
But there really wasn’t much to choose between the plans in terms of statistics and averages.
In the execution plan, we can also skip specifics, the only relevant detail that there was a FULL TABLE SCAN which was the inner part (i.e. executed multiple times for each row in the driving / outer rowsource) of a NESTED LOOP – something which is rarely going to be a good thing, i.e.
NESTED LOOP some row source FULL TABLE SCAN
However, if we cross-reference DBA_HIST_ACTIVE_SESS_HISTORY, then we see the critical differences between the executions of this FULL TABLE SCAN on the 28th / 29th May and other days.
If you don’t like averages then I don’t blame you but you won’t like this query:
with ash as (select trunc(ash.sample_time) snap_day , min(ash.snap_id) min_snap_id , max(ash.snap_id) max_snap_id , ash.current_obj# obj_id , ash.sql_id , ash.sql_plan_hash_value , ash.sql_plan_operation , ash.sql_plan_options , ash.current_obj# , ash.event , count(distinct ash.sql_exec_id) ash_cnt from dba_hist_active_sess_history ash where ash.sql_id = 'drbkfkfxg6a6a' and ash.sql_plan_operation = 'TABLE ACCESS' and ash.sql_plan_options = 'FULL' and ash.current_obj# = 169748 and ash.sample_time >= to_date('01/05/2012','DD/MM/YYYY') and ash.event is not null group by trunc(ash.sample_time) , ash.sql_plan_hash_value , ash.sql_id , ash.sql_plan_operation , ash.sql_plan_options , ash.current_obj# , ash.event) select ash.snap_day , ash.event , ash.ash_cnt , (select round(sum(elapsed_time_delta) / sum(executions_delta) /1000/1000) from dba_hist_sqlstat st where st.sql_id = ash.sql_id and st.snap_id between ash.min_snap_id and ash.max_snap_id) avg_elp , (select round(sum(rows_processed_delta) / sum(executions_delta)) from dba_hist_sqlstat st where st.sql_id = ash.sql_id and st.snap_id between ash.min_snap_id and ash.max_snap_id) avg_rws , round((1275068416 - (select sum(space_allocated_delta) from dba_hist_seg_stat st where st.obj# = ash.obj_id and st.snap_id >= ash.min_snap_id)) / 1024/1024) "SPACE_USED (ISH)" , (select min(bytes)/1024/1024 from dba_hist_sgastat st where name = 'buffer_cache' and st.snap_id between ash.min_snap_id and ash.max_snap_id) min_bc from ash order by snap_day;
What am I trying to see:
- Historic running of a particular sql statment
- Focusing on this FULL TABLE SCAN operation which was a problem on the 28th / 29th
- Occurences in ASH (be very careful of reading too much into samples of samples and aggregates and averages)
- What the statement tended to wait on and a feel for how much waiting (beware of sampled samples)
- And finally a rough approximation of how big the object was relative to the buffer cache (we know what size it is now (which I’ve plugged into the query) so work back through the growth in segment stats. It’s using space allocated because I don’t have access to dbms_space to get space used and space allocated is going to be a high estimate of actual space used – it certainly shouldn’t be more than that)
SNAP_DAY EVENT ASH_CNT AVG_ELP SPACE_USED (ISH) MIN_BC --------- -------------------------------- ---------- ---------- ---------------- ------- 01-MAY-12 db file scattered read 1 19 1152 61696 01-MAY-12 db file sequential read 1 19 1152 61696 01-MAY-12 db file sequential read 1 25 1152 61696 02-MAY-12 db file sequential read 1 8 1152 61696 02-MAY-12 db file scattered read 1 8 1152 61696 07-MAY-12 db file scattered read 1 7 1152 61696 08-MAY-12 db file scattered read 1 24 1152 61696 08-MAY-12 db file sequential read 1 24 1152 61696 09-MAY-12 db file sequential read 1 100 1152 61696 09-MAY-12 db file scattered read 1 100 1152 61696 10-MAY-12 db file sequential read 1 6 1152 61696 10-MAY-12 db file scattered read 1 6 1152 61696 11-MAY-12 db file sequential read 1 7 1152 61696 11-MAY-12 db file scattered read 1 13 1152 61696 11-MAY-12 db file sequential read 1 13 1152 61696 15-MAY-12 db file scattered read 1 148 1152 62208 16-MAY-12 db file scattered read 1 8 1152 62464 16-MAY-12 db file sequential read 2 5 1152 62464 17-MAY-12 db file scattered read 1 17 1152 62208 17-MAY-12 db file sequential read 1 17 1152 62208 18-MAY-12 db file sequential read 2 17 1152 62208 18-MAY-12 db file scattered read 2 17 1152 62208 21-MAY-12 db file scattered read 1 16 1216 62208 21-MAY-12 db file scattered read 1 6 1216 62208 22-MAY-12 db file scattered read 1 19 1216 62208 23-MAY-12 db file scattered read 3 17 1216 62208 23-MAY-12 read by other session 1 15 1216 62208 24-MAY-12 db file scattered read 1 28 1216 61696 25-MAY-12 db file scattered read 1 67 1216 62208 25-MAY-12 db file sequential read 1 67 1216 62208 28-MAY-12 direct path read 3 45 1216 62464 28-MAY-12 direct path read 96 70 1216 62464 29-MAY-12 direct path read 10 484 1216 62464 29-MAY-12 direct path read 137 643 1216 62464 29-MAY-12 enq: KO - fast object checkpoint 1 545 1216 62464 35 rows selected. SQL>
The FULL TABLE SCAN as the outer rowsource of a NESTED LOOP is a performance threat anyway but in summary, a performance problem, irrelevant plan stability and a key switch to direct path IO, not obviously caused by a significant growth in object size.