CPU_time much higher than elapsed_time

Of course, one of the first places you should go is Oracle Support, but if you, like me, put off doing that and scratch your head for a while wondering why ELAPSED_TIME in V$SQL might be significantly lower than CPU_TIME when there are no obvious other explanations, then note these two bugs in 11.1.0.7:

BUG 7168259 – Millisecond elapsed_time shows as 0 : Elapsed_time for SQL statements which execute in under one millisecond can show up as 0

BUG 7561762 – Elapsed_time is under-reported : Elapsed_time can be 0 or significantly less than V$SQL.CPU_TIME if there is significant idle wait class time

Manual Concatenation and LNNVL

A quick example about things to consider when considering rewriting a SQL OR condition to a UNION/UNION ALL, as pointed out to me by Jonathan Lewis referencing this post on this recent OTN forum thread.

At the heart of the issue are:

  • how to deal with the possibility of duplicates
  • the potential impact of NULLS

The impact of NULLS is best described by Jonathan in the article already mentioned so I’m going to use a noddy example to illustrate:

  • the problem of duplicates
  • why suggesting a UNION is not an appropriate generic approach (although it may work for specific examples)
  • why that generic approach should be a UNION ALL plus LNNVL

So… if, for whatever reason, we want to rewrite this statement:

select *
from   t2, t1
where  t2.col1 = t1.par
or     t2.col1 = t1.child;

If this is our setup that includes a noddy duplicate:

SQL> create table t1
  2  (par   number
  3  ,child number);

Table created.

SQL> create table t2
  2  (col1 number
  3  ,col2 varchar2(1));

Table created.

SQL> insert into t1 values (1,1);

1 row created.

SQL> insert into t1 values (1,1);

1 row created.

SQL> insert into t2 values (1,'A');

1 row created.

SQL> select * from t1;

       PAR      CHILD
---------- ----------
         1          1
         1          1

2 rows selected.

SQL> select * from t2;

      COL1 C
---------- -
         1 A

1 row selected.

SQL> 

Such that our target statement returns this:

SQL> select *
  2  from   t2, t1
  3  where  t2.col1 = t1.par
  4  or     t2.col1 = t1.child;

      COL1 C        PAR      CHILD
---------- - ---------- ----------
         1 A          1          1
         1 A          1          1

2 rows selected.

SQL> 

We can see that rewriting with a UNION is not equivalent to the original:

SQL> select *
  2  from   t2, t1
  3  where  t2.col1 = t1.par
  4  union
  5  select *
  6  from   t2, t1
  7  where  t2.col1 = t1.child;

      COL1 C        PAR      CHILD
---------- - ---------- ----------
         1 A          1          1

1 row selected.

SQL> 

And if we use a UNION ALL:

SQL> select *
  2  from   t2, t1
  3  where  t2.col1 = t1.par
  4  union all
  5  select *
  6  from   t2, t1
  7  where  t2.col1 = t1.child;

      COL1 C        PAR      CHILD
---------- - ---------- ----------
         1 A          1          1
         1 A          1          1
         1 A          1          1
         1 A          1          1

4 rows selected.

Then we need LNNVL to eliminate the duplicates from the second branch, the lower half of the statement:

SQL> select *
  2  from   t2, t1
  3  where  t2.col1 = t1.par
  4  union all
  5  select *
  6  from   t2, t1
  7  where  t2.col1 = t1.child
  8  and    lnnvl(t1.par=t1.child);

      COL1 C        PAR      CHILD
---------- - ---------- ----------
         1 A          1          1
         1 A          1          1

2 rows selected.

SQL> 

Plan_hash_value and internal temporary table names

Here’s a little thing about plan_hash_2 that’s come via a pointer from Jonathan Lewis to bug 10162430.

From querying DBA_HIST_SQLSTAT I happened to notice that over a period of 65 days a particular query had 63 different plans (PLAN_HASH_VALUE).

I found that the differences were down done to the different internal temporary table names used for a materialized subquery, i.e. the plans were essentially the same but for the name of SYS_TEMP_0FD9D6786_B6EF87D2, SYS_TEMP_0FD9D6825_BE8671F, SYS_TEMP_0FD9D6684_EAD00827, etc.

Exactly what bug 10162430 addresses is unclear because it is listed as fixed in 11.2.0.3 and my different plan hashes come from an 11.2.0.3 database.

But the key information in the bug description highlighted that PLAN_HASH_2 is not affected by these differing temp table names.

Exactly what plan_hash_2 is and how it differs from plan_hash_value is undocumented but:

  • it is found in V$SQL_PLAN.OTHER_XML;
  • also in the AWR repository DBA_HIST_SQL_PLAN.OTHER_XML;
  • and is used as the enforcing plan hash in sql plan baselines – not exposed in DBA_SQL_PLAN_BASELINES but internally represented in column PLAN_ID in SYS.SQLOBJ$, SYS.SQLOBJ$DATA, SYS.SQLOBJ$AUXDATA.

So, if you regularly look at DBA_HIST_SQLSTAT to look at historic execution statistics of a SQL statement and your queries use materialized subqueries then I recommend adding PLAN_HASH_2 to the output, something like this:

select sn.snap_id
,      sn.end_interval_time
,      st.module
,      st.sql_id
,      st.plan_hash_value
,      (select extractvalue(xmltype(other_xml),'other_xml/info[@type="plan_hash_2"]')
        from   dba_hist_sql_plan hp
        where  hp.sql_id          = st.sql_id
        and    hp.plan_hash_value = st.plan_hash_value
        and    hp.other_xml is not null) plan_hash_2
,      rows_processed_delta rws
,      executions_delta     execs
,      elapsed_time_delta   elp
,      cpu_time_delta       cpu
,      buffer_gets_delta    gets
,      iowait_delta         io
from   dba_hist_snapshot sn
,      dba_hist_sqlstat  st
where  st.snap_id            = sn.snap_id
and    st.sql_id             = '&sql_id'
and    st.elapsed_time_delta > 0
order by sn.snap_id desc; 

And just to show this difference between PHV and PH2:

   SNAP_ID END_INTERVAL_TIME         PLAN_HASH_VALUE PLAN_HASH_2
---------- ------------------------- --------------- ------------
      9131 12/07/2012 00:00               1987570133 3399935153
      9083 11/07/2012 00:00               1567016512 3399935153
      9035 10/07/2012 00:00               3386529786 3399935153
      8891 07/07/2012 00:00               2197008736 3399935153
      8842 05/07/2012 23:30                991904690 3399935153
      8794 04/07/2012 23:30               1331394991 3399935153
      8746 03/07/2012 23:30               2577119670 3399935153
      8699 03/07/2012 00:00               3774955286 3399935153
      8698 02/07/2012 23:30               3774955286 3399935153
      8555 30/06/2012 00:00               3308813978 3399935153
      8507 29/06/2012 00:00                796849390 3399935153
      8459 28/06/2012 00:00                917735929 3399935153
      8410 26/06/2012 23:30                139934889 3399935153
      8363 26/06/2012 00:00               1936665492 3399935153
      8219 23/06/2012 00:00                666306815 3399935153
      8171 22/06/2012 00:00               1053584101 3399935153
      8123 21/06/2012 00:00               1353471101 3399935153
...

Flippin’ Witch Hunt – adaptive direct path read

Plan flips… here seems to be general impression that most application performance issues are going to be caused by some sort of “plan flip” and that “plan flips” are bad.

  • If you don’t use plan stability features, should you really expect plan stability?
  • Even outside your stats job window whenever that is?
  • What if you’re using bind variables with histograms?
  • And you’ve got features like dynamic sampling which will probably get a slightly different sample for each hard parse?
  • And if you’ve also got cardinality feedback kicking in?

The old battle between stability and optimisation, I suppose.

I made a throwaway comment on a forum thread about having seen a performance problem caused by a sql statement which changed to doing direct path IO.

I thought it might be vaguely interesting to do a quick blog post and back that up with a tiny bit of “evidence”.

This was getting on for three weeks ago now (29th May) and there were bigger fish to fry at the time so some of the evidence and some of the accuracy of the evidence might wont a little but I love having a long AWR retention and the outputs below are what ASH & AWR can tell us now about what happened back then.

Problem:
A particular sql statment was running slowly.
Initial analysis had identified a change in the plan and that change was caused by cardinality feedback and changes in plans are bad of course ;)

Looking at DBA_HIST_SQLSTAT, I could see that the SQL regularly flip-flopped between two plans, one of which was the plan normally received after a hard parse, the other a change brought on by cardinality feedback.

But there really wasn’t much to choose between the plans in terms of statistics and averages.

In the execution plan, we can also skip specifics, the only relevant detail that there was a FULL TABLE SCAN which was the inner part (i.e. executed multiple times for each row in the driving / outer rowsource) of a NESTED LOOP – something which is rarely going to be a good thing, i.e.

NESTED LOOP
  some row source
  FULL TABLE SCAN

However, if we cross-reference DBA_HIST_ACTIVE_SESS_HISTORY, then we see the critical differences between the executions of this FULL TABLE SCAN on the 28th / 29th May and other days.

If you don’t like averages then I don’t blame you but you won’t like this query:

with ash as
(select trunc(ash.sample_time) snap_day
 ,      min(ash.snap_id) min_snap_id
 ,      max(ash.snap_id) max_snap_id
 ,      ash.current_obj# obj_id
 ,      ash.sql_id
 ,      ash.sql_plan_hash_value
 ,      ash.sql_plan_operation
 ,      ash.sql_plan_options
 ,      ash.current_obj#
 ,      ash.event
 ,      count(distinct ash.sql_exec_id) ash_cnt
 from   dba_hist_active_sess_history ash
 where  ash.sql_id              = 'drbkfkfxg6a6a'
 and    ash.sql_plan_operation  = 'TABLE ACCESS'
 and    ash.sql_plan_options    = 'FULL'
 and    ash.current_obj#        = 169748
 and    ash.sample_time        >= to_date('01/05/2012','DD/MM/YYYY')
 and    ash.event              is not null
 group by
        trunc(ash.sample_time)
 ,      ash.sql_plan_hash_value
 ,      ash.sql_id
 ,      ash.sql_plan_operation
 ,      ash.sql_plan_options
 ,      ash.current_obj#
 ,      ash.event)
select ash.snap_day
,      ash.event
,      ash.ash_cnt
,      (select round(sum(elapsed_time_delta) / sum(executions_delta) /1000/1000)
        from   dba_hist_sqlstat   st
        where  st.sql_id        = ash.sql_id
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) avg_elp
,      (select round(sum(rows_processed_delta) / sum(executions_delta))
        from   dba_hist_sqlstat   st
        where  st.sql_id        = ash.sql_id
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) avg_rws
,      round((1275068416 -
       (select sum(space_allocated_delta)
        from   dba_hist_seg_stat st
        where  st.obj#          = ash.obj_id
        and    st.snap_id      >= ash.min_snap_id))
      / 1024/1024) "SPACE_USED (ISH)"
,      (select min(bytes)/1024/1024
        from   dba_hist_sgastat st
        where  name             = 'buffer_cache'
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) min_bc
from   ash
order by snap_day;

What am I trying to see:

  • Historic running of a particular sql statment
  • Focusing on this FULL TABLE SCAN operation which was a problem on the 28th / 29th
  • Occurences in ASH (be very careful of reading too much into samples of samples and aggregates and averages)
  • What the statement tended to wait on and a feel for how much waiting (beware of sampled samples)
  • And finally a rough approximation of how big the object was relative to the buffer cache (we know what size it is now (which I’ve plugged into the query) so work back through the growth in segment stats. It’s using space allocated because I don’t have access to dbms_space to get space used and space allocated is going to be a high estimate of actual space used – it certainly shouldn’t be more than that)
SNAP_DAY  EVENT                               ASH_CNT    AVG_ELP SPACE_USED (ISH)     MIN_BC
--------- -------------------------------- ---------- ---------- ---------------- -------
01-MAY-12 db file scattered read                    1         19             1152      61696
01-MAY-12 db file sequential read                   1         19             1152      61696
01-MAY-12 db file sequential read                   1         25             1152      61696
02-MAY-12 db file sequential read                   1          8             1152      61696
02-MAY-12 db file scattered read                    1          8             1152      61696
07-MAY-12 db file scattered read                    1          7             1152      61696
08-MAY-12 db file scattered read                    1         24             1152      61696
08-MAY-12 db file sequential read                   1         24             1152      61696
09-MAY-12 db file sequential read                   1        100             1152      61696
09-MAY-12 db file scattered read                    1        100             1152      61696
10-MAY-12 db file sequential read                   1          6             1152      61696
10-MAY-12 db file scattered read                    1          6             1152      61696
11-MAY-12 db file sequential read                   1          7             1152      61696
11-MAY-12 db file scattered read                    1         13             1152      61696
11-MAY-12 db file sequential read                   1         13             1152      61696
15-MAY-12 db file scattered read                    1        148             1152      62208
16-MAY-12 db file scattered read                    1          8             1152      62464
16-MAY-12 db file sequential read                   2          5             1152      62464
17-MAY-12 db file scattered read                    1         17             1152      62208
17-MAY-12 db file sequential read                   1         17             1152      62208
18-MAY-12 db file sequential read                   2         17             1152      62208
18-MAY-12 db file scattered read                    2         17             1152      62208
21-MAY-12 db file scattered read                    1         16             1216      62208
21-MAY-12 db file scattered read                    1          6             1216      62208
22-MAY-12 db file scattered read                    1         19             1216      62208
23-MAY-12 db file scattered read                    3         17             1216      62208
23-MAY-12 read by other session                     1         15             1216      62208
24-MAY-12 db file scattered read                    1         28             1216      61696
25-MAY-12 db file scattered read                    1         67             1216      62208
25-MAY-12 db file sequential read                   1         67             1216      62208
28-MAY-12 direct path read                          3         45             1216      62464
28-MAY-12 direct path read                         96         70             1216      62464
29-MAY-12 direct path read                         10        484             1216      62464
29-MAY-12 direct path read                        137        643             1216      62464
29-MAY-12 enq: KO - fast object checkpoint          1        545             1216      62464

35 rows selected.

SQL> 

The FULL TABLE SCAN as the outer rowsource of a NESTED LOOP is a performance threat anyway but in summary, a performance problem, irrelevant plan stability and a key switch to direct path IO, not obviously caused by a significant growth in object size.

Regexp hungry for CPU? Real time sql monitoring shows just how

Not exactly a snappy blog post title…

First up, an execution plan showing the old problem of how the costs of a scalar subquery are not properly accounted for, even in the latest 11gR2 releases.
Read more of this post

ORA-00600 [kkqtSetOp.1] – Join Factorization

Just a quick note about an ORA-00600 that recently occurred following an upgrade to 11.2.0.3.

I’ve not been able to distill an isolated test case from the specific production code generating this error so I’ll skip the application-specific example.

The error message reported was:
Read more of this post

Timeslot SQL

A reminder about sql logic required to deal with express datetimes to the nearest x mins – for example, this particular question come up on an internal forum last week where the poster wanted to aggregate data per quarter hour.

There are two approaches depending on the exact requirement.

The first option is to generate the timeslots using a data generator and then join to the source data.

SQL>  alter session set nls_date_format = 'DD-MON-YYYY Hh24:MI';

Session altered.

SQL> with times as
  2  (select trunc(sysdate) + ((rownum-1)*15/60/24) slot
  3   from   dual
  4   connect by rownum <= 10)  --<-- just to keep the output short
  5  select * from times;

SLOT
-----------------
30-APR-2012 00:00
30-APR-2012 00:15
30-APR-2012 00:30
30-APR-2012 00:45
30-APR-2012 01:00
30-APR-2012 01:15
30-APR-2012 01:30
30-APR-2012 01:45
30-APR-2012 02:00
30-APR-2012 02:15

10 rows selected.

The second options is to round/floor/ceil the relevant existing date/timestamp in the table to the nearest x minutes.

For this solution, there is a generic approach as follows, demonstrated using sysdate:

SQL> select sysdate,
  2         trunc(sysdate)
  3       + floor(to_char(sysdate,'sssss') / 900) / 96 floor_slot,
  4         trunc(sysdate)
  5       + round(to_char(sysdate,'sssss') / 900) / 96 round_slot,
  6         trunc(sysdate)
  7       + ceil(to_char(sysdate,'sssss') / 900) / 96 ceil_slot
  8  from   dual;

SYSDATE           FLOOR_SLOT        ROUND_SLOT        CEIL_SLOT
----------------- ----------------- ----------------- -----------------
30-APR-2012 09:58 30-APR-2012 09:45 30-APR-2012 10:00 30-APR-2012 10:00

SQL> 

Where the basic forumla is:

 trunc(sysdate)
+ floor(to_char(sysdate,'sssss') / x ) / y 
  1. The date – example uses sysdate but this would probably be a column – truncated to start of day.
  2. The usage of FLOOR / ROUND / CEIL depends on the requirement.
  3. to_char(sysdate,’sssss’) is the number of seconds since midnight
  4. x is the number of seconds in the timeslot length we’re interested in – for the example, 15 minutes * 60 seconds per minute = 900
  5. Because date arithmetic is done in days, y is the number of timeslots in a day – for this example 4 timeslots per hour * 24 hours per day = 96

So, if we want five minute timeslots, x is 5*60=300; y is 12*24=288:

SQL> select sysdate,
  2         trunc(sysdate)
  3       + floor(to_char(sysdate,'sssss') / 300) / 288 floor_stamp,
  4         trunc(sysdate)
  5       + round(to_char(sysdate,'sssss') / 300) / 288 round_stamp,
  6         trunc(sysdate)
  7       + ceil(to_char(sysdate,'sssss') / 300) / 288 ceil_stamp
  8  from   dual;

SYSDATE           FLOOR_STAMP       ROUND_STAMP       CEIL_STAMP
----------------- ----------------- ----------------- -----------------
30-APR-2012 10:13 30-APR-2012 10:10 30-APR-2012 10:15 30-APR-2012 10:15

SQL> 

Or, for half hour timeslots, x is 60*30=1800; y is 2*24=48:

SQL> select sysdate,
  2         trunc(sysdate)
  3       + floor(to_char(sysdate,'sssss') / 1800) / 48 floor_stamp,
  4         trunc(sysdate)
  5       + round(to_char(sysdate,'sssss') / 1800) / 48 round_stamp,
  6         trunc(sysdate)
  7       + ceil(to_char(sysdate,'sssss') / 1800) / 48 ceil_stamp
  8  from   dual;

SYSDATE           FLOOR_STAMP       ROUND_STAMP       CEIL_STAMP
----------------- ----------------- ----------------- -----------------
30-APR-2012 10:19 30-APR-2012 10:00 30-APR-2012 10:30 30-APR-2012 10:30

SQL> 
Follow

Get every new post delivered to your Inbox.

Join 78 other followers