Thoughts on TM_DELTA_TIME

What are these columns in V$ACTIVE_SESSION_HISTORY – TM_DELTA_TIME, TM_DELTA_CPU_TIME and TM_DELTA_DB_TIME?

What does the doco say?

  • TM_DELTA_TIME – Time interval (in microseconds) over which TM_DELTA_CPU_TIME and TM_DELTA_DB_TIME are accumulated
  • TM_DELTA_CPU_TIME – Amount of time this session spent on CPU over the last TM_DELTA_TIME microseconds
  • TM_DELTA_DB_TIME – Amount of time spent by this session in database calls over the last TM_DELTA_TIME microseconds

Need a reminder on TIME MODEL STATISTICS?

The most important of the time model statistics is DB time.

This statistics represents the total time spent in database calls and is an indicator of the total instance workload.

It is calculated by aggregating the CPU and wait times of all sessions not waiting on idle wait events (non-idle user sessions).

i.e. DB TIME is time spent either actively working or actively waiting

What this effectively means is that TM_DELTA_CPU_TIME and TM_DELTA_DB_TIME have no context without TM_DELTA_TIME.

So when looking at TM_DELTA_CPU_TIME and TM_DELTA_DB_TIME, all other sample times/periods are pretty much irrelevant apart from TM_DELTA_TIME – the latter is the only value that gives context to the other two.

TM_DELTA_DB_TIME is time spent actively working or actively waiting, some of this might be time spent on CPU – TM_DELTA_CPU_TIME – the rest will be time spent actively waiting on some non-idle wait events plus some overheads no doubt.

And TM_DELTA_DB_TIME should never exceed TM_DELTA_TIME, right?

The difference between TM_DELTA_TIME and TM_DELTA_DB_TIME should be idle time.

Sounds like something that might be quite useful particularly to counterbalance some of the pitfalls of looking at sampled data?

How might we investigate this further?

Shall we start with a plsql loop that should do nothing but sit on CPU for a short while?

SQL> select * from v$version;

BANNER
-------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select distinct sid from v$mystat;

       SID
----------
       642

SQL> alter session set plsql_optimize_level = 0;

Session altered.

SQL> set timing on
SQL> DECLARE
  2  BEGIN
  3   FOR i IN 1 .. 1000000000 LOOP
  4     NULL;
  5   END LOOP;
  6  END;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:16.46
SQL> 

What does ASH say?

SQL> select to_char(sample_time,'HH24:MI:SS')
  2  ,      is_awr_sample
  3  ,      event
  4  ,      session_state
  5  ,      to_char(tm_delta_time) tm_delta_time
  6  ,      to_char(tm_delta_cpu_time) tm_delta_cpu_time
  7  ,      to_char(tm_delta_db_time) tm_delta_db_time
  8  from   v$active_session_history
  9  where  session_id = 642
 10  and    sql_id     = '6b5drmtrvs6vv'
 11  order by sample_id desc;

TO_CHAR( I EVENT                     SESSION TM_DELTA_TIME   TM_DELTA_CPU_TIME TM_DELTA_DB_TIME
-------- - ------------------------- ------- --------------- ----------------- ----------------
14:45:25 N                           ON CPU
14:45:24 N                           ON CPU  2019831         2016694           2019831
14:45:23 N                           ON CPU
14:45:22 N                           ON CPU  2019978         2017693           2019978
14:45:21 N                           ON CPU
14:45:20 N                           ON CPU  2019848         2017694           2019848
14:45:19 N                           ON CPU
14:45:18 Y                           ON CPU
14:45:17 N                           ON CPU  2019934         2017693           2019934
14:45:16 N                           ON CPU
14:45:15 N                           ON CPU  2019803         2017693           2019803
14:45:14 N                           ON CPU
14:45:13 N                           ON CPU  2019925         2017693           2019925
14:45:12 N                           ON CPU
14:45:11 N                           ON CPU  4711327         1921708           1926278
14:45:10 N                           ON CPU  40504273        130979            134479

16 rows selected.

SQL> 

So, in this example, roughly every two seconds I get feedback of what my session was doing for that DELTA TIME period.

Unsurprisingly, given what we were doing, the session mainly spent 100% of it’s time in TM_DELTA_DB_TIME actively working, and TM_DELTA_CPU_TIME is the major component of that.

The difference between CPU_TIME – DB_TIME would be the time spent actively waiting plus some overhead.

If did more waiting on a non-idle wait event – IO for example – our DB_TIME would be similar but the element of CPU_TIME would be smaller, e.g.

SQL> DECLARE
  2  BEGIN
  3   FOR i IN (SELECT * FROM all_source) LOOP
  4     NULL;
  5   END LOOP;
  6  END;
  7  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:37.82

which gives

SQL> select to_char(sample_time,'HH24:MI:SS')
  2  ,      is_awr_sample
  3  ,      event
  4  ,      session_state
  5  ,      to_char(tm_delta_time) tm_delta_time
  6  ,      to_char(tm_delta_cpu_time) tm_delta_cpu_time
  7  ,      to_char(tm_delta_db_time) tm_delta_db_time
  8  from   v$active_session_history
  9  where  session_id = 642
 10  and    sql_id     IN ('2nj4dm3qwmkqh','34y8jhfh5w4q9')
 11  order by sample_id desc;

TO_CHAR( I EVENT                     SESSION TM_DELTA_TIME   TM_DELTA_CPU_TIME TM_DELTA_DB_TIME
-------- - ------------------------- ------- --------------- ----------------- ----------------
15:05:43 N                           ON CPU  2019899         1089835           2019899
15:05:42 Y db file scattered read    WAITING
15:05:41 N db file scattered read    WAITING 2020879         1359793           2020879
15:05:40 N                           ON CPU
15:05:39 N db file scattered read    WAITING 2022891         1487774           2022891
15:05:38 N db file scattered read    WAITING
15:05:37 N db file scattered read    WAITING 2021913         1123828           2021913
15:05:36 N                           ON CPU
15:05:35 N db file scattered read    WAITING 2036914         953856            2036914
15:05:34 N                           ON CPU
15:05:33 N db file scattered read    WAITING 1017936         686895            1017936
15:05:32 Y db file scattered read    WAITING 2039867         897864            2039867
15:05:31 N db file scattered read    WAITING
15:05:30 N db file scattered read    WAITING 1928599         1035843           1928599
15:05:29 N                           ON CPU
15:05:28 N                           ON CPU  2069218         1185819           2069218
15:05:27 N                           ON CPU
15:05:26 N                           ON CPU  2092901         840872            2092901
15:05:25 N                           ON CPU
15:05:24 N db file scattered read    WAITING 1998272         1044841           1998272
15:05:23 N db file scattered read    WAITING
15:05:22 Y db file scattered read    WAITING 2052570         1131828           2052570
15:05:21 N                           ON CPU
15:05:20 N db file scattered read    WAITING 2021863         1411786           2021863
15:05:19 N                           ON CPU
15:05:18 N db file scattered read    WAITING 1674137         797879            1674137
15:05:17 N db file scattered read    WAITING
15:05:16 N                           ON CPU  2350338         270958            2350338
15:05:15 N                           ON CPU
15:05:14 N db file scattered read    WAITING 1944057         165975            1944057
15:05:13 N db file scattered read    WAITING
15:05:12 Y db file scattered read    WAITING 2078929         1194819           2078929
15:05:11 N db file scattered read    WAITING
15:05:10 N                           ON CPU  2052015         1266807           2052015
15:05:09 N                           ON CPU
15:05:08 N db file scattered read    WAITING 985873021       915860            1846906
15:05:07 N                           ON CPU
15:05:06 N                           ON CPU  6746807         836873            843056

82 rows selected.

SQL> 

As expected, we see from TM_DELTA_DB_TIME that we are mainly spending 100% of our time actively working or actively waiting.

But, because we’ve had to do some IO, the proportion of that time which is CPU has reduced and the difference between TM_DELTA_DB_TIME and TM_DELTA_CPU_TIME should be roughly the time spent actively waiting during that TM_DELTA_TIME.

But in this output immediately above, do we start to see some of the limitations in this data?

What about this?

TO_CHAR( I EVENT                     SESSION TM_DELTA_TIME   TM_DELTA_CPU_TIME TM_DELTA_DB_TIME
-------- - ------------------------- ------- --------------- ----------------- ----------------
15:05:08 N db file scattered read    WAITING 985873021       915860            1846906

Why the big TM_DELTA_TIME all of a sudden? What’s that all about?

I’ve no idea and if anyone does then please comment.

I would presume that it’s a bug in the accounting.

And if it is a bug, it seriously limits the usefulness of this data for now and the ability to build diagnostic scripts on top of it.

So, moving quickly to how this data is stored down into the repository – DBA_HIST_ACTIVE_SESS_HISTORY.

You probably already know that every 10th second’s worth of the ASH buffer is marked for storage in the repository, as marked by V$ACTIVE_SESSION_HISTORY.IS_AWR_SAMPLE (included in the output above).

From the last buffer output above executing the loop of cpu+io workload, it should be these rows:

TO_CHAR( I EVENT                     SESSION TM_DELTA_TIME   TM_DELTA_CPU_TIME TM_DELTA_DB_TIME
-------- - ------------------------- ------- --------------- ----------------- ----------------
15:05:42 Y db file scattered read    WAITING
15:05:32 Y db file scattered read    WAITING 2039867         897864            2039867
15:05:22 Y db file scattered read    WAITING 2052570         1131828           2052570
15:05:12 Y db file scattered read    WAITING 2078929         1194819           2078929

My initial thoughts were to wonder whether these TM_DELTA figures are persisted “as is”.
In which case, their value would be seriously limited.

Don’t want to wait for the flushing of ASH buffer to the repository?
We can force the flush by creating an AWR snapshot.

SQL> exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.

SQL> 

And wham, bam, we can check out what has actually been persisted to DBA_HIST_ACTIVE_SESSION_HISTORY:

SQL> select to_char(sample_time,'HH24:MI:SS')
  2  ,      event
  3  ,      session_state
  4  ,      to_char(tm_delta_time) tm_delta_time
  5  ,      to_char(tm_delta_cpu_time) tm_delta_cpu_time
  6  ,      to_char(tm_delta_db_time) tm_delta_db_time
  7  from   dba_hist_active_sess_history
  8  where  session_id = 642
  9  and    sql_id     IN ('2nj4dm3qwmkqh','34y8jhfh5w4q9')
 10  order by sample_id desc;

TO_CHAR( EVENT                     SESSION TM_DELTA_TIME   TM_DELTA_CPU_TIME TM_DELTA_DB_TIME
-------- ------------------------- ------- --------------- ----------------- ----------------
15:05:42 db file scattered read    WAITING 9120533         5612146           703679604684133
15:05:32 db file scattered read    WAITING 10128857        7202905           281471853489246
15:05:22 db file scattered read    WAITING 10042965        7493861           20148167
15:05:12 db file scattered read    WAITING 1194282071      20199931          31190611

9 rows selected.

SQL> 

So, the good news would be that the values from the buffer are not just copied down but seem to be an aggregation, presumably of the sampled rows. I’m not sure because some of the numbers add up, some don’t.

But the big, bad news is that there are similar accounting bugs – note the values in the first two rows for TM_DELTA_DB_TIME.

It should not be possible for TM_DELTA_DB_TIME to be greater than TM_DELTA_TIME.

As a result, this data seems occasionally unreliable and this unreliability undermines it’s usefulness.

Unless of course someone else has another explanation?

Flip Flop Hooray, Ho

(Title reference for any old skool hip hop fans?)

An investigation into a flip flopping sql statement – the downside of dynamic sampling – why it can be a problem for fast changing tables and frequently executed SQL.

Yesterday morning a sql statement which processes outstanding messages was having performance problems.

There were two child cursors being executed by the four concurrent sessions, depending on their optimizer settings.

I’m not going to detail the optimzer environment differences because it’s mainly irrelvant other than, because of the child cursors, there was a hard parse at time T0 and a hard parse at time T1 and then these cursors were executed repeatedly / regularly over a number of hours.

SQL statistics show this:

SQL> select to_char(sn.end_interval_time,'DD/MM/YYYY HH24:MI') dt
  2  ,      st.plan_hash_value plan_hash
  3  ,      rows_processed_delta rws
  4  ,      executions_delta     execs
  5  ,      elapsed_time_delta   elp
  6  ,      cpu_time_delta       cpu
  7  ,      buffer_gets_delta    gets
  8  ,      iowait_delta         io
  9  from   dba_hist_snapshot sn
 10  ,      dba_hist_sqlstat  st
 11  where  st.snap_id            = sn.snap_id
 12  and    st.sql_id             = '1s1vgw24g2u1k'
 13  and    st.elapsed_time_delta > 0
 14  and    sn.end_interval_time between to_timestamp('18/06/2012 09:00','DD/MM/YYYY HH24:MI')
 15                              and     to_timestamp('18/06/2012 10:00','DD/MM/YYYY HH24:MI')
 16  order by sn.snap_id, st.plan_hash_value; 

DT                PLAN_HASH        RWS      EXECS        ELP        CPU       GETS         IO
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
18/06/2012 09:00 1081606725       1525        182 2163175676  101353586    3938349 2062115102
18/06/2012 09:00 3791039348         48        299     449781     380944      71472      68103
18/06/2012 09:30 1081606725       2032         24 3019992733  146890665   12995474 2886335201
18/06/2012 09:30 3791039348         43        289   73670959   73468837   56740610      24170

SQL> 

Whereas this is representative of “normal”:

DT                PLAN_HASH        RWS      EXECS        ELP        CPU       GETS         IO
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
15/06/2012 09:00 3791039348       1289       1077  174434637   16030542    5470115     280865

SQL> 

I mentioned that this code was processing outstanding messages.
These different processes aren’t balanced and aren’t expected to be.

But plan hash 3791039348 is good for everyone and in general plan hash 1081606725 suits no-one.

Here’s the sanitised SQL – remember this isn’t so much a SQL tuning/rewriting exercise so much as a root cause of slowness exercise (where the root cause is not initially meant to be “because the SQL is not brilliant”)

      SELECT  ....
      FROM    cmsg,
              sint,
              setm,
              comp,
              trga,
		          seti,
		          trad,
              (SELECT COUNT(*) count,
                      setm_il.sett_mesg_tag sett_mesg_tag
               FROM   prco prco_il,
                      cmsg cmsg_il,
                      setm setm_il,
                      seti seti_il
               WHERE  cmsg_il.unacknowledged = prco_il.unack_code
               AND    cmsg_il.evnt_type      = 'SI'
               AND   (cmsg_il.evnt_tag       = seti_il.sett_istr_tag 
               OR     cmsg_il.evnt_tag       = seti_il.amending_si_tag)
               AND    setm_il.sett_istr_tag  = seti_il.sett_istr_tag
               GROUP BY setm_il.sett_mesg_tag) cmsg_si
      WHERE   cmsg.rowid IN (SELECT row_id
                             FROM (SELECT rowid row_id
                                   FROM   cmsg  cmsg2
                                   WHERE  unacknowledged = :p_unack_code
                                   AND    evnt_type     != 'SOD'
                                   ORDER BY seq_num)
                             WHERE rownum <= :p_batch_lim)
      AND     sint.sint_tag         (+) = cmsg.evnt_tag
      AND     sint.stus_ind         (+) = 'C'
      AND     setm.sett_mesg_tag    (+) = sint.sett_mesg_tag
      AND     comp.cmpy_num         (+) = setm.cmpy_num
      AND     trga.trga_code        (+) = comp.trga_code
      AND     seti.sett_istr_tag    (+) = setm.sett_istr_tag
      AND     trad.trad_tag         (+) = seti.trad_tag
      AND     trad.cmpy_num         (+) = seti.cmpy_num
      AND     cmsg_si.sett_mesg_tag (+) = sint.sett_mesg_tag
      ORDER by cmsg.seq_num;

Here is the relevant sections of the two execution plans:

The good:

SQL_ID 1s1vgw24g2u1k
 
Plan hash value: 3791039348
 
------------------------------------------------------------------------------------------------
| Id  | Operation                                         | Name       | Rows  | Cost  | Time  |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                  |            |       |  9192 |       |
|   1 |  FOR UPDATE                                       |            |       |       |       |
|   2 |   SORT ORDER BY                                   |            |     1 |  9192 | 00:46 |
|   3 |    NESTED LOOPS OUTER                             |            |     1 |  9191 | 00:46 |
|   4 |     NESTED LOOPS OUTER                            |            |     1 |  9190 | 00:46 |
|   5 |      NESTED LOOPS OUTER                           |            |     1 |  9188 | 00:46 |
|   6 |       NESTED LOOPS OUTER                          |            |     1 |  9187 | 00:46 |
|   7 |        NESTED LOOPS OUTER                         |            |     1 |  9186 | 00:46 |
|   8 |         HASH JOIN OUTER                           |            |     1 |  9184 | 00:46 |
|   9 |          NESTED LOOPS OUTER                       |            |     1 |    10 | 00:01 |
|  10 |           NESTED LOOPS                            |            |     1 |     7 | 00:01 |
|  11 |            VIEW                                   | VW_NSO_1   |     3 |     5 | 00:01 |
|  12 |             SORT UNIQUE                           |            |     1 |       |       |
|  13 |              COUNT STOPKEY                        |            |       |       |       |
|  14 |               VIEW                                |            |     3 |     5 | 00:01 |
|  15 |                SORT ORDER BY STOPKEY              |            |     3 |     5 | 00:01 |
|  16 |                 TABLE ACCESS BY INDEX ROWID       | CMSG       |     3 |     4 | 00:01 |
|  17 |                  INDEX RANGE SCAN                 | CMSG3_IDX  |     4 |     3 | 00:01 |
|  18 |            TABLE ACCESS BY USER ROWID             | CMSG       |     1 |     1 | 00:01 |
|  19 |           TABLE ACCESS BY INDEX ROWID             | SETI       |     1 |     3 | 00:01 |
|  20 |            INDEX RANGE SCAN                       | SETI_1     |     1 |     2 | 00:01 |
|  21 |          VIEW                                     |            | 14490 |  9173 | 00:46 |
|  22 |           SORT GROUP BY                           |            | 14490 |       |       |
|  23 |            CONCATENATION                          |            |       |       |       |
|  24 |             NESTED LOOPS                          |            |       |       |       |
|  25 |              NESTED LOOPS                         |            |   562 |  8472 | 00:43 |
|  26 |               NESTED LOOPS                        |            |   428 |  6757 | 00:34 |
|  27 |                NESTED LOOPS                       |            |   304 |    30 | 00:01 |
|  28 |                 TABLE ACCESS BY INDEX ROWID       | CMSG       |   304 |    30 | 00:01 |
|  29 |                  INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    29 | 00:01 |
|  30 |                 INDEX UNIQUE SCAN                 | PRCO_2_IDX |     1 |     0 |       |
|  31 |                PARTITION LIST ALL                 |            |     1 |    24 | 00:01 |
|  32 |                 TABLE ACCESS BY LOCAL INDEX ROWID | SETI       |     1 |    24 | 00:01 |
|  33 |                  INDEX RANGE SCAN                 | SETI_3_IDX |    36 |    22 | 00:01 |
|  34 |               INDEX RANGE SCAN                    | SETM_2_IDX |     1 |     2 | 00:01 |
|  35 |              TABLE ACCESS BY INDEX ROWID          | SETM       |     1 |     4 | 00:01 |
|  36 |             NESTED LOOPS                          |            |       |       |       |
|  37 |              NESTED LOOPS                         |            |    20 |   700 | 00:04 |
|  38 |               NESTED LOOPS                        |            |    15 |   640 | 00:04 |
|  39 |                NESTED LOOPS                       |            |    15 |   640 | 00:04 |
|  40 |                 TABLE ACCESS BY INDEX ROWID       | CMSG       |   304 |    30 | 00:01 |
|  41 |                  INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    29 | 00:01 |
|  42 |                 TABLE ACCESS BY GLOBAL INDEX ROWID| SETI       |     1 |     2 | 00:01 |
|  43 |                  INDEX UNIQUE SCAN                | SETI_IDX   |     1 |     1 | 00:01 |
|  44 |                INDEX UNIQUE SCAN                  | PRCO_2_IDX |     1 |     0 |       |
|  45 |               INDEX RANGE SCAN                    | SETM_2_IDX |     1 |     2 | 00:01 |
|  46 |              TABLE ACCESS BY INDEX ROWID          | SETM       |     1 |     4 | 00:01 |
|  47 |         TABLE ACCESS BY INDEX ROWID               | SETM       |     1 |     2 | 00:01 |
|  48 |          INDEX UNIQUE SCAN                        | SETM_IDX   |     1 |     1 | 00:01 |
|  49 |        TABLE ACCESS BY INDEX ROWID                | COMP       |     1 |     1 | 00:01 |
|  50 |         INDEX UNIQUE SCAN                         | COMP_PK    |     1 |     0 |       |
|  51 |       TABLE ACCESS BY INDEX ROWID                 | TRGA       |     1 |     1 | 00:01 |
|  52 |        INDEX UNIQUE SCAN                          | TRGA_IDX   |     1 |     0 |       |
|  53 |      TABLE ACCESS BY GLOBAL INDEX ROWID           | SETI       |     1 |     2 | 00:01 |
|  54 |       INDEX UNIQUE SCAN                           | SETI_IDX   |     1 |     1 | 00:01 |
|  55 |     PARTITION LIST ITERATOR                       |            |     1 |     1 | 00:01 |
|  56 |      INDEX UNIQUE SCAN                            | TRAD_IDX   |     1 |     1 | 00:01 |
------------------------------------------------------------------------------------------------

And the downright bad and ugly:

SQL_ID 1s1vgw24g2u1k
 
Plan hash value: 1081606725
 
------------------------------------------------------------------------------------------------
| Id  | Operation                                         | Name       | Rows  | Cost  | Time  |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                                  |            |       |   436K|       |
|   1 |  FOR UPDATE                                       |            |       |       |       |
|   2 |   SORT ORDER BY                                   |            |     1 |   436K| 36:24 |
|   3 |    NESTED LOOPS OUTER                             |            |     1 |   436K| 36:24 |
|   4 |     NESTED LOOPS OUTER                            |            |     1 |   436K| 36:24 |
|   5 |      NESTED LOOPS OUTER                           |            |     1 |   436K| 36:23 |
|   6 |       NESTED LOOPS OUTER                          |            |     1 |   436K| 36:23 |
|   7 |        NESTED LOOPS OUTER                         |            |     1 |   436K| 36:23 |
|   8 |         HASH JOIN OUTER                           |            |     1 |   436K| 36:23 |
|   9 |          NESTED LOOPS OUTER                       |            |     1 |    10 | 00:01 |
|  10 |           NESTED LOOPS                            |            |     1 |     7 | 00:01 |
|  11 |            VIEW                                   | VW_NSO_1   |     2 |     5 | 00:01 |
|  12 |             SORT UNIQUE                           |            |     1 |       |       |
|  13 |              COUNT STOPKEY                        |            |       |       |       |
|  14 |               VIEW                                |            |     2 |     5 | 00:01 |
|  15 |                SORT ORDER BY STOPKEY              |            |     2 |     5 | 00:01 |
|  16 |                 TABLE ACCESS BY INDEX ROWID       | CMSG       |     2 |     4 | 00:01 |
|  17 |                  INDEX RANGE SCAN                 | CMSG3_IDX  |     2 |     3 | 00:01 |
|  18 |            TABLE ACCESS BY USER ROWID             | CMSG       |     1 |     1 | 00:01 |
|  19 |           TABLE ACCESS BY INDEX ROWID             | SETI       |     1 |     3 | 00:01 |
|  20 |            INDEX RANGE SCAN                       | SETI_1     |     1 |     2 | 00:01 |
|  21 |          VIEW                                     |            |   719K|   436K| 36:23 |
|  22 |           SORT GROUP BY                           |            |   719K|       |       |
|  23 |            CONCATENATION                          |            |       |       |       |
|  24 |             NESTED LOOPS                          |            |       |       |       |
|  25 |              NESTED LOOPS                         |            | 27212 |   403K| 33:40 |
|  26 |               HASH JOIN                           |            | 20520 |   301K| 25:06 |
|  27 |                HASH JOIN                          |            | 14473 |    35 | 00:01 |
|  28 |                 INDEX FULL SCAN                   | PRCO_2_IDX |   502 |     1 | 00:01 |
|  29 |                 TABLE ACCESS BY INDEX ROWID       | CMSG       | 14473 |    33 | 00:01 |
|  30 |                  INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    32 | 00:01 |
|  31 |                PARTITION LIST ALL                 |            |    30M|   300K| 25:02 |
|  32 |                 TABLE ACCESS FULL                 | SETI       |    30M|   300K| 25:02 |
|  33 |               INDEX RANGE SCAN                    | SETM_2_IDX |     1 |     3 | 00:01 |
|  34 |              TABLE ACCESS BY INDEX ROWID          | SETM       |     1 |     5 | 00:01 |
|  35 |             NESTED LOOPS                          |            |       |       |       |
|  36 |              NESTED LOOPS                         |            |   960 | 32673 | 02:44 |
|  37 |               NESTED LOOPS                        |            |   724 | 29046 | 02:26 |
|  38 |                NESTED LOOPS                       |            |   724 | 29045 | 02:26 |
|  39 |                 TABLE ACCESS BY INDEX ROWID       | CMSG       | 14473 |    33 | 00:01 |
|  40 |                  INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    32 | 00:01 |
|  41 |                 TABLE ACCESS BY GLOBAL INDEX ROWID| SETI       |     1 |     2 | 00:01 |
|  42 |                  INDEX UNIQUE SCAN                | SETI_IDX   |     1 |     1 | 00:01 |
|  43 |                INDEX UNIQUE SCAN                  | PRCO_2_IDX |     1 |     0 |       |
|  44 |               INDEX RANGE SCAN                    | SETM_2_IDX |     1 |     3 | 00:01 |
|  45 |              TABLE ACCESS BY INDEX ROWID          | SETM       |     1 |     5 | 00:01 |
|  46 |         TABLE ACCESS BY INDEX ROWID               | SETM       |     1 |     2 | 00:01 |
|  47 |          INDEX UNIQUE SCAN                        | SETM_IDX   |     1 |     1 | 00:01 |
|  48 |        TABLE ACCESS BY INDEX ROWID                | COMP       |     1 |     2 | 00:01 |
|  49 |         INDEX UNIQUE SCAN                         | COMP_PK    |     1 |     1 | 00:01 |
|  50 |       TABLE ACCESS BY INDEX ROWID                 | TRGA       |     1 |     1 | 00:01 |
|  51 |        INDEX UNIQUE SCAN                          | TRGA_IDX   |     1 |     0 |       |
|  52 |      TABLE ACCESS BY GLOBAL INDEX ROWID           | SETI       |     1 |     2 | 00:01 |
|  53 |       INDEX UNIQUE SCAN                           | SETI_IDX   |     1 |     1 | 00:01 |
|  54 |     PARTITION LIST ITERATOR                       |            |     1 |     1 | 00:01 |
|  55 |      INDEX UNIQUE SCAN                            | TRAD_IDX   |     1 |     1 | 00:01 |
------------------------------------------------------------------------------------------------
 
Note
-----
   - dynamic sampling used for this statement (level=4)
 

The trouble with specific examples is that sometimes they’re a bit longer than ideal (as well as being a bit out of context).

Let me highlight what I see as the problem:

  • There is a non-merged view with an OR (which is expanded to a CONCATENATION)
  • In the good plan, @ line 33, one part of the OR condition is dealt with via an INDEX RANGE SCAN on SETI_3_IDX.
  • In the bad plan, @ line 32, this is a TABLE ACCESS FULL on SETI.
  • There’s a dynamic sampling note in the bad plan

Good:

|  21 | VIEW                                    |            | 14490 |  9173   (1)| 00:00:46 |
...
|  26 |     NESTED LOOPS                        |            |   428 |  6757   (1)| 00:00:34 |
|  27 |      NESTED LOOPS                       |            |   304 |    30   (0)| 00:00:01 |
|  28 |       TABLE ACCESS BY INDEX ROWID       | CMSG       |   304 |    30   (0)| 00:00:01 |
|  29 |        INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    29   (0)| 00:00:01 |
|  30 |       INDEX UNIQUE SCAN                 | PRCO_2_IDX |     1 |     0   (0)|          |
|  31 |      PARTITION LIST ALL                 |            |     1 |    24   (0)| 00:00:01 |
|  32 |       TABLE ACCESS BY LOCAL INDEX ROWID | SETI       |     1 |    24   (0)| 00:00:01 |
|  33 |        INDEX RANGE SCAN                 | SETI_3_IDX |    36 |    22   (0)| 00:00:01 |
...

Bad:

|  21 | VIEW                                    |            |   719K|   436K  (3)| 00:36:23 |
...
|  26 |     HASH JOIN                           |            | 20520 |   301K  (4)| 00:25:06 |
|  27 |      HASH JOIN                          |            | 14473 |    35   (3)| 00:00:01 |
|  28 |       INDEX FULL SCAN                   | PRCO_2_IDX |   502 |     1   (0)| 00:00:01 |
|  29 |       TABLE ACCESS BY INDEX ROWID       | CMSG       | 14473 |    33   (0)| 00:00:01 |
|  30 |        INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    32   (0)| 00:00:01 |
|  31 |      PARTITION LIST ALL                 |            |    30M|   300K  (4)| 00:25:02 |
|  32 |       TABLE ACCESS FULL                 | SETI       |    30M|   300K  (4)| 00:25:02 |
...
Note
-----
   - dynamic sampling used for this statement (level=4)

It’s always nice to simplify a situation and because the problem seems to be in this non-merged view, we can actually eliminate half the sql and then see if we can just reproduce the problem by adjusting the estimates.

Which we can, and via trial and error, we see that as we approach an estimate of 14000 rows from CMSG, the plan changes to the FULL TABLE SCAN of SETI:

SQL> explain plan for
  2                 SELECT /*+
  3                          opt_estimate(table cmsg_il rows=14000)
  4                          */
  5                        COUNT(*) count,
  6                        setm_il.sett_mesg_tag sett_mesg_tag
  7                 FROM   prco    prco_il,
  8                        cmsg    cmsg_il,
  9                        setm    setm_il,
 10                        seti    seti_il
 11                 WHERE  cmsg_il.unacknowledged = prco_il.unack_code
 12                 AND    cmsg_il.evnt_type = 'SI'
 13                 AND   (cmsg_il.evnt_tag = seti_il.sett_istr_tag OR
 14                        cmsg_il.evnt_tag = seti_il.amending_si_tag)
 15                 AND    setm_il.sett_istr_tag = seti_il.sett_istr_tag
 16                 GROUP BY setm_il.sett_mesg_tag;

Explained.

SQL> select * from table(dbms_xplan.display);
Plan hash value: 273475595

-----------------------------------------------------------------------------------------------
| Id  | Operation                                | Name       | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |            |   695K|   432K  (3)| 00:36:01 |
|   1 |  HASH GROUP BY                           |            |   695K|            |          |
|   2 |   CONCATENATION                          |            |       |            |          |
|   3 |    NESTED LOOPS                          |            |       |            |          |
|   4 |     NESTED LOOPS                         |            | 26323 |   400K  (3)| 00:33:23 |
|*  5 |      HASH JOIN                           |            | 19850 |   301K  (4)| 00:25:06 |
|*  6 |       HASH JOIN                          |            | 14000 |    35   (3)| 00:00:01 |
|   7 |        INDEX FULL SCAN                   | PRCO_2_IDX |   502 |     1   (0)| 00:00:01 |
|*  8 |        TABLE ACCESS BY INDEX ROWID       | CMSG       | 14000 |    33   (0)| 00:00:01 |
|*  9 |         INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    32   (0)| 00:00:01 |
|  10 |       PARTITION LIST ALL                 |            |    30M|   300K  (4)| 00:25:02 |
|  11 |        TABLE ACCESS FULL                 | SETI       |    30M|   300K  (4)| 00:25:02 |
|* 12 |      INDEX RANGE SCAN                    | SETM_2_IDX |     1 |     3   (0)| 00:00:01 |
|  13 |     TABLE ACCESS BY INDEX ROWID          | SETM       |     1 |     5   (0)| 00:00:01 |
|  14 |    NESTED LOOPS                          |            |       |            |          |
|  15 |     NESTED LOOPS                         |            |   928 | 31605   (1)| 00:02:39 |
|  16 |      NESTED LOOPS                        |            |   700 | 28097   (1)| 00:02:21 |
|  17 |       NESTED LOOPS                       |            |   700 | 28097   (1)| 00:02:21 |
|* 18 |        TABLE ACCESS BY INDEX ROWID       | CMSG       | 14000 |    33   (0)| 00:00:01 |
|* 19 |         INDEX FULL SCAN                  | CMSG4_IDX  |     1 |    32   (0)| 00:00:01 |
|* 20 |        TABLE ACCESS BY GLOBAL INDEX ROWID| SETI       |     1 |     2   (0)| 00:00:01 |
|* 21 |         INDEX UNIQUE SCAN                | SETI_IDX   |     1 |     1   (0)| 00:00:01 |
|* 22 |       INDEX UNIQUE SCAN                  | PRCO_2_IDX |     1 |     0   (0)| 00:00:01 |
|* 23 |      INDEX RANGE SCAN                    | SETM_2_IDX |     1 |     3   (0)| 00:00:01 |
|  24 |     TABLE ACCESS BY INDEX ROWID          | SETM       |     1 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("CMSG_IL"."EVNT_TAG"="SETI_IL"."AMENDING_SI_TAG")
   6 - access("CMSG_IL"."UNACKNOWLEDGED"="PRCO_IL"."UNACK_CODE")
   8 - filter("CMSG_IL"."EVNT_TYPE"='SI')
   9 - filter("CMSG_IL"."UNACKNOWLEDGED" IS NOT NULL)
  12 - access("SETM_IL"."SETT_ISTR_TAG"="SETI_IL"."SETT_ISTR_TAG")
  18 - filter("CMSG_IL"."EVNT_TYPE"='SI')
  19 - filter("CMSG_IL"."UNACKNOWLEDGED" IS NOT NULL)
  20 - filter(LNNVL("CMSG_IL"."EVNT_TAG"="SETI_IL"."AMENDING_SI_TAG"))
  21 - access("CMSG_IL"."EVNT_TAG"="SETI_IL"."SETT_ISTR_TAG")
  22 - access("CMSG_IL"."UNACKNOWLEDGED"="PRCO_IL"."UNACK_CODE")
  23 - access("SETM_IL"."SETT_ISTR_TAG"="SETI_IL"."SETT_ISTR_TAG")

46 rows selected.

SQL> 

So, optimizer environment differences and bind variable peeking are pretty much ruled out as influencing factors.

What remains is that the bad plan has a higher estimate of rows than the good plan and the bad plan has dynamic sampling.

Both SQL statements are the same.

Dynamic sampling level is the same for both (4 set at system level).

Dynamic sampling should be being applied because of the parameter setting and because there are two single table access predicates
- one from the predicate on CMSG_IL.EVENT_TYPE and one implicit IS NOT NULL predicate on the join column UNACKNOWLEDGED.

So why would we see DS in one plan and not anohther?

If we do a 10053 trace of the distilled non-mergeable view, we see:

Access path analysis for CMSG
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for CMSG[CMSG_IL]

*** 2012-06-19 14:32:04.576
** Performing dynamic sampling initial checks. **
  Column (#4):
    NewDensity:0.000062, OldDensity:0.000000 BktCnt:8090, PopBktCnt:8084, PopValCnt:46, NDV:80
  Column (#4): EVNT_TYPE(
    AvgLen: 4 NDV: 80 Nulls: 0 Density: 0.000062
    Histogram: Freq  #Bkts: 52  UncompBkts: 8090  EndPtVals: 52
** Dynamic sampling initial checks returning TRUE (level = 4).

*** 2012-06-19 14:32:04.576
** Generated dynamic sampling query:
    query text :
SELECT /* OPT_DYN_SAMP */ 
       /*+ 
         ALL_ROWS 
         IGNORE_WHERE_CLAUSE 
         NO_PARALLEL(SAMPLESUB) 
         opt_param('parallel_execution_enabled', 'false') 
         NO_PARALLEL_INDEX(SAMPLESUB) 
         NO_SQL_TUNE 
         */ 
       NVL(SUM(C1),0), 
       NVL(SUM(C2),0) 
FROM (SELECT /*+ 
               IGNORE_WHERE_CLAUSE NO_PARALLEL("CMSG_IL") 
               FULL("CMSG_IL") 
               NO_PARALLEL_INDEX("CMSG_IL") 
               */ 
             1 AS C1, 
             CASE WHEN "CMSG_IL"."EVNT_TYPE"=:B1 
                  AND  "CMSG_IL"."UNACKNOWLEDGED" IS NOT NULL 
                  THEN 1 
                  ELSE 0 
             END AS C2 
FROM "CMSG" SAMPLE BLOCK (0.002711 , 1) SEED (1) "CMSG_IL") SAMPLESUB

*** 2012-06-19 14:32:04.612
** Executed dynamic sampling query:
    level : 4
    sample pct. : 0.002711
    actual sample size : 1756
    filtered sample card. : 0
    orig. card. : 75899119
    block cnt. table stat. : 1118909
    block cnt. for sampling: 1143461
    max. sample block cnt. : 32
    sample block cnt. : 31
    min. sel. est. : 0.00000001
** Not using dynamic sampling for single table sel. or cardinality.

So, the results of dynamic sampling are discarded as unreliable and the estimates fall back on using the stored object stats.

But at some point (one of the hard parses) the data returned in the sample must have found some relevant data and been considered reliable which led to the “bad” plan being executed repeatedly in inappropriate circumstances.

Two footnotes to this investigation.

Firstly, when you have tables with fast-changing data and row counts, there are usually two choices in approach for statistics
- either find a set of stats that gives the best (balance of) plans and lock them.
- or use dynamic sampling

The scenario above shows the downside of the dynamic sampling approach.
It may well be that the plan hash 1081606725 was good for the initial execution(s) of the statement when there maybe were a large number of outstanding messages but it very quickly became unsuitable given the frequent executions.

Secondly, I mentioned that this was not a SQL rewriting exercise, but, if it was, I’m not a great fan of the outer-joined inline view. It’s not merged but it seems sensible that a predicate be pushed into view.
Pushing the predicate, whether happening by default or hinted explicitly, would give a subsection of the plan like:

|  20 | VIEW PUSHED PREDICATE                  |            |     1 |    38   (0)| 00:00:01 |
|  21 |  SORT GROUP BY                         |            |     1 |    38   (0)| 00:00:01 |
|  22 |   NESTED LOOPS                         |            |     1 |    38   (0)| 00:00:01 |
|  23 |    NESTED LOOPS                        |            |     1 |    38   (0)| 00:00:01 |
|  24 |     NESTED LOOPS                       |            |     1 |     5   (0)| 00:00:01 |
|  25 |      TABLE ACCESS BY INDEX ROWID       | SETM       |     1 |     3   (0)| 00:00:01 |
|* 26 |       INDEX UNIQUE SCAN                | SETM_IDX   |     1 |     2   (0)| 00:00:01 |
|  27 |      TABLE ACCESS BY GLOBAL INDEX ROWID| SETI       |    30M|     2   (0)| 00:00:01 |
|* 28 |       INDEX UNIQUE SCAN                | SETI_IDX   |     1 |     1   (0)| 00:00:01 |
|* 29 |     TABLE ACCESS BY INDEX ROWID        | CMSG       |     1 |    33   (0)| 00:00:01 |
|* 30 |      INDEX FULL SCAN                   | CMSG_IDX   |     1 |    32   (0)| 00:00:01 |
|* 31 |    INDEX UNIQUE SCAN                   | PRCO_2_IDX |     1 |     0   (0)| 00:00:01 |

However, I’m not keen on explicitly hinting this nor leaving it to chance and as it’s outer joined, I would strongly consider using a scalar subselect:

      WITH cmsg AS
           (SELECT *
            FROM   (SELECT cmsg.evnt_tag,
                           cmsg.evnt_type,
                           cmsg.seq_num
                    FROM   cmsg
                    WHERE  unacknowledged = :p_unack_code
                    AND    evnt_type != 'SOD'
                    ORDER BY seq_num)
            WHERE rownum <= :p_batch_lim)
      SELECT  ...,
              (SELECT COUNT(*)
               FROM   proc    prco_il,
                      cmsg    cmsg_il,
                      setm    setm_il,
                      seti    seti_il
               WHERE  cmsg_il.unacknowledged = prco_il.unack_code
               AND    cmsg_il.evnt_type      = 'SI'
               AND   (cmsg_il.evnt_tag       = seti_il.sett_istr_tag
               OR     cmsg_il.evnt_tag       = seti_il.amending_si_tag)
               AND    setm_il.sett_istr_tag  = seti_il.sett_istr_tag
               AND    setm_il.sett_mesg_tag  = sint.sett_mesg_tag) count
      FROM    cmsg,
              sint,
              setm,
              comp,
              trga,
              seti,
              trad
      WHERE   sint.sint_tag      (+) = cmsg.evnt_tag
      AND     sint.stus_ind      (+) = 'C'
      AND     setm.sett_mesg_tag (+) = sint.sett_mesg_tag
      AND     comp.cmpy_num      (+) = setm.cmpy_num
      AND     trga.trga_code     (+) = comp.trga_code
      AND     seti.sett_istr_tag (+) = setm.sett_istr_tag
      AND     trad.trad_tag      (+) = seti.trad_tag
      AND     trad.cmpy_num      (+) = seti.cmpy_num
      ORDER by cmsg.seq_num;

Flippin’ Witch Hunt – adaptive direct path read

Plan flips… here seems to be general impression that most application performance issues are going to be caused by some sort of “plan flip” and that “plan flips” are bad.

  • If you don’t use plan stability features, should you really expect plan stability?
  • Even outside your stats job window whenever that is?
  • What if you’re using bind variables with histograms?
  • And you’ve got features like dynamic sampling which will probably get a slightly different sample for each hard parse?
  • And if you’ve also got cardinality feedback kicking in?

The old battle between stability and optimisation, I suppose.

I made a throwaway comment on a forum thread about having seen a performance problem caused by a sql statement which changed to doing direct path IO.

I thought it might be vaguely interesting to do a quick blog post and back that up with a tiny bit of “evidence”.

This was getting on for three weeks ago now (29th May) and there were bigger fish to fry at the time so some of the evidence and some of the accuracy of the evidence might wont a little but I love having a long AWR retention and the outputs below are what ASH & AWR can tell us now about what happened back then.

Problem:
A particular sql statment was running slowly.
Initial analysis had identified a change in the plan and that change was caused by cardinality feedback and changes in plans are bad of course ;)

Looking at DBA_HIST_SQLSTAT, I could see that the SQL regularly flip-flopped between two plans, one of which was the plan normally received after a hard parse, the other a change brought on by cardinality feedback.

But there really wasn’t much to choose between the plans in terms of statistics and averages.

In the execution plan, we can also skip specifics, the only relevant detail that there was a FULL TABLE SCAN which was the inner part (i.e. executed multiple times for each row in the driving / outer rowsource) of a NESTED LOOP – something which is rarely going to be a good thing, i.e.

NESTED LOOP
  some row source
  FULL TABLE SCAN

However, if we cross-reference DBA_HIST_ACTIVE_SESS_HISTORY, then we see the critical differences between the executions of this FULL TABLE SCAN on the 28th / 29th May and other days.

If you don’t like averages then I don’t blame you but you won’t like this query:

with ash as
(select trunc(ash.sample_time) snap_day
 ,      min(ash.snap_id) min_snap_id
 ,      max(ash.snap_id) max_snap_id
 ,      ash.current_obj# obj_id
 ,      ash.sql_id
 ,      ash.sql_plan_hash_value
 ,      ash.sql_plan_operation
 ,      ash.sql_plan_options
 ,      ash.current_obj#
 ,      ash.event
 ,      count(distinct ash.sql_exec_id) ash_cnt
 from   dba_hist_active_sess_history ash
 where  ash.sql_id              = 'drbkfkfxg6a6a'
 and    ash.sql_plan_operation  = 'TABLE ACCESS'
 and    ash.sql_plan_options    = 'FULL'
 and    ash.current_obj#        = 169748
 and    ash.sample_time        >= to_date('01/05/2012','DD/MM/YYYY')
 and    ash.event              is not null
 group by
        trunc(ash.sample_time)
 ,      ash.sql_plan_hash_value
 ,      ash.sql_id
 ,      ash.sql_plan_operation
 ,      ash.sql_plan_options
 ,      ash.current_obj#
 ,      ash.event)
select ash.snap_day
,      ash.event
,      ash.ash_cnt
,      (select round(sum(elapsed_time_delta) / sum(executions_delta) /1000/1000)
        from   dba_hist_sqlstat   st
        where  st.sql_id        = ash.sql_id
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) avg_elp
,      (select round(sum(rows_processed_delta) / sum(executions_delta))
        from   dba_hist_sqlstat   st
        where  st.sql_id        = ash.sql_id
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) avg_rws
,      round((1275068416 -
       (select sum(space_allocated_delta)
        from   dba_hist_seg_stat st
        where  st.obj#          = ash.obj_id
        and    st.snap_id      >= ash.min_snap_id))
      / 1024/1024) "SPACE_USED (ISH)"
,      (select min(bytes)/1024/1024
        from   dba_hist_sgastat st
        where  name             = 'buffer_cache'
        and    st.snap_id between ash.min_snap_id
                          and     ash.max_snap_id) min_bc
from   ash
order by snap_day;

What am I trying to see:

  • Historic running of a particular sql statment
  • Focusing on this FULL TABLE SCAN operation which was a problem on the 28th / 29th
  • Occurences in ASH (be very careful of reading too much into samples of samples and aggregates and averages)
  • What the statement tended to wait on and a feel for how much waiting (beware of sampled samples)
  • And finally a rough approximation of how big the object was relative to the buffer cache (we know what size it is now (which I’ve plugged into the query) so work back through the growth in segment stats. It’s using space allocated because I don’t have access to dbms_space to get space used and space allocated is going to be a high estimate of actual space used – it certainly shouldn’t be more than that)
SNAP_DAY  EVENT                               ASH_CNT    AVG_ELP SPACE_USED (ISH)     MIN_BC
--------- -------------------------------- ---------- ---------- ---------------- -------
01-MAY-12 db file scattered read                    1         19             1152      61696
01-MAY-12 db file sequential read                   1         19             1152      61696
01-MAY-12 db file sequential read                   1         25             1152      61696
02-MAY-12 db file sequential read                   1          8             1152      61696
02-MAY-12 db file scattered read                    1          8             1152      61696
07-MAY-12 db file scattered read                    1          7             1152      61696
08-MAY-12 db file scattered read                    1         24             1152      61696
08-MAY-12 db file sequential read                   1         24             1152      61696
09-MAY-12 db file sequential read                   1        100             1152      61696
09-MAY-12 db file scattered read                    1        100             1152      61696
10-MAY-12 db file sequential read                   1          6             1152      61696
10-MAY-12 db file scattered read                    1          6             1152      61696
11-MAY-12 db file sequential read                   1          7             1152      61696
11-MAY-12 db file scattered read                    1         13             1152      61696
11-MAY-12 db file sequential read                   1         13             1152      61696
15-MAY-12 db file scattered read                    1        148             1152      62208
16-MAY-12 db file scattered read                    1          8             1152      62464
16-MAY-12 db file sequential read                   2          5             1152      62464
17-MAY-12 db file scattered read                    1         17             1152      62208
17-MAY-12 db file sequential read                   1         17             1152      62208
18-MAY-12 db file sequential read                   2         17             1152      62208
18-MAY-12 db file scattered read                    2         17             1152      62208
21-MAY-12 db file scattered read                    1         16             1216      62208
21-MAY-12 db file scattered read                    1          6             1216      62208
22-MAY-12 db file scattered read                    1         19             1216      62208
23-MAY-12 db file scattered read                    3         17             1216      62208
23-MAY-12 read by other session                     1         15             1216      62208
24-MAY-12 db file scattered read                    1         28             1216      61696
25-MAY-12 db file scattered read                    1         67             1216      62208
25-MAY-12 db file sequential read                   1         67             1216      62208
28-MAY-12 direct path read                          3         45             1216      62464
28-MAY-12 direct path read                         96         70             1216      62464
29-MAY-12 direct path read                         10        484             1216      62464
29-MAY-12 direct path read                        137        643             1216      62464
29-MAY-12 enq: KO - fast object checkpoint          1        545             1216      62464

35 rows selected.

SQL> 

The FULL TABLE SCAN as the outer rowsource of a NESTED LOOP is a performance threat anyway but in summary, a performance problem, irrelevant plan stability and a key switch to direct path IO, not obviously caused by a significant growth in object size.

Copy_table_stats

An observation on DBMS_STATS.COPY_TABLE_STATS in 11.2.0.3

Prior to 10.2.0.5, 11.1.0.7 (I think) and 11.2, COPY_TABLE_STATS copied the high/low value of the partition key from the source to the destination rather than use the implicit boundaries of that partition. Doug mentions it in his series on statistics on partitioned objects.

Looking at it in 11.2.0.3, this issue is still addressed in 11.2.0.3 but there’s a “but” if we copy at the SUBPARTITION level:

Setup:

SQL> create table t1
  2  (dt    DATE
  3  ,col1  VARCHAR2(10)
  4  ,stuff VARCHAR2(100))
  5  PARTITION BY RANGE (DT)
  6  SUBPARTITION BY LIST (COL1)
  7  SUBPARTITION TEMPLATE
  8  (SUBPARTITION A VALUES ('A'),
  9   SUBPARTITION B VALUES ('B'),
 10   SUBPARTITION C VALUES ('C'))
 11  (PARTITION T1_WK1 VALUES LESS THAN (TO_DATE('07-01-2012','DD-MM-YYYY')),
 12   PARTITION T1_WK2 VALUES LESS THAN (TO_DATE('14-01-2012','DD-MM-YYYY')));

Table created.

SQL> insert into t1
  2  values (TO_DATE('01-01-2012','DD-MM-YYYY'),'A',RPAD('X',100,'X'));

1 row created.

SQL> insert into t1
  2  values (TO_DATE('02-01-2012','DD-MM-YYYY'),'B',RPAD('X',100,'X'));

1 row created.

SQL> insert into t1
  2  values (TO_DATE('03-01-2012','DD-MM-YYYY'),'C',RPAD('X',100,'X'));

1 row created.

SQL> exec dbms_stats.gather_table_stats(USER,'T1','T1_WK1');

PL/SQL procedure successfully completed.

SQL> 

So, in one partition, we have 1 row per subpartition. The other partition is empty.

And we’ve gathered stats on one partition.
(Why just one partition? It’s not really important but I’ll touch on it at the end).

You should see the following statistics:

SQL> select global_stats
  2  ,      column_name
  3  ,      case
  4              when column_name = 'DT'   then display_raw(low_value,'DATE')
  5              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6         end lo
  7  ,      case
  8              when column_name = 'DT'   then display_raw(high_value,'DATE')
  9              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10         end hi
 11  from   user_tab_col_statistics
 12  where  table_name = 'T1'
 13  and    column_name IN ('DT','COL1');

GLO COLUMN_NAME          LO         HI
--- -------------------- ---------- ----------
YES DT                   01-JAN-12  03-JAN-12
YES COL1                 A          C

SQL> select global_stats
  2  ,      partition_name
  3  ,      column_name
  4  ,      case
  5              when column_name = 'DT'   then display_raw(low_value,'DATE')
  6              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  7         end lo
  8  ,      case
  9              when column_name = 'DT'   then display_raw(high_value,'DATE')
 10              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 11         end hi
 12  from   user_part_col_statistics
 13  where  table_name  = 'T1'
 14  and    column_name IN ('DT','COL1')
 15  order by partition_name, column_name;

GLO PARTITION_NAME       COLUMN_NAME          LO         HI
--- -------------------- -------------------- ---------- ----------
YES T1_WK1               COL1                 A          C
YES T1_WK1               DT                   01-JAN-12  03-JAN-12
NO  T1_WK2               COL1
NO  T1_WK2               DT

SQL> select subpartition_name
  2  ,      column_name
  3  ,      case
  4              when column_name = 'DT'   then display_raw(low_value,'DATE')
  5              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6         end lo
  7  ,      case
  8              when column_name = 'DT'   then display_raw(high_value,'DATE')
  9              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10         end hi
 11  from   user_subpart_col_statistics
 12  where  table_name  = 'T1'
 13  and    column_name IN ('DT','COL1')
 14  order by subpartition_name, column_name;

SUBPARTITION_NAME    COLUMN_NAME          LO         HI
-------------------- -------------------- ---------- ----------
T1_WK1_A             COL1                 A          A
T1_WK1_A             DT                   01-JAN-12  01-JAN-12
T1_WK1_B             COL1                 B          B
T1_WK1_B             DT                   02-JAN-12  02-JAN-12
T1_WK1_C             COL1                 C          C
T1_WK1_C             DT                   03-JAN-12  03-JAN-12
T1_WK2_A             COL1
T1_WK2_A             DT
T1_WK2_B             COL1
T1_WK2_B             DT
T1_WK2_C             COL1
T1_WK2_C             DT

12 rows selected.

SQL> 

Hopefully no surprises there.

BTW: display_raw

Now, if we use COPY_TABLE_STATS at a partition level:

SQL> exec dbms_stats.copy_table_stats(USER,'T1','T1_WK1','T1_WK2');

PL/SQL procedure successfully completed.

SQL>

the table level stats are unaffected as you might expect:

SQL> select global_stats
  2  ,      column_name
  3  ,      case
  4              when column_name = 'DT'   then display_raw(low_value,'DATE')
  5              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6         end lo
  7  ,      case
  8              when column_name = 'DT'   then display_raw(high_value,'DATE')
  9              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10         end hi
 11  from   user_tab_col_statistics
 12  where  table_name = 'T1'
 13  and    column_name IN ('DT','COL1');

GLO COLUMN_NAME          LO         HI
--- -------------------- ---------- ----------
YES DT                   01-JAN-12  03-JAN-12
YES COL1                 A          C

SQL> 

and, most importantly, the partition key high/low values are no longer just copied:

SQL> select global_stats
  2  ,      partition_name
  3  ,      column_name
  4  ,      case
  5              when column_name = 'DT'   then display_raw(low_value,'DATE')
  6              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  7         end lo
  8  ,      case
  9              when column_name = 'DT'   then display_raw(high_value,'DATE')
 10              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 11         end hi
 12  from   user_part_col_statistics
 13  where  table_name  = 'T1'
 14  and    column_name IN ('DT','COL1')
 15  order by partition_name, column_name;

GLO PARTITION_NAME       COLUMN_NAME          LO         HI
--- -------------------- -------------------- ---------- ----------
YES T1_WK1               COL1                 A          C
YES T1_WK1               DT                   01-JAN-12  03-JAN-12
YES T1_WK2               COL1                 A          C
YES T1_WK2               DT                   07-JAN-12  14-JAN-12

The statistics above show that the high/low of the destination partition match the partition boundaries which nearly makes sense but note the high value is the LESS THAN limit of the partition.
(Robin Moffat also mentioned this in a post)

Now copying the statistics at a PARTITION level, does nothing to the SUBPARTITION statistics, never has done.
That’s not changed nor was it expected to:

SQL>  select subpartition_name
  2   ,      column_name
  3   ,      case
  4               when column_name = 'DT'   then display_raw(low_value,'DATE')
  5               when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6          end lo
  7   ,      case
  8               when column_name = 'DT'   then display_raw(high_value,'DATE')
  9               when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10          end hi
 11   from   user_subpart_col_statistics
 12   where  table_name  = 'T1'
 13   and    column_name IN ('DT','COL1')
 14   order by subpartition_name, column_name;

SUBPARTITION_NAME    COLUMN_NAME          LO                   HI
-------------------- -------------------- -------------------- --------------------
T1_WK1_A             COL1                 A                    A
T1_WK1_A             DT                   01-JAN-12            01-JAN-12
T1_WK1_B             COL1                 B                    B
T1_WK1_B             DT                   02-JAN-12            02-JAN-12
T1_WK1_C             COL1                 C                    C
T1_WK1_C             DT                   03-JAN-12            03-JAN-12
T1_WK2_A             COL1
T1_WK2_A             DT
T1_WK2_B             COL1
T1_WK2_B             DT
T1_WK2_C             COL1
T1_WK2_C             DT

12 rows selected.

Perhaps then, if we value our subpartition statitics, we might think it’s a good idea to do our operations at a subpartition level instead.

What happens if we reset these statistics and do a COPY_TABLE_STATS at a subpartition level?

Reset:

SQL> exec dbms_stats.delete_table_stats(USER,'T1');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(USER,'T1','T1_WK1');

PL/SQL procedure successfully completed.

And COPY_TABLE_STATS on the subpartitions:

SQL>  begin
  2    dbms_stats.copy_table_stats(USER,'T1','T1_WK1_A','T1_WK2_A');
  3    dbms_stats.copy_table_stats(USER,'T1','T1_WK1_B','T1_WK2_B');
  4    dbms_stats.copy_table_stats(USER,'T1','T1_WK1_C','T1_WK2_C');
  5   end;
  6  /

PL/SQL procedure successfully completed.

The table level stats are still unaffected.

But at the subpartition level, the RANGE PARTITION KEY (DT) is back to just being copied, not set to the boundary of the PARTITION KEY:

SQL>  select subpartition_name
  2   ,      column_name
  3   ,      case
  4               when column_name = 'DT'   then display_raw(low_value,'DATE')
  5               when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6          end lo
  7   ,      case
  8               when column_name = 'DT'   then display_raw(high_value,'DATE')
  9               when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10          end hi
 11   from   user_subpart_col_statistics
 12   where  table_name  = 'T1'
 13   and    column_name IN ('DT','COL1')
 14   order by subpartition_name, column_name;

SUBPARTITION_NAME    COLUMN_NAME          LO                   HI
-------------------- -------------------- -------------------- --------------------
T1_WK1_A             COL1                 A                    A
T1_WK1_A             DT                   01-JAN-12            01-JAN-12
T1_WK1_B             COL1                 B                    B
T1_WK1_B             DT                   02-JAN-12            02-JAN-12
T1_WK1_C             COL1                 C                    C
T1_WK1_C             DT                   03-JAN-12            03-JAN-12
T1_WK2_A             COL1                 A                    A
T1_WK2_A             DT                   01-JAN-12            01-JAN-12
T1_WK2_B             COL1                 B                    B
T1_WK2_B             DT                   02-JAN-12            02-JAN-12
T1_WK2_C             COL1                 C                    C
T1_WK2_C             DT                   03-JAN-12            03-JAN-12

12 rows selected.

And, at the partition level, any aggregation of these incorrect values depends on whether we already have GLOBAL STATS or not (hence the initial stats gathering on just the partition).

If no global stats then the error is propagated up from the subpartition:

SQL>  select global_stats
  2   ,      partition_name
  3   ,      column_name
  4   ,      case
  5               when column_name = 'DT'   then display_raw(low_value,'DATE')
  6               when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  7          end lo
  8   ,      case
  9               when column_name = 'DT'   then display_raw(high_value,'DATE')
 10               when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 11          end hi
 12   from   user_part_col_statistics
 13   where  table_name  = 'T1'
 14   and    column_name IN ('DT','COL1')
 15   order by partition_name, column_name;

GLO PARTITION_NAME       COLUMN_NAME          LO                   HI
--- -------------------- -------------------- -------------------- --------------------
YES T1_WK1               COL1                 A                    C
YES T1_WK1               DT                   01-JAN-12            03-JAN-12
NO  T1_WK2               COL1                 A                    C
NO  T1_WK2               DT                   01-JAN-12            03-JAN-12

Or if instead of gathering statistics at the partition level initially (and at reset), we do a table level gather, then
the PARTITION stats are not aggregated, i.e.

GLO PARTITION_NAME       COLUMN_NAME          LO         HI
--- -------------------- -------------------- ---------- ----------
YES T1_WK1               COL1                 A          C
YES T1_WK1               DT                   01-JAN-12  03-JAN-12
YES T1_WK2               COL1
YES T1_WK2               DT

So, in summary, I just wanted to show that whilst the issue with COPY_TABLE_STATS and HIGH/LOW has been addressed at the PARTITION level, there still seem to be issues for the PARTITION KEY at the SUBPARTITION LEVEL.

Partitioned table statistics are more complicated than you might expect.

Here are some of the excellent resources that can guide you through:

Correlation / causation

On Tuesday morning I walked into a right sh*t storm.

I hadn’t been called but on Monday night performance had been absolutely catastrophic – SLAs missed left, right and centre.

Straight into a meeting where the following observations were made:

  • Performance was somewhat suboptimal the previous night
  • Some long-running queries were observed that aren’t normally top of the lists, these were “uncontrolled” access via Excel or Access, not via the reporting Front End or other application processes
  • When these queries stopped, performance returned to normal

So:

  • Why had these queries brought the database to its knees?
  • Was it stats and the stats job on Sunday night?
  • Was it “plan flips”? (I hear a lot about “plan flips” and them being “bad” – do people really expect plan stability without plan stability features?)
  • Was it cardinality feedback? (Admittedly this does seem to have caused us more positives than negatives particularly when working in conjunction with dynamic sampling when I’m not sure it should be).

At this point, I still hadn’t seen any evidence or symptoms of the issue, just hearsay.

So, what did the problem look like?

Below is a snippet from an AWR report from a “normal” period.
(It’s longer than usually advisable – I just want to show the symptoms over the problem period)

Platform                         CPUs Cores Sockets Memory(GB)
-------------------------------- ---- ----- ------- ----------
Linux x86 64-bit                   24    12       2     141.44

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      6678 21-May-12 21:30:26       584      22.8
  End Snap:      6682 21-May-12 23:30:33       565      24.6
   Elapsed:              120.13 (mins)
   DB Time:              582.50 (mins)

Foreground Wait Events
                                                             Avg
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file sequential read       2,664,063     0      7,050       3      2.5   20.2
log file sync                 1,027,185     0      3,501       3      1.0   10.0
direct path read                159,706     0      1,191       7      0.2    3.4
db file scattered read           61,649     0        284       5      0.1     .8

Background Wait Events
                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file parallel write         839,190     0      1,465       2      0.8   23.6
db file parallel write          551,198     0        674       1      0.5   10.9
Log archive I/O                  21,277     0        311      15      0.0    5.0
log file sequential read          6,714     0        150      22      0.0    2.4
control file parallel writ       14,311     0         48       3      0.0     .8

and this was what had happened:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      7014 28-May-12 21:30:28       560      22.8
  End Snap:      7018 28-May-12 23:30:01       538      23.7
   Elapsed:              119.54 (mins)
   DB Time:            2,067.54 (mins)

Foreground Wait Events
                                                             Avg
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file sync                   399,560     0     83,938     210      1.0   67.7
direct path read                123,631     0     15,639     126      0.3   12.6
db file sequential read         363,158     0      9,406      26      0.9    7.6
db file scattered read           93,140     0      4,083      44      0.2    3.3

Background Wait Events
                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file parallel write          394,366     0      8,048      20      1.0   37.6
log file parallel write         101,489     0      6,197      61      0.2   28.9
Log archive I/O                  12,026     0      1,814     151      0.0    8.5
control file parallel writ       11,240     0      1,793     160      0.0    8.4
log file sequential read          9,587     0      1,179     123      0.0    5.5

So, hang on a second.

We think that a couple of long-running queries caused this devastation – log file syncs out to 70x our chosen “normal” above and no CPU issues? All IO basically choked.

Doesn’t make sense – wouldn’t that take some serious misconfiguration?

And yet, how could this be a storage issue if it all resolved itself when those long-running queries finished? It was this doubt that meant we spent longer than needed checking and double checking this box and this database before contacting the storage team.

Then to cut a long story short and to miss out a whole bunch of toing and froing and whys and wherefores… the storage team put the focus on another of our databases that uses the same disk groups.

That database didn’t seem to be doing very much:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      8271 28-May-12 21:00:33        71        .9
  End Snap:      8273 28-May-12 23:00:17        72        .9
   Elapsed:              119.74 (mins)
   DB Time:              155.20 (mins)

Although the few things it was doing were similarly slow:

                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
control file sequential read        272,003       6,778     25   72.8 System I/O
db file sequential read              36,346       1,409     39   15.1 User I/O
db file parallel read                 1,127         797    707    8.6 User I/O
log file sync                           300         253    843    2.7 Commit
DB CPU                                               72            .8

But, hang on a sec. What’s that:

Time Model Statistics

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
RMAN cpu time (backup/restore)                       42,305.3        454.3

and

Background Wait Events
                                                             Avg
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
RMAN backup & recovery I/O    1,166,261     0     72,242      62 2.65E+04   60.2

Turns out that it was an RMAN backup, running when it shouldn’t have been, on a 3.5T archive database on one of the other nodes of the cluster and which uses the same disk groups.

SQL> select to_char(start_time,'DD-MON-YYYY HH24:MI')
  2  ,      to_char(end_time,'DD-MON-YYYY HH24:MI')
  3  ,      input_bytes_display
  4  ,      output_bytes_display
  5  ,      time_taken_display  
  6  from v$rman_backup_job_details  
  7  where session_stamp = 784494089  
  8  order by start_time desc  
  9  /

TO_CHAR(START_TIM TO_CHAR(END_TIME, INPUT_BYTES_DISPLAY  OUTPUT_BYTES_DISPLAY TIME_TAKEN_DISPLAY
----------------- ----------------- -------------------- -------------------- --------------------
28-MAY-2012 19:01 28-MAY-2012 23:47     2.99T              532.82G            04:46:30

SQL> 

That correlation was the causation, not those queries and this backup finished just before those long-running queries finished and then everything was ok.

But the long-running queries were just a red-herring and another symptom of everything that was running on the live database was choking.

Sometimes it’s difficult to distinguish between cause and symptom.

HugePages – Overhead of Not Using

I’ve read a fair bit about HugePages and their importance especially for Oracle databases with large SGAs and a large number of dedicated connections.

Admittedly, some of what I read is over my head – I’m not a production DBA and quite often I don’t even have access to the box unfortunately – but I know who the experts are and I trust them (doesn’t mean that an expert’s opinion shouldn’t be tested of course):

Now, the title in Kevin’s posts asks ( rhetorically) “Configuring Linux Hugepages for Oracle Database Is Just Too Difficult! Isn’t It?”

And I think a lot of the time, it is. Not because of the topic but because of the structure of modern IT departments.

When I tried to tackle the subject before at a client, it was awkward – it wasn’t a subject I could talk about with any great authority although I could refer them to the expert material above and … well, to cut a short story shorter and keep me out of trouble… we didn’t. I was overruled.  No-one else at the client was using HugePages on 11g. So no HugePages and no chance to test it.

Well, in the end, we didn’t go with AMM, because for our applications, according to our tests, we were better suited to ASMM (damn these acronyms).

Statements like this from Kevin’s articles above are easy to understand:

Reasons for Using Hugepages

  1. Use hugepages if OLTP or ERP. Full stop.
  2. Use hugepages if DW/BI with large numbers of dedicated connections or a large SGA. Full stop.
  3. Use hugepages if you don’t like the amount of memory page tables are costing you (/proc/meminfo). Full stop.

and

“Large number of dedicated connections” and “large SGA” in the context of hugepages can only be quantified by the amount of memory wasted in page tables and whether the administrator is satisfied with that cost.”

Those are pretty clear cut.

So, if I understand everything correctly, I should not like this, right?:

$ grep "PageTables" /proc/meminfo
PageTables:   41622228 kB
$

That’s 40 gig of overhead for an SGA of 60 gig and a whole bunch of connections.
So, don’t need HugePages, eh?
Bit late now.

Why did that report run slow? ASH says log file sequential read

“Dump Online Redo for Buffers in Pin History”

I’ve just been investigating why a certain report ran slow last night.
Read more of this post

Follow

Get every new post delivered to your Inbox.

Join 68 other followers