Why did that report run slow? ASH says log file sequential read

“Dump Online Redo for Buffers in Pin History”

I’ve just been investigating why a certain report ran slow last night.

Got to be a bit careful with ASH once it goes into the repository – DBA_HIST_ACTIVE_SESS_HISTORY – because it’s a sample of a sample.

Normally, I prefer looking at the raw data to see what is and isn’t in the sample, time gaps, etc., but in this case an aggregation is quite clear:

SQL> select sql_exec_id
  2  ,      sql_plan_line_id plan_line
  3  ,      decode(event,null,session_state,event) event
  4  ,      count(*)
  5  ,      to_char(min(sample_time),'HH24:MI') min_time
  6  ,      to_char(max(sample_time),'HH24:MI') max_time
  7  ,      round(sum(delta_time)/1000000,2) delta 
  8  from   dba_hist_active_sess_history
  9  where  sql_id = '2b1y41hau5qhr'
 10  and    sample_time >= trunc(sysdate) -1
 11  group by trunc(sample_time), sql_exec_id, sql_plan_line_id, event
 12  ,      session_state
 13  order by 5 asc;

SQL_EXEC_ID  PLAN_LINE EVENT                            COUNT(*) MIN_T MAX_T      DELTA
----------- ---------- ------------------------------ ---------- ----- ----- ----------
                       control file sequential read            6 19:57 20:19      58.83
                       log file sequential read              289 19:58 20:48    2897.27
                       ON CPU                                 11 20:01 20:47     110.27
   16777217         18 db file sequential read                 1 20:48 20:48      10.02
   16777217         63 ON CPU                                  3 20:48 20:49      30.09
   16777217         16 db file sequential read                 1 20:48 20:48      10.03

6 rows selected.

SQL> 
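A couple of notes on the numbers above. DBA_HIST_ACTIVE_SESS_HISTORY persists only roughly one in ten of the in-memory ASH samples – hence “a sample of a sample” – so each row above stands for something of the order of ten seconds, which is why 289 samples of log file sequential read tally with a DELTA (the sum of DELTA_TIME, converted from microseconds to seconds) of nearly 2,900 seconds.

When the period of interest is recent enough still to be in the in-memory buffer, the raw-data approach I mentioned is just the unaggregated equivalent against V$ACTIVE_SESSION_HISTORY – a minimal sketch:

-- Sketch: the raw (roughly one-per-second) samples for the same statement,
-- handy for seeing exactly what was and wasn't sampled and where the gaps are.
select sample_time
,      session_state
,      event
,      sql_exec_id
,      sql_plan_line_id
from   v$active_session_history
where  sql_id = '2b1y41hau5qhr'
order  by sample_time;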

Bottom line: this query is related to a problem I had with an ORA-00600 a couple of weeks ago, raised during the query transformation phase of a hard parse. We did not release a fix for it, preferring to defer until the next major release of the application, because this particular ORA-00600 was not fatal to the end process.

However, as it now turns out, whilst this particular ORA-00600 might not be fatal to the end process, it is very much a performance problem, and the impact depends on how much work the session has done by the time it hits the error.

If we look in the alert log, we find the ORA-00600 in question along with some interesting timestamps that broadly correlate with the episodes of log file sequential read from the ASH output above:

Tue May 08 19:57:14 2012
Errors in file /......directory.../trace file.trc (incident=2356995):
ORA-00600: internal error code, arguments: [kkqtSetOp.1], [], [], [], [], [], [], [], [], [], [], []
...
Tue May 08 20:48:25 2012
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /......directory.../trace file.trc (incident=2356996):
ORA-00600: internal error code, arguments: [kkqtSetOp.1], [], [], [], [], [], [], [], [], [], [], []

If we look at the incident files in question, the second (incident=2356996) is a continuation of the first (incident=2356995).
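As an aside, one quick way to find where the alert log, trace and incident files live is V$DIAG_INFO – a sketch, assuming access to the V$ views:

-- Sketch: locate the ADR directories that hold the alert log and incident files.
select name, value
from   v$diag_info
where  name in ('ADR Home', 'Diag Trace', 'Diag Alert', 'Diag Incident');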

So what are these log file sequential reads all about?

Following the ORA-00600, it’s diagnostic information – specifically the action “Dump Online Redo for Buffers in Pin History”, which reads back through the online redo logs and is what shows up as the log file sequential read waits.

In this case, the getting and dumping of these buffers took the best part of 50 minutes; once it was done, the query continued on its merry way and the process completed.

What we see in the incident file is a timeline like this starting at 19:57:18:

*** timestamp before redo dump: 05/08/2012 19:57:18
***********************************************
* Dump Online Redo for Buffers in Pin History *
***********************************************
$$$$$$$ Dump Online Redo for DBA list (tsn.rdba in hex):
  0xa.05dbf102  0xa.05dbf103  0xa.05dbf104  0xa.05dbf105  0xa.05dbf107  0xa.05dbf109  0xa.05dbf110  0xa.05dbf111  0xa.05dbf113  0xa.05dbf115  0xa.05dbf116  0xa.05dbf119  0xa.05dbf11b  0xa.05dbf11c  0xa.05dbf11e  0xa.05dbf11f:
DUMP REDO
 Opcodes *.*
 DBAs (file#, block#):
 (23, 1831170) (23, 1831171) (23, 1831172) (23, 1831173) (23, 1831175) (23, 1831177) (23, 1831184) (23, 1831185) (23, 1831187) (23, 1831189) (23, 1831190) (23, 1831193) (23, 1831195) (23, 1831196) (23, 1831198) (23, 1831199) .
 SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff
 Times: 05/08/2012 18:57:18 thru eternity
Initial buffer sizes: read 1024K, overflow 832K, change 805K

*** 2012-05-08 19:57:34.263
Log read is SYNCHRONOUS though disk_asynch_io is enabled!
Log read is SYNCHRONOUS though disk_asynch_io is enabled!
Log read is SYNCHRONOUS though disk_asynch_io is enabled!
Log read is SYNCHRONOUS though disk_asynch_io is enabled!

*** 2012-05-08 19:57:57.137
 Thread 1 low checkpoint scn: 0x077b.9d30745d
 SCN Start Scan Point: scn: 0x077b.9d30745d (8227499570269)
Initial buffer sizes: read 1024K, overflow 832K, change 805K
 INCARNATION:
  START: scn: 0x077a.31629df7 (8221395951095) Timestamp:  04/02/2012 14:41:00
  END: scn: 0xffff.ffffffff

*** 2012-05-08 19:58:09.227
Log read is SYNCHRONOUS though disk_asynch_io is enabled!
 descrip:"Thread 0001, Seq# 0000018211, SCN 0x077b9d30745d-0x077b9d31adcd"
.....

Through to 20:48:25 when:

*** 2012-05-08 20:48:25.097
END OF DUMP REDO
***********************************************
* End Dumping Redo for Buffers in Pin History *
***********************************************
*** timestamp after redo dump: 05/08/2012 20:48:25
*** time cost of this redo dump is 3067 seconds
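That tallies – 19:57:18 to 20:48:25 is 51 minutes and 7 seconds, i.e. 3,067 seconds.

Incidentally, the rdba values in the DBA list at the top of the dump are just the (file#, block#) pairs from the DUMP REDO command expressed as data block addresses – the top 10 bits are the relative file number and the bottom 22 bits the block number. A quick sketch of decoding the first one (names and values as per the dump above):

-- Sketch: decode rdba 0x05dbf102 into its file# and block#;
-- this should come back as 23 and 1831170, matching the DUMP REDO DBA list.
select dbms_utility.data_block_address_file(to_number('05dbf102', 'xxxxxxxx'))  file#
,      dbms_utility.data_block_address_block(to_number('05dbf102', 'xxxxxxxx')) block#
from   dual;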

Credit to Coskan for his observations, for asking the right questions, and for pointing me towards this post by Jonathan Lewis; the comments there – particularly #12 from Timur Akhmadeev – pointed in the right direction.

This is now the second time in a matter of a few weeks that I’ve seen this dump behaviour itself cause serious knock-on issues.

The previous time was on one occasion during my ORA-00600 post-upgrade storm, where this recursive getting and dumping of redo buffers – that time triggered by an ORA-07445 – meant that certain allocated resources were not freed in a timely fashion, resulting in a large queue on “library cache: mutex X” until the dumping session was killed.
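If you hit something similar, a quick way to see the queue building is a sketch like the following, assuming access to V$SESSION (use GV$SESSION on RAC):

-- Sketch: who is queued on the mutex and who each waiter reports as blocking it
-- (blocking_session is not always populated for mutex waits, so treat with care).
select sid, blocking_session, sql_id, seconds_in_wait
from   v$session
where  event = 'library cache: mutex X'
order  by seconds_in_wait desc;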

3 Responses to Why did that report run slow? ASH says log file sequential read

  1. Dominic,

    I think this is logged as Bug 9385758 “Database hang caused by automatic REDO DUMP”. The patch mentions an optional feature for stopping the dump even in foreground processes.

    • Dom Brooks says:

      Thanks Jonathan.

      Initially I thought that bug was listed as fixed in 11.2.0.2 – and, as I’m on 11.2.0.3, that puzzled me – but on further reading the “fix” is the ability to prevent this dump by setting event 10555 (the usual syntax for setting such an event is sketched below the comments).

  2. Tanel Poder says:

    Damn, I have hit this issue too in the past and thought about blogging it … now it’s too late ;-P
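For reference, the usual syntax for setting a numeric event such as the 10555 mentioned in the comments above is along these lines – the level shown is an assumption on my part, so check the bug/patch note for the value appropriate to your version before using it:

-- Hedged sketch only: 10555 is the event from Bug 9385758 discussed above;
-- the level (and whether session or system scope is appropriate) is an assumption.
alter system set events '10555 trace name context forever, level 1';
-- and to switch it back off:
alter system set events '10555 trace name context off';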
