Correlation / causation

On Tuesday morning I walked into a right sh*t storm.

I hadn’t been called but on Monday night performance had been absolutely catastrophic – SLAs missed left, right and centre.

Straight into a meeting where the following observations were made:

  • Performance was somewhat suboptimal the previous night
  • Some long-running queries were observed that aren’t normally top of the lists, these were “uncontrolled” access via Excel or Access, not via the reporting Front End or other application processes
  • When these queries stopped, performance returned to normal


  • Why had these queries brought the database to its knees?
  • Was it stats and the stats job on Sunday night?
  • Was it “plan flips”? (I hear a lot about “plan flips” and them being “bad” – do people really expect plan stability without plan stability features?)
  • Was it cardinality feedback? (Admittedly this does seem to have caused us more positives than negatives particularly when working in conjunction with dynamic sampling when I’m not sure it should be).

At this point, I still hadn’t seen any evidence or symptoms of the issue, just hearsay.

So, what did the problem look like?

Below is a snippet from an AWR report from a “normal” period.
(It’s longer than usually advisable – I just want to show the symptoms over the problem period)

Platform                         CPUs Cores Sockets Memory(GB)
-------------------------------- ---- ----- ------- ----------
Linux x86 64-bit                   24    12       2     141.44

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      6678 21-May-12 21:30:26       584      22.8
  End Snap:      6682 21-May-12 23:30:33       565      24.6
   Elapsed:              120.13 (mins)
   DB Time:              582.50 (mins)

Foreground Wait Events
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file sequential read       2,664,063     0      7,050       3      2.5   20.2
log file sync                 1,027,185     0      3,501       3      1.0   10.0
direct path read                159,706     0      1,191       7      0.2    3.4
db file scattered read           61,649     0        284       5      0.1     .8

Background Wait Events
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file parallel write         839,190     0      1,465       2      0.8   23.6
db file parallel write          551,198     0        674       1      0.5   10.9
Log archive I/O                  21,277     0        311      15      0.0    5.0
log file sequential read          6,714     0        150      22      0.0    2.4
control file parallel writ       14,311     0         48       3      0.0     .8

and this was what had happened:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      7014 28-May-12 21:30:28       560      22.8
  End Snap:      7018 28-May-12 23:30:01       538      23.7
   Elapsed:              119.54 (mins)
   DB Time:            2,067.54 (mins)

Foreground Wait Events
                                        %Time Total Wait    wait    Waits   % DB
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
log file sync                   399,560     0     83,938     210      1.0   67.7
direct path read                123,631     0     15,639     126      0.3   12.6
db file sequential read         363,158     0      9,406      26      0.9    7.6
db file scattered read           93,140     0      4,083      44      0.2    3.3

Background Wait Events
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
db file parallel write          394,366     0      8,048      20      1.0   37.6
log file parallel write         101,489     0      6,197      61      0.2   28.9
Log archive I/O                  12,026     0      1,814     151      0.0    8.5
control file parallel writ       11,240     0      1,793     160      0.0    8.4
log file sequential read          9,587     0      1,179     123      0.0    5.5

So, hang on a second.

We think that a couple of long-running queries caused this devastation – log file syncs out to 70x our chosen “normal” above and no CPU issues? All IO basically choked.

Doesn’t make sense – wouldn’t that take some serious misconfiguration?

And yet, how could this be a storage issue if it all resolved itself when those long-running queries finished? It was this doubt that meant we spent longer than needed checking and double checking this box and this database before contacting the storage team.

Then to cut a long story short and to miss out a whole bunch of toing and froing and whys and wherefores… the storage team put the focus on another of our databases that uses the same disk groups.

That database didn’t seem to be doing very much:

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:      8271 28-May-12 21:00:33        71        .9
  End Snap:      8273 28-May-12 23:00:17        72        .9
   Elapsed:              119.74 (mins)
   DB Time:              155.20 (mins)

Although the few things it was doing were similarly slow:

                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
control file sequential read        272,003       6,778     25   72.8 System I/O
db file sequential read              36,346       1,409     39   15.1 User I/O
db file parallel read                 1,127         797    707    8.6 User I/O
log file sync                           300         253    843    2.7 Commit
DB CPU                                               72            .8

But, hang on a sec. What’s that:

Time Model Statistics

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
RMAN cpu time (backup/restore)                       42,305.3        454.3


Background Wait Events
                                        %Time Total Wait    wait    Waits   % bg
Event                             Waits -outs   Time (s)    (ms)     /txn   time
-------------------------- ------------ ----- ---------- ------- -------- ------
RMAN backup & recovery I/O    1,166,261     0     72,242      62 2.65E+04   60.2

Turns out that it was an RMAN backup, running when it shouldn’t have been, on a 3.5T archive database on one of the other nodes of the cluster and which uses the same disk groups.

SQL> select to_char(start_time,'DD-MON-YYYY HH24:MI')
  2  ,      to_char(end_time,'DD-MON-YYYY HH24:MI')
  3  ,      input_bytes_display
  4  ,      output_bytes_display
  5  ,      time_taken_display  
  6  from v$rman_backup_job_details  
  7  where session_stamp = 784494089  
  8  order by start_time desc  
  9  /

----------------- ----------------- -------------------- -------------------- --------------------
28-MAY-2012 19:01 28-MAY-2012 23:47     2.99T              532.82G            04:46:30


That correlation was the causation, not those queries and this backup finished just before those long-running queries finished and then everything was ok.

But the long-running queries were just a red-herring and another symptom of everything that was running on the live database was choking.

Sometimes it’s difficult to distinguish between cause and symptom.


Poll: SQL Plan Management

I’m interested in what SQL Plan Management features people are actively using.
Read more of this post

Test config changes for Veritas CIO and Linux IO Scheduler

Just a short note to report on the impact that some config changes have made on IO times in a specific environment for a specific workload.

I mentioned previously that I’m working on an upgrade of an application from to

  • There are changes in pretty much every area – new hardware, different OS, etc, etc.
  • We’re using a full-volume UAT environment within the new set-up to compare new against old (production) and that will form the main basis for performance changes to the application required for this upgrade.
  • It’s pretty much an apples vs oranges comparison and not helped by the fact that UAT runs on tier 2 storage to be compared against the current tier 1 storage – UAT IO is slow.
  • In summary, not exactly best practice upgrade approach – but that’s just how it is sometimes in the real world.

Anyway, anyway, anyway… we’ve been waiting for recommendations and official go-ahead from the database engineering group who run the tests and control the builds of the machines and the following config changes have been made to the following:

  • Veritas VxFS CIO
  • Linux Deadline scheduler

Ideally such changes would be made individually to gauge their individual impact, however, as mentioned, it’s not always like that is it?.

And on the UAT environments above, based on a before-flashback-after run of the main application workload, the following IO times were observed:

Wait Event Average wait time (before) Average wait time (after)
db file sequential read 10 ms 7 ms
db file scattered read 21 ms 11 ms
db file parallel read 38 ms 60 ms
direct path read 95 ms 64 ms
direct path read temp 32 ms 9 ms
direct path write temp 34 ms 8 ms
log file sync 9 ms 3 ms

I’m not convinced that we have a level of control over the whole environment and time to deliver change incrementally to read too much into a single before/after comparison of the same workload, however these initial findings were better than I expected.
(Bottom line is that it’s still apples vs oranges)

So… it may be that someone finds this useful.

“The Correct Plan”

Part I – What is “The Correct Plan”?

How many SQL statements are genuinely straightforward enough that you can look at it and instantly say what “The Correct Plan” is?

When you make this judgement on “The Correct Plan”, do you take into account index clustering factors, single and multiblock read times, etc, etc? Should you?

What about new features, new access paths, new join methods, does “The Correct Plan” take these into account?

Or perhaps you think it is independent of them!?!?

What about if you’ve got a plan that you’re happy with on production? Is that “The Correct Plan”?

So maybe this is just semantics?

What people mean when they say “The Correct Plan” and “The Wrong Plan/s” is probably “An Acceptable Plan” and “An Unacceptable Plan”.

Maybe I need to get over the terminology, move on.

There might be something better, there might be something worse, but this plan is what they’re happy with at the moment.

This plan is acceptable.

Part II – “Hi, my name is Bob and I’m addicted to hinting.”

Why am I going on about this?

It’s a follow-up to my previous post on some frustrations on a SQL Tuning gig.

One of the reasons I blog is that it has carthartic properties. But it’s not carthartic enough – I can’t stop going on and on about the same stuff which drives me to distraction day in, day out.

I’m in a team that is addicted to hinting.

Developers addicted to hinting.

Managers addicted to developers hinting.

In fact, I can’t recall being somewhere with a greater affliction of hinting addiction.

And they are insistent on me hinting, all day, every day.

I’ve tried but they won’t give it up. They’ve got to want to give it up.

It is “The Silver Bullet” and “The Comfort Blanket”, whilst in reality being neither.

And hinting begets more hinting. See Rules for Hinting.

For a big statement, have you ever tried hinting it manually to get “The Correct Plan” consistently? There might be shortcuts by nicking the full set of hints from an outline or a profile or other_xml depending on version, but it’s hideous, it’s long winded, and at some point it’s likely to “go bad”.

I’ve tried to get the team to look at baselines as a way to lock in “The Correct Plan” but they have huge – quite possibly insurmountable (in terms of willingness to surmount them) – FUD about how these will fit into the development lifecycle, the release management process, change control.

The first nail in the coffin of getting them to experiment with baselines was a performance bug with recursive merge statements into the sqlobj$* tables. Mentioned that before. Several times.

That now having been fixed, the second nail in the coffin happened today.

There’s a lot of Pro*C code. I managed to get them to compile some of it with common_parser=yes so that, for the SQL that is inline in the Pro*C and not in a stored proc, we could use some of these new-fangled features that have come out since something like Oracle 8, you know like scalar subqueries, WITH, analytics even LISTAGG for example.

But this has had the nasty side-effect of materially changing some of the SQL statements – particularly around removing unnecessary parentheses around predicates – such that both the force and exact matching signatures of certain SQL statements changed. So, the few statements we were trying with baselines failed to use the baselined plans and “went bad”.

Not necessarily a problem in itself but adding to the insurmountable FUD…

Part III – Ignore the Plan

A while ago Doug Burns was talking about getting developers to ignore Cost.

Sometimes – not all the time, but most of the time – I would go further than that – “Ignore the plan”.

Obviously I don’t really mean “Ignore the plan”, at least not the whole plan.

When, for example, a statement performs acceptably or better than acceptably (which will then instantly become the new “acceptably”) – who cares about the plan at all (until it performs badly).

However, when performance problems set in, maybe there’s a tendency sometimes to get too hung up on whether the starting point should be a nested loop driven by an index range scan of index_1, etc?

A lot of the time, I bet that your idea of “The Correct Plan” is based on heuristics, a set of rules, a bit like the RBO but probably not as effective.

And there’s a reason why those similar heuristics in the RBO have been discarded.

So maybe Ignore the Plan?

Focus on the row estimates, the cardinalities.

If these are accurate, at least nine times out of ten, you’ll get an appropriate plan, “An Acceptable Plan”.

And if they’re not accurate, before looking at anything else, review the SQL.

Is that the best way to express the logic? There’s a good chance it’s not.

In the 9i to 11gR2 upgrade that I’m currently involved with, most of the SQL with the biggest performance problems and which deviate from “The Correct Plan”, can be rewritten, expressed in a slightly different, sometimes better, sometimes more natural, logical, set-based way.

And often, that’s enough.

SQL Tuning Set to Baseline to Advisor

In my previous post “Advisor to Baseline – am I missing something?”, the answer was an inevitable “Yes”.

Just as a reminder, what I had tried to do was:

  1. Create a Tuning Set from some statements in AWR
  2. Create and execute a Tuning Task based on the Tuning Set
  3. Accept the simple profile recommendations from the Advisor
  4. Create a SQL Plan Baseline from the Tuning Set and with it all the new profiles

What happened was that I ended up with a bunch of standalone profiles from the Advisor – see DBA_SQL_PROFILES and/or SQLOBJ$ of object type 1, looking at hints in COMP_DATA of SQLOBJ$DATA or packing to staging table and inspecting there.

And I ended up with a bunch of SQL Plan Baselines with hints from the old plans for statements that I had run through the Advisor because they were rubbish (DBA_SQL_PLAN_BASELINES and SQLOBJ$ of object type 2, looking at hints in COMP_DATA of SQLOBJ$DATA or packing to staging table and inspecting there.)

Quick question – what happens if you have some SQL Plan Baselines with some “bad” hints whilst there also exist some standalone sql profiles with some “good” hints?

From my observations, the Baselines will win. The bad plans will be used. However, because when baselines are being used, on hard parse the optimizer will generate a plan anyway and record any differences in plans generated. So when generating the plan anyway, the standalone sql profiles kick in and so the baseline will contain unaccepted “better” plans ready to be evolved for subsequent executions (unaccepted depending on whether you’re runing with automatic evolution or not).

And back to what I should have done initially and that’s:

  1. Create a Tuning Set from some statements in AWR
  2. Create a SQL Plan Baseline from the Tuning Set
  3. Create and execute a Tuning Task based on the Tuning Set
  4. Accept the simpler, non-parallel profile recommendations from the Advisor

This way the profiles get created not as standalone profiles but part of SQL Plan Baselines – SQLOB$ object type 2 – and accepted and enabled in DBA_SQL_PLAN_BASELINES (FLAGS=11 in SQLOBJ$).

I’ll back this all up with some proof and isolated examples one day.
At the minute, everything’s just too manic, manic trying to get this stuff to work, manic fighting management FUD about just about any new feature since Oracle 9…

Real Time SQL Monitoring

I’m a sucker for convenience.

I’ve always liked the /*+ GATHER_PLAN_STATISTICS */ hint when followed by


Yes, I know you can get the same information via other means but I don’t care, I love it (having regressed to a 9i site for nearly a year and only back to a civilised Oracle version for a couple of weeks, I’m probably guilty of a little overexuberance – it’s not often that can be said about me).

There’s always been a slight conundrum though in that you need to wait for your SQL to complete before you get your execution plan and runtime statistics back.

But, in 11g, wait no more.


It’s cool, it’s convenient. What’s not to like?

I know I’ve been back off 11g for about a year now, but how did I miss this previously?

This is old news – but how come more people don’t use it? (Other than they’re not on 11, of course).

Obviously, there are GUIs and buttons and HTML versions, etc.

Me – I like my command line and text reports 🙂

Anyway, just to demo an example.

I’ve pasted the text output of the report into Excel because

  1. WordPress only allows me upload a small set of filetypes and
  2. Excel was the only one that could cope nicely with the width (“never mind the quality, feel the width” as they say).

This demo was done at the end of the SQL – i.e. it wasn’t real-time monitoring but if you do it whilst the SQL is running, there’s extra progress information which comes from V$SESSION_LONGOPS – i.e. roughly how far through that HASH JOIN are we?

Also, this statement was run in parallel and it produced a really nice summary of parallel execution details.

SELECT dbms_sqltune.report_sql_monitor('8dwq85mf22bs2') 
FROM   dual;

can produce a report a bit like this

Other links:
Performance Tuning Guide
Kerry Osborne – Real time SQL Monitoring
Greg Rahn – Real-time SQl Monitoring Using…

IO Calibration

I’ve mentioned elsewhere that I’m in the early stages of an upgrade – on 32 bit Solaris to on 64 bit Linux.
(So, not exactly an illustration in implementing change incrementally. 🙂 )

Initial run of application code was slow, very slow. Across the board. Some plans had changed. But average IO seems much slower than the benchmarks using for comparison.

Statspack shows that a single block read takes on average 4 ms in production.

For 11gR2 on the new kit (which isn’t destined for production but which is at least meant to be used to provide a ballpark comparison), AWR shows average single block read times of more like 14ms.

But we’re not really comparing apples with apples. So many things are different, it’s more like apples and oranges – massive jump in Oracle version, new kit, different OS, different configurations, direct IO, etc, etc.

I have been advocating simplicity when approaching problems, especially with so many changes – divide & conquer, distill & simplify.

For example, let’s say that IO looks slow.

For now, let’s not use the application batch to test kernel settings, ODM, etc
(Don’t get me wrong – it will be important to use a workload absolutely representative of the the application, I just don’t think it might not be now).

Let’s benchmark using, for example, Oracle ORIONto see if we can achieve IO rates that someone, somewhere must know we need – they’ve purchased the kit after all – and then use that to test changes, use that to compare new environments against a benchmark, etc.
(Then, when we’re happy with simpler things, we can start adding back in and piecing things together again)

Anyway, today I came across an alternative to Orion that I can’t recall being previously aware of.


I like – it seems pretty neat to me.

Yet another feature which seems to have sneaked under my radar somehow (along with real time SQL monitoring – a quick blog post mention which I’ve half written but not published yet).

What does this CALIBRATE_IO do?

It’s like an easier way to do the ORION thing.

From the CALIBRATE_IO.PDF available via Metalink Doc Id 727062.1 “Configuring and using Calibrate I/O”:

When Calibrate I/O is invoked it will generate I/O intensive read-only random I/O (db_block_size)
and large-block (1MByte) sequential I/O workloads. Unlike various external I/O calibration tools, this
tool uses the Oracle code stack and runs in the database, issuing I/O against blocks stored in the
database. The results, therefore, much more closely match the actual database performance.
Once the workload execution is completed, a summary of the results is provided.

I ran it like this (DBA for env is away, SysAdmin sort of suggested 15 as NUM_DISKS but when I ran with 15 initially, it ran forever and it looked like the waits weren’t instrumented properly.)

SQL> declare
  2    l_latency   integer;
  3    l_iops      integer;     
  4    l_mbps      integer; 
  5  begin    
  6    dbms_resource_manager.calibrate_io
  7    (5,10,l_iops,l_mbps,l_latency);
  8    dbms_output.put_line ('I/O Ops/sec = '||l_iops);
  9    dbms_output.put_line ('Actual Latency = '||l_latency);
 10    dbms_output.put_line ('MB/sec = '||l_mbps);
 11  end;
 12  /

And got the following output:

I/O Ops/sec = 52
Actual Latency = 18
MB/sec = 93

You can also get this information by querying DBA_RSRC_IO_CALIBRATE:

SQL> select * from dba_rsrc_io_calibrate;

START_TIME                        END_TIME                            MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY NUM_PHYSICAL_DISKS
--------------------------------- --------------------------------- ---------- ---------- ----------
04-MAR-11 PM      04-MAR-11 PM              52         93          7         18                  5

To be honest, I have no feel for some of these numbers – no frame of reference.
But it’s not great, right?

Some other articles on CALIBRATE_IO:
Tim Hall – Measuring Storage Performance for Oracle Systems
Arup Nanda – Resource Manager IO Callibration in 11g