SQL Tuning Set to Baseline to Advisor

In my previous post “Advisor to Baseline – am I missing something?”, the answer was an inevitable “Yes”.

Just as a reminder, what I had tried to do was:

  1. Create a Tuning Set from some statements in AWR
  2. Create and execute a Tuning Task based on the Tuning Set
  3. Accept the simple profile recommendations from the Advisor
  4. Create a SQL Plan Baseline from the Tuning Set and with it all the new profiles

What happened was that I ended up with a bunch of standalone profiles from the Advisor – visible in DBA_SQL_PROFILES and/or in SQLOBJ$ as object type 1, looking at the hints in COMP_DATA of SQLOBJ$DATA or packing to a staging table and inspecting there.

And I ended up with a bunch of SQL Plan Baselines containing hints for the old plans of the very statements that I had run through the Advisor because those plans were rubbish (DBA_SQL_PLAN_BASELINES and SQLOBJ$ of object type 2, again looking at the hints in COMP_DATA of SQLOBJ$DATA or packing to a staging table and inspecting there).

Quick question – what happens if you have some SQL Plan Baselines with some “bad” hints whilst there also exist some standalone sql profiles with some “good” hints?

From my observations, the Baselines will win and the bad plans will be used. However, when a baseline is in use, the optimizer still generates a plan at hard parse time and records any plans that differ from the baselined one. When generating that plan, the standalone sql profiles kick in, so the baseline ends up containing unaccepted “better” plans ready to be evolved for subsequent executions (unaccepted depending on whether you’re running with automatic evolution or not).
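Those unaccepted plans can then be verified and accepted manually with DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE – a minimal sketch, where the sql_handle is hypothetical and would come from DBA_SQL_PLAN_BASELINES:

```sql
DECLARE
  l_report CLOB;
BEGIN
  -- sql_handle is hypothetical - pick yours from DBA_SQL_PLAN_BASELINES
  l_report := dbms_spm.evolve_sql_plan_baseline
              (sql_handle => 'SYS_SQL_abc123def456',
               verify     => 'YES',
               commit     => 'YES');
  dbms_output.put_line(l_report);
END;
/
```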

And back to what I should have done initially and that’s:

  1. Create a Tuning Set from some statements in AWR
  2. Create a SQL Plan Baseline from the Tuning Set
  3. Create and execute a Tuning Task based on the Tuning Set
  4. Accept the simpler, non-parallel profile recommendations from the Advisor

This way the profiles get created not as standalone profiles but as part of SQL Plan Baselines – SQLOBJ$ object type 2 – accepted and enabled in DBA_SQL_PLAN_BASELINES (FLAGS=11 in SQLOBJ$).
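Step 2 – baseline from tuning set – is pretty much a one-liner with DBMS_SPM.LOAD_PLANS_FROM_SQLSET, reusing the tuning set name from the post below:

```sql
DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- load the plans captured in the tuning set as accepted baseline plans
  l_plans := dbms_spm.load_plans_from_sqlset
             (sqlset_name => 'MY_11G_TUNING_SET');
  dbms_output.put_line('Plans loaded: '||l_plans);
END;
/
```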

I’ll back this all up with some proof and isolated examples one day.
At the minute, everything’s just too manic, manic trying to get this stuff to work, manic fighting management FUD about just about any new feature since Oracle 9…


Advisor to Baseline – am I missing something?

I’ve mentioned before that I’m in the middle of an upgrade.

I’ve been looking at the SQL Tuning Advisor, looking at Baselines, running into a performance-crippling bug with baseline capture, running away from baselines, and now coming back to baselines now that the fix for the bug has been approved by the client internally.

Here’s what I want to do:

  • Create a Guaranteed Restore Point
  • Run “The Batch”
  • Run a Tuning Advisor on the top N statements
  • Accept some of the Advisor recommendations
  • Export these recommendations
  • Flashback to restore point
  • Import
  • Rerun same “The Batch”

(It may be that RAT would be a good solution for this cycle – we’re not licensing it I believe)

So, running the batch – done.

Some performance degradations – check.

Given 30 minutes to look at some pretty big statements, the good old SQL Tuning Advisor can come up with far more than I can (there’s still a place for manual tuning, of course).

So, if I take from AWR the top N statements by some metric from “The Batch” and give the SQL Tuning Advisor a few hours to chunter away, it should be a really positive contribution in a short space of time.

First up, this is a nice little function – DBMS_SQLTUNE.SELECT_WORKLOAD_REPOSITORY.

So, I know the AWR snaps that cover “The Batch”, so to get, for example, the top 50 statements by elapsed time from AWR, I do this:

SELECT *
FROM   TABLE(dbms_sqltune.select_workload_repository
             (begin_snap        => 4005,
              end_snap          => 4011,
              basic_filter      => 'parsing_schema_name in (''ABC'',''XYZ'')',
              ranking_measure1  => 'elapsed_time',
              result_limit      => 50,
              attribute_list    => 'ALL'
              )) p;

I’ve not included the output because it’s a bit wide. Suffice to say, I’ve got my 50 statements.

I can load these up into a SQL tuning set like this:

DECLARE
  baseline_cursor dbms_sqltune.sqlset_cursor;
BEGIN
  dbms_sqltune.create_sqlset (
    sqlset_name => 'MY_11G_TUNING_SET', 
    description  => 'My 11g tuning set');
  OPEN baseline_cursor FOR
       SELECT VALUE(p)
       FROM   TABLE(dbms_sqltune.select_workload_repository
              (begin_snap        => 4005,
               end_snap          => 4011,
               basic_filter      => 'parsing_schema_name in (''ABC'',''XYZ'')',
               ranking_measure1  => 'elapsed_time',
               result_limit      => 50,
               attribute_list    => 'ALL'
               )) p; 
  dbms_sqltune.load_sqlset (
             sqlset_name     => 'MY_11G_TUNING_SET',
             populate_cursor => baseline_cursor);
END;
/

And then I can set the advisor on this bad boy thus:

DECLARE
 l_task_name   varchar2(200) := 'MY_11g_TUNING_TASK';
 l_sqlset_name varchar2(200) := 'MY_11G_TUNING_SET';
BEGIN
 l_task_name :=
 dbms_sqltune.create_tuning_task
 (sqlset_name  => l_sqlset_name,
  task_name    => l_task_name,
  time_limit   => 15300);
 dbms_sqltune.set_tuning_task_parameter
 (task_name    => l_task_name,
  parameter    => 'LOCAL_TIME_LIMIT',
  value        => 1800);
 dbms_sqltune.execute_tuning_task
 (task_name    => l_task_name);
END;
/

Note that I’ve given it a few hours to go off and do its stuff but I’ve restricted it to 30 minutes per statement using the LOCAL_TIME_LIMIT task parameter.

Some time later, it’s done and I can get back the results like so:

SELECT dbms_sqltune.report_tuning_task('MY_11g_TUNING_TASK') 
FROM   dual;

And I get a great big stonking report back. Again, no point copying it in here.

Suffice to say that it mentions a bunch of indexes, doing lots more work in parallel, but for now I don’t want to rush into accepting profiles running all 50 statements simultaneously with a DOP of 48 🙂 – I’m just interested in the simpler SQL profiles it is recommending.

Next, I accept the sql profiles.

I can see them in dba_sql_profiles.
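For reference, accepting one looks something like this – the object_id and profile name here are illustrative, lifted from the task report:

```sql
DECLARE
  l_name VARCHAR2(128);
BEGIN
  l_name := dbms_sqltune.accept_sql_profile
            (task_name => 'MY_11g_TUNING_TASK',
             object_id => 1,               -- illustrative: from the task report
             name      => 'my_profile_1'); -- illustrative
  dbms_output.put_line('Accepted: '||l_name);
END;
/
```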


That would do me fine. I can do this. Initial aims up top met.

But, I thought I’d be able to run the advisor, accept the profiles and somehow move all this stuff into a baseline, then move the baseline.

But it’s not working as planned.

I’ve already got the tuning set and I’ve accepted the profiles recommended.

I had thought that if I used DBMS_SPM.LOAD_PLANS_FROM_SQLSET that it would use the information from the profiles.

But that doesn’t seem possible.

What’s in the baseline is the hints to get the original plans from the SQL statements concerned – i.e. the hints to get the plans which the Advisor subsequently poo-pooed as trumpable with a profile.

And I think it’s because I need to run the SQL with the profiles so that the appropriate plans exist against the relevant SQL, then create a tuning set of those sql statements and their plans and then load that into a baseline which can use the specific hints for a specific executed plan.
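If that theory holds, the capture step would use DBMS_SQLTUNE.SELECT_CURSOR_CACHE rather than SELECT_WORKLOAD_REPOSITORY – a sketch, with the set name made up for illustration:

```sql
DECLARE
  c dbms_sqltune.sqlset_cursor;
BEGIN
  dbms_sqltune.create_sqlset (sqlset_name => 'MY_EXECUTED_SET');
  -- capture the statements and their *executed* plans from the cursor cache
  OPEN c FOR
    SELECT VALUE(p)
    FROM   TABLE(dbms_sqltune.select_cursor_cache
                 (basic_filter => 'parsing_schema_name in (''ABC'',''XYZ'')')) p;
  dbms_sqltune.load_sqlset (sqlset_name     => 'MY_EXECUTED_SET',
                            populate_cursor => c);
END;
/
```

With the executed plans in that set, DBMS_SPM.LOAD_PLANS_FROM_SQLSET should then have the specific hints for the specific executed plans.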

Of course, I might have got it the wrong way round. I know a baseline re-optimises statements at hard parse time and the SQL Tuning Advisor can run on baselines.

But it definitely seems at the moment that the initial baseline population needs an executed plan.

More later perhaps if I decide to knock up a specific isolated example to demonstrate or work it out … but not today though. It’s Friday after all.

Real Time SQL Monitoring

I’m a sucker for convenience.

I’ve always liked the /*+ GATHER_PLAN_STATISTICS */ hint when followed by a call to DBMS_XPLAN.DISPLAY_CURSOR with the ‘ALLSTATS LAST’ format.
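For the avoidance of doubt, the pairing I mean is along these lines:

```sql
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*)
FROM   dual;

-- actuals vs estimates for the last execution of the previous statement
SELECT *
FROM   TABLE(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```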


Yes, I know you can get the same information via other means but I don’t care, I love it (having regressed to a 9i site for nearly a year and only back to a civilised Oracle version for a couple of weeks, I’m probably guilty of a little overexuberance – it’s not often that can be said about me).

There’s always been a slight conundrum though in that you need to wait for your SQL to complete before you get your execution plan and runtime statistics back.

But, in 11g, wait no more.


It’s cool, it’s convenient. What’s not to like?

I know I’ve been back off 11g for about a year now, but how did I miss this previously?

This is old news – but how come more people don’t use it? (Other than they’re not on 11, of course).

Obviously, there are GUIs and buttons and HTML versions, etc.

Me – I like my command line and text reports 🙂

Anyway, just to demo an example.

I’ve pasted the text output of the report into Excel because

  1. WordPress only allows me upload a small set of filetypes and
  2. Excel was the only one that could cope nicely with the width (“never mind the quality, feel the width” as they say).

This demo was done at the end of the SQL – i.e. it wasn’t real-time monitoring but if you do it whilst the SQL is running, there’s extra progress information which comes from V$SESSION_LONGOPS – i.e. roughly how far through that HASH JOIN are we?

Also, this statement was run in parallel and it produced a really nice summary of parallel execution details.
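If you don’t have a sql_id to hand, V$SQL_MONITOR is one place to find candidates to report on:

```sql
SELECT sql_id
,      status
,      sql_exec_start
,      elapsed_time
FROM   v$sql_monitor
ORDER  BY sql_exec_start DESC;
```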

SELECT dbms_sqltune.report_sql_monitor('8dwq85mf22bs2') 
FROM   dual;

can produce a report a bit like this

Other links:
Performance Tuning Guide
Kerry Osborne – Real time SQL Monitoring
Greg Rahn – Real-time SQL Monitoring Using…

IO Calibration

I’ve mentioned elsewhere that I’m in the early stages of an upgrade – from 32 bit Solaris to 64 bit Linux.
(So, not exactly an illustration in implementing change incrementally. 🙂 )

Initial run of application code was slow, very slow. Across the board. Some plans had changed. But average IO seems much slower than in the benchmarks we’re using for comparison.

Statspack shows that a single block read takes on average 4 ms in production.

For 11gR2 on the new kit (which isn’t destined for production but which is at least meant to be used to provide a ballpark comparison), AWR shows average single block read times of more like 14ms.

But we’re not really comparing apples with apples. So many things are different, it’s more like apples and oranges – massive jump in Oracle version, new kit, different OS, different configurations, direct IO, etc, etc.

I have been advocating simplicity when approaching problems, especially with so many changes – divide & conquer, distill & simplify.

For example, let’s say that IO looks slow.

For now, let’s not use the application batch to test kernel settings, ODM, etc
(Don’t get me wrong – it will be important to use a workload absolutely representative of the application, I just don’t think now is the time for that.)

Let’s benchmark using, for example, Oracle ORION to see if we can achieve IO rates that someone, somewhere must know we need – they’ve purchased the kit after all – and then use that to test changes, use that to compare new environments against a benchmark, etc.
(Then, when we’re happy with simpler things, we can start adding back in and piecing things together again)

Anyway, today I came across an alternative to Orion that I can’t recall being previously aware of.


I like – it seems pretty neat to me.

Yet another feature which seems to have sneaked under my radar somehow (along with real time SQL monitoring – a quick blog post mention which I’ve half written but not published yet).

What does this CALIBRATE_IO do?

It’s like an easier way to do the ORION thing.

From the CALIBRATE_IO.PDF available via Metalink Doc Id 727062.1 “Configuring and using Calibrate I/O”:

When Calibrate I/O is invoked it will generate I/O intensive read-only random I/O (db_block_size)
and large-block (1MByte) sequential I/O workloads. Unlike various external I/O calibration tools, this
tool uses the Oracle code stack and runs in the database, issuing I/O against blocks stored in the
database. The results, therefore, much more closely match the actual database performance.
Once the workload execution is completed, a summary of the results is provided.

I ran it like this (DBA for env is away, SysAdmin sort of suggested 15 as NUM_DISKS but when I ran with 15 initially, it ran forever and it looked like the waits weren’t instrumented properly.)

SQL> declare
  2    l_latency   integer;
  3    l_iops      integer;     
  4    l_mbps      integer; 
  5  begin    
  6    dbms_resource_manager.calibrate_io
  7    (5,10,l_iops,l_mbps,l_latency);
  8    dbms_output.put_line ('I/O Ops/sec = '||l_iops);
  9    dbms_output.put_line ('Actual Latency = '||l_latency);
 10    dbms_output.put_line ('MB/sec = '||l_mbps);
 11  end;
 12  /

And got the following output:

I/O Ops/sec = 52
Actual Latency = 18
MB/sec = 93

You can also get this information by querying DBA_RSRC_IO_CALIBRATE:

SQL> select * from dba_rsrc_io_calibrate;

START_TIME        END_TIME            MAX_IOPS   MAX_MBPS  MAX_PMBPS    LATENCY NUM_PHYSICAL_DISKS
----------------- ----------------- ---------- ---------- ---------- ---------- ------------------
04-MAR-11 PM      04-MAR-11 PM              52         93          7         18                  5

To be honest, I have no feel for some of these numbers – no frame of reference.
But it’s not great, right?

Some other articles on CALIBRATE_IO:
Tim Hall – Measuring Storage Performance for Oracle Systems
Arup Nanda – Resource Manager IO Calibration in 11g

11g extended SQL_TRACE event

Not for the first time, I’ve gone looking for more detail on this extended functionality in 11g.
And not for the first time, I’ve started off looking for dbms_monitor extended functionality and not found it and then wasted time trying to remember what I’m looking for:

Note to self – what I’m looking for is not dbms_monitor but simply:

alter session|system set events 'sql_trace[SQL:<sql_id>]...';
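A slightly fuller sketch of the syntax, reusing the sql_id from the monitoring example above purely for illustration:

```sql
-- wait/bind are the optional extras; the sql_id is illustrative
alter session set events 'sql_trace[SQL:8dwq85mf22bs2] wait=true, bind=true';

-- and to switch it off again
alter session set events 'sql_trace[SQL:8dwq85mf22bs2] off';
```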

For further details, see:

Upgrade experiences

I’m planning for an Oracle upgrade, 9i to 11gR2. Cross-platform – Sun Solaris to Linux. Different Endian.

It’s just under 1TB so not big these days.

Downtime shouldn’t be too much of an issue. It’s not like it needs to be done in an hour or anything demanding like that. As long as it can be done within a day.

Within the client, there’s no official build for 11g on Solaris, so I think that might rule out solutions like an in situ upgrade to 11g on Solaris followed immediately by the move & conversion to Linux.

Imp/Exp seems to be the recommended approach given all of the above.

There are lots of resources out there to help with an upgrade.

So, no shortage of advice and guidance.

But I’m also interested in hearing some real world experiences.

So, if you’ve been through something similar, please let me know.

Tell me how you did it and how it went.

Not waiting and not on SQL*Net Message from client

I’m in the early stages of investigating a performance degradation, just tracing the session at the moment and waiting for it to complete.

This is distilled from a POC for an application persisting generic messages off a message bus.

At the moment, there’s no requirement to query/inspect the messages themselves so they’re being passed down as a BLOB.

I’ve reproduced the problem with a standalone piece of PL/SQL which removes any distractions from the rest of the application tech stack. This will be useful to raise an SR if necessary.

Performance degradation for a collection of 1000 blobs goes from roughly 1-2 seconds on a corporate DB or on my laptop to currently over 1 hour on the corporate 11g DB.

Obviously there are a lot of variations between environments and versions that might account for reasonable differences in performance.
I’ve not yet started investigating any of these differences in settings, parameters, etc but there’s something very bad about this sort of degradation.

Current session wait says that I’ve been waiting on SQL*Net message from client for pretty much all this time. So, a good illustration that this wait is not always an idle event.

Edit 1
I’ve been a bit dim on this one. I’m NOT waiting on SQL*Net message from client. But I am in a strange place somewhere.

I was being lazy, using Toad to monitor and not looking at the information properly – stupid, stupid, stupid (not that Toad doesn’t display the requisite information properly, just that when you look at things from different perspectives sometimes you miss something that you would have seen when looking at it in a different way).

My session is active. But my wait state is WAITED_KNOWN_TIME so that wait was the last known wait, WAIT_TIME is > 0, SECONDS_IN_WAIT is ticking up to 20000.
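The giveaway is in V$SESSION itself – something like this distinguishes a current wait from the last known wait:

```sql
SELECT state            -- WAITING vs WAITED_KNOWN_TIME (i.e. now on CPU)
,      event            -- current (or last) wait event
,      wait_time        -- > 0 means the wait has already completed
,      seconds_in_wait  -- time since that wait started
FROM   v$session
WHERE  sid = :sid;      -- :sid = the session being investigated
```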

Session statistics show that session is doing stuff.

Need to look at this from OS upwards – oradebug, pstack, etc. Or get someone to do it because I don’t have access 😦
End Edit 1

I’m currently gathering the evidence. It’s a bit slow because I don’t have full access to the 11g environment.

SQL to reproduce is below.

DROP TABLE sr_payload;
DROP TYPE tt_sr_payload;
DROP TYPE to_sr_payload;

CREATE TABLE sr_payload
(id         RAW(16) NOT NULL
,lastupdate DATE    NOT NULL
,payload    BLOB    NOT NULL)
LOB (payload) 
STORE AS sr_payload_payload;

ALTER TABLE sr_payload ADD CONSTRAINT pk_sr_payload PRIMARY KEY (id);

CREATE OR REPLACE TYPE to_sr_payload AS OBJECT
(id         RAW(16)
,lastupdate DATE
,payload    BLOB);
/

CREATE OR REPLACE TYPE tt_sr_payload AS TABLE OF to_sr_payload;
/

CREATE OR REPLACE PACKAGE pkg_payload_sr
AS
  PROCEDURE p_upsert_payload_mtc (
    i_payload                  IN     tt_sr_payload);
  PROCEDURE p_upsert_payload_itc (
    i_payload                  IN     tt_sr_payload);
END pkg_payload_sr;
/

CREATE OR REPLACE PACKAGE BODY pkg_payload_sr
AS
  PROCEDURE p_upsert_payload_mtc (
    i_payload                  IN     tt_sr_payload)
  AS
  BEGIN
     MERGE
     INTO  sr_payload p
     USING (SELECT id         id
            ,      lastupdate lastupdate
            ,      payload    payload
            FROM   TABLE ( CAST ( i_payload AS tt_sr_payload) ) ) x
     ON    (p.id = x.id)
     WHEN  MATCHED
     THEN  UPDATE
            SET    payload    = empty_blob()
            ,      lastupdate = SYSDATE
     WHEN  NOT MATCHED
     THEN  INSERT
          (      id
          ,      lastupdate
          ,      payload)
           VALUES
          (      x.id
          ,      x.lastupdate
          ,      x.payload);
  END p_upsert_payload_mtc;

  PROCEDURE p_upsert_payload_itc (
    i_payload                  IN     tt_sr_payload)
  AS
  BEGIN
     INSERT
     INTO   sr_payload
     (      id
     ,      lastupdate
     ,      payload)
     SELECT x.id         id
     ,      x.lastupdate lastupdate
     ,      x.payload    payload
     FROM   TABLE ( CAST ( i_payload AS tt_sr_payload) ) x
     WHERE  NOT EXISTS (SELECT 1
                        FROM   sr_payload p
                        WHERE  p.id = x.id);  
  END p_upsert_payload_itc;
END pkg_payload_sr;
/

DECLARE
 l_start NUMBER;  
 l_stop  NUMBER;  
 l_pl    tt_sr_payload := tt_sr_payload(); 
BEGIN
  FOR i IN 1 .. 1000 
  LOOP
      l_pl.EXTEND;
      l_pl(l_pl.LAST) := to_sr_payload(SYS_GUID,SYSDATE,TO_BLOB('12312312312'));  
  END LOOP;
  dbms_output.put_line('  ');
  l_start := dbms_utility.get_time(); 
  pkg_payload_sr.p_upsert_payload_itc(l_pl);
  l_stop  := dbms_utility.get_time(); 
  dbms_output.put_line('Start pl: '||l_start); 
  dbms_output.put_line('Stop  pl: '||l_stop); 
END;
/

DROP PACKAGE pkg_payload_sr;
DROP TABLE sr_payload;
DROP TYPE tt_sr_payload;
DROP TYPE to_sr_payload;

Edit 2
Some results from ps and pstack** show:

XXXXXXX$ ps -o user,pid,s,pcpu,time,etime,wchan,comm -p 27974

oracle   27974 R 99.2 05:36:56    05:39:35 -      oracle

XXXXXXXX$ pstack 27974
#0  0x000000000718ab58 in kolrgrfc ()
#1  0x00000000019fa69f in kolrrdl ()
#2  0x00000000019fa7ce in kolradc ()
#3  0x0000000000e0df3f in __PGOSF350_kokegPromoteLob ()
#4  0x0000000000e0decb in kokegApplyOp ()
#5  0x0000000000e0de8e in kokegOpnGarbagePromote ()
#6  0x0000000000e0dc14 in kokegGarbageCollectBinds ()
#7  0x000000000705edb3 in kxsiis ()
#8  0x00000000070d0074 in __PGOSF445_qerltcNoKdtBufferedInsRowCBK ()
#9  0x000000000151924b in qerltcLoadStateMachine ()
#10 0x000000000151b64f in __PGOSF455_qerltcInsertSelectRop ()
#11 0x0000000003668205 in qerstRowP ()
#12 0x0000000003668205 in qerstRowP ()
#13 0x00000000014cd178 in qerjoFetch ()
#14 0x00000000036683f6 in qerstFetch ()
#15 0x0000000006f60b39 in rwsfcd ()
#16 0x00000000036683f6 in qerstFetch ()
#17 0x00000000070ce5d2 in qerltcFetch ()
#18 0x00000000036683f6 in qerstFetch ()
#19 0x000000000702a1db in insexe ()
#20 0x00000000070dc89a in opiexe ()
#21 0x00000000070e4513 in opipls ()
#22 0x0000000006fc979a in opiodr ()
#23 0x00000000070671b9 in __PGOSF123_rpidrus ()
#24 0x0000000007189474 in skgmstack ()
#25 0x0000000007067525 in rpidru ()
#26 0x00000000070667c0 in rpiswu2 ()
#27 0x0000000007065de3 in rpidrv ()
#28 0x000000000704e8c3 in psddr0 ()
#29 0x000000000704e594 in psdnal ()
#30 0x00000000071ce501 in pevm_EXECC ()
#31 0x00000000071c74ca in pfrinstr_EXECC ()
#32 0x00000000071c6721 in pfrrun_no_tool ()
#33 0x00000000071c51bd in pfrrun ()
#34 0x00000000071cb2ba in plsql_run ()
#35 0x00000000071c1f54 in peicnt ()
#36 0x00000000071c11fa in kkxexe ()
#37 0x00000000070dee11 in opiexe ()
#38 0x000000000154dbc1 in kpoal8 ()
#39 0x0000000006fc979a in opiodr ()
#40 0x000000000713cd4b in ttcpip ()
#41 0x0000000000e1630f in opitsk ()
#42 0x0000000000e18c50 in opiino ()
#43 0x0000000006fc979a in opiodr ()
#44 0x0000000000e12244 in opidrv ()
#45 0x00000000011b393a in sou2o ()
#46 0x0000000000920183 in opimai_real ()
#47 0x00000000011b7871 in ssthrdmain ()
#48 0x00000000009200af in main ()

Reading from bottom to top and using Metalink note 175982.1, the kol functions deal with “support for object Lob buffering , object lob evaluation and object Language/runtime functions for Opaque types” – sounds about right for my test case.

* I’m not saying that sending down a collection of 1000 blobs is necessarily a good thing but I’m just using it to compare good and bad.

** This is just a test case on a dev box. Note what Tanel has to say particularly about using pstack on Linux on production.