Are my Stored Outlines being used? Why not?

I seem to be blogging today about loose ends from previous blog articles that I’ve not tidied up.

Frustrations with Stored Outlines today.

Yesterday I mentioned how Stored Outlines offered a solution to fixing a SQL performance problem within an application generating SQL via an ORM layer (i.e. no or little control over the SQL concerned).

It all seemed so simple. Create outlines on a “good” database, export them, import them and enable the usage of them on the “bad” database.

The first attempt was fine with an outline against one bit of SQL. It did exactly what it said on the tin. However subsequent attempts have been less than perfect, much to my disappointment because it encourages “the fear” and vindicates the position of sceptics who would resist this approach. I am 99% sure that there is a simple example of user error at work here (where user=me).

This morning I had some inconsistencies in usage of the outlines and I put it down to TOAD. I’m pretty sure it was TOAD in that two apparently identical SQL statements (copied and pasted) run one from SQL*Plus and one from Toad had different hash_values in V$SQL. And I put it down to some sort of wrapper or something else internal to do with how TOAD itself works. Anyway, moving on from Toad.

There are three main “tools” that I have been using to figure things out (apart from Toad and SQL*Plus) – V$SQL, extended session tracing and the OUTLN.OL$ table.

V$SQL is cool because it has a column, OUTLINE_CATEGORY, which indicates whether the SQL used a Stored Outline.
And OUTLN.OL$ is good because it shows me which outlines are available for which SQL in which outline categories.
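To make that concrete, the sort of checks I have been running look like this (a sketch; the bind value and column lists are illustrative, but OUTLINE_CATEGORY and the OL$ columns are as per the 8i dictionary):

```sql
-- Did this cursor pick up an outline? OUTLINE_CATEGORY is non-null if so.
SELECT hash_value, outline_category, sql_text
  FROM v$sql
 WHERE hash_value = :hv;

-- Which outlines exist, for which SQL, in which categories?
SELECT ol_name, category, hash_value, sql_text
  FROM outln.ol$;
```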

But what I’m struggling with is this: if the usage of Stored Outlines is enabled (via alter session set use_stored_outlines=<category_name>), if the outlines/outline categories are there in the OUTLN tables, and if the hash_value of a SQL statement in V$SQL and in the extended trace file is the same as that in the OUTLN.OL$ table, why would the outline not be used?

I dunno. Not yet at least. Further tracing and investigation required tomorrow. I know that they are not being used now (since my first couple of outline extractions) primarily because of performance of the SQL on the “bad” database. It’s bad. Plus I can see from V$SQL and the trace that it has not been used.

I am currently not sure whether tracing event 10053 shows any information about Stored Outlines. I will see tomorrow, can’t investigate further at the moment. However, from the tracing I did earlier I was coming across the fact that you can’t do a 10053 trace against SQL which is embedded in PL/SQL. Not in 8i at least.
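For reference, the 10053 optimizer trace is switched on with the same event syntax as the 10046 trace (a sketch):

```sql
-- Dump the CBO's decision process for subsequently hard-parsed SQL
ALTER SESSION SET events '10053 trace name context forever, level 1';
-- And off again
ALTER SESSION SET events '10053 trace name context off';
```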

Which then makes my investigation slightly more complicated / artificial because the SQL has got “:1” bind variable placeholders in it. So I can’t run it via SQL*Plus because you can’t create a variable called 1. And I can’t change the application code. And whilst I have a PLSQL harness that does allow this, I then have the 10053 trace issue. News of tomorrow’s success to follow later, I hope.


Application Hanging

Today a problem with the application “hanging”.

The background is that there has been some development in an application against database A.
Database A is now out of action due to a disk controller failure, or something akin to that, and so the application is now pointing at database B.
When pointing at database A, the application was performing acceptably, not so when pointing at database B where it is “hanging”.

So, straightaway, in response to one of the questions we should ask ourselves when there is a sudden problem – what has changed? – we know that there has been a change in database and the immediate follow-up is what might the differences be between database A and database B.

Fortunately, one of the differences is already known to me and, given the symptoms, is jumping up and down, waving its hands and shouting ‘Look at me’.

Both database A and database B are relatively recent copies of the production database.
All 8.1.7 by the way.
However, the production database has a bad mix: some tables have statistics (some stale) and some have none, which can lead the CBO to make inappropriate decisions because of the gap between the default values it uses for objects without statistics and the reality of the data distribution.

Database A had had a statistics update subsequent to the refresh.
Database B had not.

Therefore it is highly likely that the symptom of the application hanging is a less than optimal plan on the SQL concerned.

For the application operation that is hanging, quite a few bits of SQL might be issued.
There might be more than one bit of SQL that is performing unacceptably but in any event, we need to work through each suboptimal performer as we find it.

The application mostly uses ORM to generate its lovely queries (although in this case the development that has occurred is to optimise some of the iterative, slow-by-slow dynamic SQL that is generated).

Therefore the best mechanism is to trace the database session concerned, replicate the “hanging” behaviour and see what it is doing. What I am expecting to find is that, on “hanging”, the last piece of SQL in the trace file will be doing more IO than it needs to.

So, a trigger is required which sets the extended trace event for a specific user:

IF USER = 'yyyyy' THEN
EXECUTE IMMEDIATE 'ALTER SESSION SET events ''10046 trace name context forever, level 12''';
END IF;
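For completeness, the full logon trigger might look something like this (trigger name is hypothetical; the username is the placeholder from above):

```sql
CREATE OR REPLACE TRIGGER trg_trace_logon
AFTER LOGON ON DATABASE
BEGIN
  IF USER = 'yyyyy' THEN
    EXECUTE IMMEDIATE
      'ALTER SESSION SET events ''10046 trace name context forever, level 12''';
  END IF;
END;
/
```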

The next step is to reproduce the “hanging” behaviour via the application.
Sure enough this generates a trace file on the server which is doing many, many “db file sequential reads”.
More than normal? (where normal is the behaviour when it performs acceptably)
Don’t know for sure yet, but it’s a good bet.

The SQL in question is something like this:

SELECT ...
FROM News t0
, (SELECT ...
   FROM ... T3
   , nolindex T2
   , NewsIndex T1
   WHERE T3.destindexid = :1
   AND T1.sectionid = T2.indexid
   AND T2.indexid = T3.sourceindexid) v1
WHERE t0.ID = v1.newsid

The trace file is incomplete because, due to all the IO, it reached its size limit. Also the application was killed because we had the information we needed.

On database B (some stats but not all), running the SQL with a SET AUTOTRACE TRACE EXPLAIN reveals the following:

Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=216293 Card=59764261584 Bytes=196445127826608)
1 0 MERGE JOIN (Cost=216293 Card=59764261584 Bytes=196445127826608)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=826 Card=1515434 Bytes=4961530916)
3 2 INDEX (FULL SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=26 Card=1515434)
4 1 SORT (JOIN) (Cost=50339 Card=3943706 Bytes=51268178)
5 4 VIEW (Cost=165128 Card=3943706 Bytes=51268178)
6 5 SORT (UNIQUE) (Cost=165128 Card=3943706 Bytes=256340890)
7 6 NESTED LOOPS (Cost=24 Card=3943706 Bytes=256340890)
8 7 MERGE JOIN (CARTESIAN) (Cost=24 Card=2589433 Bytes=134650516)
9 8 TABLE ACCESS (BY INDEX ROWID) OF 'IMPORTERS' (Cost=2 Card=11 Bytes=286)
11 8 SORT (JOIN) (Cost=22 Card=235403 Bytes=6120478)

There are some big numbers in there.

Next step is to run against database C which should be similar to database A before it died – i.e. refresh from production with statistics on more objects:

Execution Plan
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=206 Card=89 Bytes=26700)
1 0 NESTED LOOPS (Cost=206 Card=89 Bytes=26700)
2 1 VIEW (Cost=28 Card=89 Bytes=1157)
3 2 SORT (UNIQUE) (Cost=28 Card=89 Bytes=2314)
4 3 NESTED LOOPS (Cost=26 Card=89 Bytes=2314)
5 4 NESTED LOOPS (Cost=2 Card=8 Bytes=120)
6 5 TABLE ACCESS (FULL) OF 'IMPORTERS' (Cost=2 Card=8 Bytes=80)
9 1 TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=2 Card=4695495 Bytes=1347607065)
10 9 INDEX (UNIQUE SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=1 Card=4695495)

Some much smaller numbers and completely different access paths.

The numbers themselves are not an issue – it’s the different access paths that are the problem.
What I mean is that if, via hints for example, we forced the second access path on the poorly performing database, explain plan would still have much bigger numbers.

And so, for possible resolutions, we have a number of options, at least four, possibly more:
Option numero uno: Gather appropriate statistics on all objects so that the CBO has a decent chance.
Option numero the next one: Get rid of the partial statistics so that we have none and go back to using RBO.
Option three: Hint the SQL in the application.
Option four: Gather a stored outline for the query.

Options one and two can be discounted due to organisational “fear” of change. I would say that option one – all objects with appropriate statistics – is much preferable to option two. However, neither is a go-er due to fear that this might have a wider detrimental effect. (Incidentally, option one was attempted previously but the deployment was backed out because, according to rumour, the performance of something somewhere degraded)
Option three might be ok. The downside from an application perspective is that to put the hints in requires an application deployment, an application stop and start, and to take it out requires the same. As part of a wider strategy for this particular application, its suitability is questionable because it’s ORM generated code and therefore it’s probably not possible to control the hinting.
That leaves option four – capture a stored outline for the query on the “good” database, export, import to the “bad” database(s) and enable usage thereof. Might require some shared pool flushage but easily turn-on-and-offable, can be repeated easily for other problematic queries.

And just for completeness, if we were going for option four, the steps would be to first create the stored outline on the “good” database:

CREATE OR REPLACE OUTLINE <outline_name>
FOR CATEGORY <category_name>
ON <sql statement>

(Note that the text of the SQL statement must exactly match that which will be executed by the application.)

Next, export the outline from the “good database”, e.g.

exp userid=outln/ tables=(ol$,ol$hints) query="where category = '<category_name>'"

Then, import into the “bad” database, e.g.

imp userid=outln/ full=y ignore=yes

Finally, maybe use a trigger to enable the usage of the stored outline for the problem application user:

IF USER = 'yyyyyyyyyyyyy' THEN
EXECUTE IMMEDIATE 'ALTER SESSION SET use_stored_outlines = <category_name>';
END IF;

Using Stored Outlines to figure out a hint or two

Stored Outlines capture a set of hints which direct the optimizer how to execute a specific query.

I’ve already mentioned here previously a plan to use these for Optimizer Plan Stability for an upgrade from 8i to 10g (however I think the current thinking is that we’re just going to wing it).

However, I’ve also mentioned that in our current 8i database some objects have some statistics (mostly stale) and some objects have none, and when you mix both into a query the optimizer makes some potentially dodgy guesses about how much data is in the objects without stats. The same guesses/defaults apply to GLOBAL TEMPORARY TABLES and to ARRAYs CAST with the TABLE operator.

As a result, some recent queries have been performing suboptimally.

Like most, I prefer to not use hints where possible and recognise that in most cases it is much better to leave a well-informed optimizer to make its own mind up, being a clever little thing with access to lots of information that I don’t. However, unfortunately, there are circumstances where it needs a bit of a hand and that’s where hints come in.

I don’t have a comprehensive understanding of hints and join mechanisms, but I’m working on it slowly. And I certainly can’t look at a query and visualise how it should all come together, well, not unless it’s pretty darn simple.

What I often find in a more complex query is that I’ll try to add in one or two hints and things might improve but they don’t quite arrive where I’d like them to.

So, having touched very briefly on this side-effect usage of Stored Outlines in another previous post, I tend to use Explain Plan in conjunction with Stored Outlines on a healthy development system (one with relevant statistics from production-like data volumes and distribution) to tell me how a fully-informed optimizer would do things:

create or replace outline myoutline
for category mycategory
on select.....

And then look at the hints here:

select * from user_outline_hints order by stage;

And this might involve creating a dummy table with some specific data in there, fully analysing it, and substituting that for a collection CAST or a GLOBAL TEMPORARY TABLE in the SQL before capturing the outline.
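As a sketch of that substitution (all names hypothetical):

```sql
-- A permanent stand-in for the GLOBAL TEMPORARY TABLE / CAST array,
-- loaded with representative data and fully analysed
CREATE TABLE tmp_newsids (newsid NUMBER);
INSERT INTO tmp_newsids VALUES (12345);
ANALYZE TABLE tmp_newsids COMPUTE STATISTICS;

-- Capture the outline against the SQL rewritten to use the stand-in
CREATE OR REPLACE OUTLINE myoutline
FOR CATEGORY mycategory
ON SELECT ... ;
```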

And then I tend to extract some key hints in the right order and bung them into the original SQL. Usually with the desired effect.

This won’t work in all scenarios but it does some/most of the time. In my situation, at least.

I do feel deeply uncomfortable using hints on SQL just because the statistics situation in a production database is not good. And obviously, in most cases that I’m talking about, these are workarounds precisely because of that. But sometimes, in fact more like ‘often’ than ‘sometimes’ everywhere in IT these days, the perceived risks of making changes (in this case, sorting out the statistics) cause a stalemate that is difficult to break free from.

Autotrace – no cost, part II

I wrote just a bit earlier about how a lack of statistics on some objects in one system was causing the CBO to use some defaults and generate a sub-optimal plan, especially compared to a similar system which does have accurate statistics for all objects.

I tried adding various hints to the SQL in the “bad” system to get to the same plan as the “good” system but, whilst nearly there, I wasn’t quite nailing it.

As I’ve touched on in previous posts, I’ve been looking at stored outlines as part of the strategy to upgrade to 10g. So, I captured the stored outline for the statement in the “good” system using:

create or replace outline myoutline
for category mycategory
on select.....

I was not interested in exporting and enabling this in the “bad” system. But by looking at the outline views in the data dictionary, I was able to more easily figure out which hints to put where in what order.
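The view I find most useful for that is USER_OUTLINE_HINTS, ordered so the hints come out in a usable sequence (the outline name is from the example above):

```sql
SELECT stage, node, hint
  FROM user_outline_hints
 WHERE name = 'MYOUTLINE'
 ORDER BY stage, node;
```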


Stored Outlines

The usage of Stored Outlines offers a number of benefits to my current client.

Firstly, there is a perceived sensitivity regarding the database. It seems, historically, that any changes to the database have led to significant performance degradation. I can’t argue with this, not having been here very long nor having witnessed any such deployments. These people are not at all daft and so I believe them.

There are plenty of changes to be done, however.

So, my theory is that Stored Outlines will allow us to have some Optimizer Plan Stability, even if those captured plans are not as good as they might be.

Which brings me on to the second benefit. I believe that some of the dodgy performance on the database is due to there being incomplete stats – some old, some objects have no stats, etc. Unfortunately, a deployment to save and calculate stats was rolled back due to performance degradation issues. Therefore, I figure that we can use stored outlines to preserve the existing plans whilst introducing some proper stats gathering.

By dint of using stored outlines, we will have captured much of the SQL that various applications issue – so we will have a better understanding of what is connecting and using the DB.

Then we will be able to review much of this SQL, look at how it is executing, check for any missing indexes and, once stats have been gathered, start to turn off certain outlines.
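In 8i the captured SQL can be reviewed through the outline dictionary views, e.g. (the USED column shows whether an outline has ever been applied):

```sql
SELECT category, name, used, timestamp, sql_text
  FROM dba_outlines
 ORDER BY category, timestamp;
```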

One thing is key to this and that is CURSOR_SHARING. Some of the connecting applications do not use bind variables, and many of the bind variables that they do use are the sequence-generated ids of news articles, so the actual values being used change every day. So, for stored outlines to work at all we need to implement cursor_sharing and hope that that in itself does not lead to performance degradation – it shouldn’t, but who knows. You can’t assume anything.
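From 8.1.6 onwards that is a single parameter, settable at system or session level (a sketch):

```sql
-- Rewrite literals as system-generated bind variables at parse time
ALTER SYSTEM SET cursor_sharing = FORCE;
```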

One of the challenges will be capturing the Stored Outlines. It would not be recommended practice to gather these in production. However, there is no non-production system here that can generate realistic usage to get the sort of coverage that we would need. So, it will have to be production if at all. I’m very concerned about the overhead of enabling the capture in prod.
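If we do go ahead, the capture itself is just a parameter flipped on and off around a period of representative load (a sketch; the category name is a placeholder):

```sql
-- Start capturing an outline for every SQL statement, into a named category
ALTER SYSTEM SET create_stored_outlines = my_category;
-- ...let the applications run for a while, then switch capture off...
ALTER SYSTEM SET create_stored_outlines = FALSE;
```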