Strategies for Minimising SQL Execution Plan Instability

Execution Plan Instability – What is the problem?

The Oracle Optimizer is a complex piece of software and with every release it becomes more complex.


In the beginning, the Optimizer was rule-based.

The Optimizer had a ranked list of heuristics used to optimize a query, picking the lowest ranked rule available.

This rule-based mode, whilst still in use with some internal Oracle dictionary queries, has been unsupported since version 10.1.

This means that no code changes have been officially made to the RBO and no bug fixes are provided. There are many features that the RBO is unaware of.

Applications should not still be using the rule-based optimizer.


The Cost-Based Optimizer is designed to evaluate a number of execution plans and pick the one estimated to be the fastest.

Many of the CBO features are designed to combat common optimization problems and many of these problems occur where development teams are not necessarily aware of the full implications of the features that they are using.

These built-in default behaviours roughly conform to the infamous 80%:20% rule in that most of the time they do a good job but they are not infallible.

Bind Variables, Literals and Histograms

Most of the features which are deliberately designed such that plan instability is difficult to avoid stem from the decision to use bind variables or literals (NOT that they should be mutually exclusive) and the interaction of the former with histograms.

In article 1 of his Philosophy series, Jonathan Lewis neatly sums this up:

Histograms and bind variables exist for diametrically opposed reasons – they won’t work well together without help

Fundamentally, bind variables exist to provide shareable plans.

Bind variables should be used where we are not interested in getting specific optimizations for differing parameters.

Literals should be used where we want the Optimizer to pay particular attention to data distribution and skew for the specific parameters supplied.

A SQL statement will/should often have some parameters which should be literals and some which should be binds.

From this point onwards, there has been a whole raft of features designed to treat the misuse of one or the other.

In Oracle, histograms were introduced.

Histograms exist to provide specific data distribution information, particularly relevant to specific parameters.

Also, in that version, we got CURSOR_SHARING – targeted at applications using literals instead of binds such that SQL which was identical part from the use of binds was rewritten to use sytem-generated bind variables.

Then in 9.2, we got bind variable peeking.

This feature was introduced so that the optimizer could peek at the values supplied at parse time and use data distribution information specific to these parse-time values to generate an execution plan which suited those values.

In addition and at the same time, through these various versions to present day, we have had the default behaviour of DBMS_STATS statistic gathering to let the database decide which columns it will create histograms on, based on the SQL which has been running.

This means that new histograms can suddenly spring up – or existing histograms unexpectedly disappear – on all sorts of columns. This can be problematic on columns with large numbers of distinct values AND particularly so on join cardinalities where there may be a mismatch of histograms on both sides of the join.

Ever since this point, we have had a conflict of interest in feature usage and an ever increasing number of additional functionality to battle against this conflict – adaptive cursor sharing, cardinality feedback, etc, etc

Finally, the education message got blurred or lost somewhere along the line to the extent that a lot of systems blindly overuse bind variables because of the perceived performance impact of using literals.

This situation is not helped by the fact PL/SQL is designed to encourage bind variables.

Using supplied parameters as literals means using some construct of dynamic SQL, not difficult but nevertheless an added complexity and also another feature which is often blindly discouraged.

SQL Execution Plan Instability – Is SPM a viable approach?

SPM Overview

In Oracle 11g, Oracle introduced SQL Plan Baselines as a SQL Plan Management feature.

The core attraction of this functionality is that you can “lock in” a particular plan or plans for a SQL statement. This stores a set of outline hints and a specific plan hash value in relation to a specific SQL signature.

The Optimizer then uses that set of hints to try to reproduce the desired plan. If it can it will, if it can’t it will reject the hintset.

Additionally, the Optimizer completes its best-cost execution plan optimization anyway so that it can provide the ability to do a controlled evolution of baselined plans in the event that the lower-cost plan that it would have used is different and better performing than the baselined plan.

To use this the database parameters (session or system) just need to be configured to capture plans into a baseline and then use them.

There is flexibility to this capture/use. You can capture everything as it is generated; you could capture everything from memory now and/or at regular intervals and import into a baseline; you could capture everything from AWR into a SQL Tuning Set and then import into a baseline; or you could capture everything from another environment and export/import it into another.

And at any point, you can turn off capture and continue to use those which you currently have – this usage continues to capture any lower cost plans that the optimizer would have generated for any existing baselined plans

For a more detailed look at SPM and a comparison with SQL Profiles, see documentation.

Sounds good – why isn’t it the silver bullet for system-wide stability?

This approach might be good enough, but it is not a cast-iron guarantee of system-wide stability.

There are a number of reasons why not.

New & Changed SQL

Firstly, you need to have captured all your plans already. If you get brand new SQL, then there will be nothing in SPM.

Depending on your application, this may be a significant concern.

For example, consider an application making heavy usage of Hibernate which generates SQL statements.

A minor change in the domain model can mean a change of system-generated table alias in many statements.

As a result, you may suddenly get a lot of brand new SQL and significant numbers of baselined statements which you will now never see again.

What are the baselined plans based on? Are they the best plans? The only plans ever used?

If you suddenly baseline the plans for a large number of SQL statements, you are dictating which plan is to be used.

The plans will be based on the parsed plans in memory or in AWR at the time.
Are these the best plans?

Does/should this SQL statement always use this plan?

Are there normally multiple plans for different bindsets?

What if you normally get benefit from adapative cursor sharing?

ACS and baselines

What if your application benefits from adapative cursor sharing?

Sure, you can baseline multiple plans but these plans have no ACS information.

As soon as that ACS information is no longer in memory (as happens), there is no shortcut in a baseline to regain that, you still have to have the multiple executions required for the optimizer to recognize that which plans to use for which bindsets.

Parsing overhead

Widespread usage of baselines might, depending on your system performance profile, have a significant impact on the parsing resources.

This is because it always generates a best-cost plan anyway.

Then if that is not the baselined plan, it will use the baselined hintset to try to generate the specific plan hash.

In addition, it that is not possible, it will use just the optimizer_features_enable hint to try to generate the required plan.

So, you might in a heavily-baselined system to be doing 2x the parse work of a non-baselined system.

This might well be easily accommodated but there are systems where this would cause a problem.

Integration with development and testing processes

A SQL Plan Baseline is tied to a SQL statement based on the statement’s EXACT_MATCHING_SIGNATURE – a hash of the SQL statement which has been case and space normalized.

If a SQL statement materially changes, the baseline no longer applies.

How aware are developers of the presence of baselines?

And how to integrate with the development process?

How will our release process deal with baselines?

And if baselining large numbers of plans is being considered, then we have to think about where these will be generated.

The natural implication (of widespread baseline usage) is that new functionality being promoted to Prod would have a set of tested, baselined plans accompanying it and these would presumably have to be generated in a Prod-like environment which included Prod-like volumes.

SQL Execution Plan Instability – Decision Time

There is a decision to be made.

(And/or perhaps there is often a conceptual over-simplification by senior management to combat? Or at least a lack of deep understanding of the beast that we’re dealing with here?)

Do you want the Optimizer to try to get a better execution plan sometimes?

If the answer is yes, then you have to accept that it will get it wrong from time to time.

In particular, the various feedback and adaptive mechanisms are designed to recognize that they have got it wrong.

BUT they need that problematic execution in the the first place – sometimes more than one – to recognize that fact.

That one problematic execution could be your next Priority 1 incident.

In addition, the feedback mechanism is not perfect and it still can make subsequent executions worse in some circumstances.

SQL Execution Plan Instability – Turn it off?

IF your primary goal is plan stability – and I think many teams would claim this is this their goal but they do not embrace the implications of this – then perhaps a better decision is to turn off the various features which cause or combine to cause most of the problems of instability.

Appropriate usage of binds, appropriate usage of literals

Choose whether to use a bind variable or a literal as is appropriate for the value / column / SQL statement.

A SQL statement might have a mix of both.

DBMS_STATS defaults

A METHOD_OPT of FOR ALL INDEXED COLUMNS SIZE AUTO is an immediate red flag. This is never a good setting.

FOR ALL COLUMNS SIZE AUTO without table-level preferences (SET_TABLE_PREFS) is another red flag.

As an interim step, consider use FOR ALL COLUMNS SIZE REPEAT to lock in the current histogram usage.

The end goal should be to have table level preferences set for all tables.

This relies on knowing your data, your data distribution, your code, and knowing which histograms make sense (i.e. for skewed columns) – it will be far fewer than gathered by default.

For columns with significant numbers of distinct skew, it may be necessary to manually craft the histograms.

Volatile tables

Volatile tables should have stats set to an appropriate setting to generate appropriate plans for all situations and then those stats should be locked.

Stats which are gathered at any point during the volatility cycle may be good or may be problematic.

Similarly dynamic sampling can only see the data at the time of hard parse – you might be lucky and this is better than stats which say 0 rows but it can be a time bomb.

Turn off optimizer features

Turning off optimizer features might be best done via a LOGON trigger and turning such off for a subset of application users. These features include:

  • Bind Variable Peeking – off via _optim_peek_user_binds = false
  • Cardinality Feedback and ACS – should be disabled by turning off bind variable peeking but off via _optimizer_use_feedback = false, _optimizer_adaptive_cursor_sharing = false, _optimizer_extended_cursor_sharing_rel = “none”
  • Dynamic Sampling – optimizer_dynamic_sampling to 0
  • 12c additional adaptive features – e.g. adaptive execution plans

Additionally it probably makes sense to turn off the adaptive direct path read behaviour or anything with the word adaptive or dynamic in it or associated to it

This functionality decides on whether to do full segment scans via the buffer cache or not and the behaviour is a runtime decision depending on the size of the object relative to the buffer cache AND depending on how much of the segment is currently in the cache.

  • Adaptive direct path reads – _serial_direct_read = always

All too often I’ve seen a concurrently executed SQL statement switch to a “bad” plan involving a full table scan delivered via direct path reads stress out the IO subsystem because of the number of concurrent executions of that query which then affects performance across the DB.


The actions above are still not sufficient to guarantee plan stability but, for this goal above all else, this is likely to be the most appropriate action.

However, to further guarantee stability it is still likely that some degree of hinting – whether via manual hints, sql profiles or baselines – might be necessary for small numbers of SQL statements where the intial cost-based plan is not appropriate e.g. GTTs and other situations but it should be small number of statements.

SQL Execution Plan Instability – Summary & Opinion

The actions discussed above are made on the basis that we want to minimise the chance of execution plan instability at all costs.

By making this decision, we are prioritizing stability over all the features within Oracle designed to generate better plans for specific situations, sets of binds, etc.

Personally, I always recommend going with the default behaviour until such time as it causes a significant problem.

I also always recommend matching the scope of a solution to the scope of a problem.

For example, if we have a problem with one or two SQL statements, the potential solutions should be limited to those SQL statements.

We should never be making changes with a potential system-wide impact for the sake of a couple of SQL statements.

And even parameter changes can be injected to specific SQL statements either via a SQL Patch or via manual hints.

In my opinion, having multiple plans for a SQL statement is expected.

But what is under-appreciated is the extent to which this is normal.

These are normally only noticed when they cause a problem and the significant number of plans which regularly change plans without impact tend to go unnoticed.

It is also my opinion that SQL execution issues occur mostly when the SQL is badly written or when incompatible features are combined – e.g. peeked binds together with histograms – and I prefer to tackle that root cause rather than a generic concept of plan instability being inherently problematic.

A future path?

Might it be a good idea for there to be a built-in choice of optimizer “umbrella” modes – maximum performance vs maximum stability which would turn on/off some of the features above and/or adjust their sensitivity/aggressiveness?


Prod DBA 2.0 says no**

Dominic Delmolino has made a welcome return to regular blogging and it’s good to see that I’m not the only one forgetting the basics.

In the body and comments of Dominic’s*** first few posts back, the subject of database development is very much in focus.

So, DD got me thinking and here is my stream of random ramblings – they jump about a bit, it must be a Friday.

Production DBAs have no place on most development teams!

When was the last time you saw a production DBA in a scrum standup? (Genuine question – do you do that?).

Today’s production DBA is a busy man whether he’s 0.x (seen a lot of these), 1.0 (met a handful), 2.0 (heard tale of a few) 🙂

Some might say they’re too busy coming up withinsane “policies” and standards.

At the very least, even if he has automated everything that he can automate(*), with the vast estate of DBs that a small, modern production DBA team *should* be able to support, the ebb and flow of urgent production issues means that he’s unsuited to the predictability of the resource availability that modern agile project development demands.

That vast estate of DBs that a small, modern DBA *should* be able to support also means that the knowledge that the production DBA of any one specific application or schema tends to be very limited.

However, for most enterprise-level data-centric applications (caveats and clarities due to previous sweeping statements), the data model is still so important and it’s just not being done right in so many places.

Here is a link post to a selection of other writings on the subject.

What Dominic D says strikes a chord with me. (But I’ve got a vested interest. The role he describes is essentially what I do or what I try to find.)

Most agile development teams (caveat again – most agile development teams developing enterprise-level data-centric applications) need access to a database development specialist.

Someone who knows SQL, knows PLSQL (in the case of Oracle), knows performance and knows the features and functionality of a database ideally to at least the same level expected of a production DBA (albeit with different peaks and troughs in specific areas) but without the irregular demands on their time from urgent priorities.

Most of them need this to be able to build data models that scale.

And to keep everyone on track with the data model and the big data picture – this can quickly get lost in translation especially with commodity developers (plug and play).

Most of them need this to write or at least QA SQL and PLSQL that will scale.

Most of them need this to strike a balance between new features and stability.

Most of them need this as a regular interface to the production DBAs as a communication conduit and filter, and to uphold the requisite standards and policies for development to move into production.

And absolutely this database specialist should take full responsibility for what goes into production – it’s all about control and responsibility.

Two other thoughts occur (and I’m not sure I’d finished on the previous thought but it’s gone now, flushed).

Firstly, and I commented this on DD’s blog, there is belief that all developers are equal and that you can plug and play any developer into any project without any consequence. I don’t believe at all in this approach. I’ve never seen it work effectively apart from in the presentation layer – i.e. coding input fields and buttons, etc – and even then I’m far from convinced.

What I am definitely convinced about is that this in no way works for any layer that requires significant business knowledge and goddamit does that include the database.

On the subject of the second of the two thoughts mentioned above, the phone rang before I could write it down and so that thought has now disappeared. If I remember, I’ll update this. Until then my mind’s gone blank….

Oh yes, the second thought. Development DBA is sometimes the starting point for a junior DBA – en route to the production DBA. This is a nonsense given the control, responsibility and knowledge required. As mentioned they need to know at least as much as the typical production DBA.

*I’m not a production DBA. Never have been. And although I’ve thought about it, not sure I want to be. Development’s where it’s at for me. But how DBA teams have much automated? And automated well? Everywhere I’ve been recently, the DBAs are firefighting constantly. Or they’re refreshing environments. Or they’re just completely overloaded with stuff that sounds like it should be automated. I mentioned this to someone else recently and they commented that it was because a large percentage of the production DBA team were contractors and has no vested interest in automating. I don’t know if that’s true. Any thoughts? There should be no greater compliment than going into a company and improving it to such an extent that the manual intervention you were providing was no longer required. But I can see the conflict.

**Prod DBA 2.0 says no -> Not sure how many, people outside the UK are familiar with Little Britain. Think it was way overhyped myself, followed by way overplayed. Personally (as comedy appeal always is), although there have been exceptions like the Armstrong and Miller fighter pilot sketch, there’s not much sketch comedy that has regularly and consitently tickled my fancy in the last decade since the Fast Show.

*** Confusing – am I referring to Dominic Delmolino or have I taken to the convention of referring to myself in the third person.Did you know that the latter is known as Illeism? I didn’t.
Stylistic device or a form of narcissism?
My CV is in the third person and I’ve never been even 83.2% happy with that approach.
But I’ve never liked it when I’ve rewritten it with a personal pronoun – “I” – neither with the modern trend of no pronoun whatsoever, e.g. “Worked on an oil rig. Implemented XYZ. Put out lots of fires.”
It stems from my time at a consultancy – that was their style.
Then again, I don’t agree with the narcissism interpretation (e.g. a bit like celebrities and some sports persons).
If anything, for me at least, it’s completely opposite – a form of anti-narcissism, a self-demotion which I think of as being particularly British.
It can be easier to say “X achieved this” than “I achieved this” and maybe there’s a comforting distance between this CV character “Dominic Brooks” and myself?
Don’t know if there’s an established phrase of “take the blame but share the glory” but maybe it’s something like that …

Database Agility

A collation, for my benefit, of the thoughts of some others in this area:


YAGNI, it’s a very common term in development teams these days. Overused even.

However, in today’s more austere and economically challenging times, I have some new acronyms for you.

I’m not really up on popular acronyms so these may already exist in some form or another, but here are some more appropriate sayings that I’m saying/seeing more and more, together especially:

I Do Bloody Well Need It (IBLOWENI – ha ha) … but … NO-one’s Gonna Pay For It (NOGOPAFI).

Flexible demand-driven development

Outsourcing small projects or using some management strapline like “flexible demand-driven development model”, it never works.

This is the Nth place I’ve worked where they’ve decided that to be able to supply the non-linear development needs of the business, the next solution to try is to outsource individual projects to “strategic development partners”.

In my experience, this solution is normally the third to be tried after

  1. Hiring and firing contractors in line with demand. This never works because business knowledge is key to decent development. You can cope with a few business-agnostic resources but not many. And I would say that it takes at least six months before you start to get comfortable with most business knowledge.
  2. Outsourcing to bodyshops on the subcontinent. I reckon this never works either because of many of the same arguments above. Plus the time differences affect your communication.

And so, you get in some onshore or near-shore “development partners”. If you’re lucky, this works the first couple of times because your development partners put on their industry-specific high-rollers but then it’s not long before the quality degrades. Even if you’re lucky, the headaches of integration with existing in-house systems, differing standards and ongoing support are too big.

Unfortunately, I’ve not seen the solution anywhere yet. It’s a crisis that IT is still working through to find the solution, although I suspect that, until the magical system comes along with you draw some boxes and lines in Visio and then something builds your whole system, it’s about being more realistic about what the IT department can deliver in the time, no matter how unpalatable that might be.