Oracle 12c VirtualBox VM

Just a quick note to highlight that the Oracle pre-built Developer VMs have been updated for Oracle 12c.

Oracle pre-built Developer VMs

Scalar Subselect Costing

This issue is an oldie but deserving of a quick post to stop me going off on a tangent in another post.

It is an oddity of scalar subselects/subqueries that their cost is not taken into account in the top level cost of a query.

In older versions of Oracle, it used to be the case that you didn’t even see the scalar subquery in the execution plan.

However, even in the latest versions, the cost still isn’t accounted for.

Always something to keep in mind.

For example:

SQL> create table t1
  2  (col1 number not null);

Table created.

SQL> 
SQL> insert into t1
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> create table t2
  2  (col1 number not null primary key);

Table created.

SQL> 
SQL> 
SQL> insert into t2
  2  select rownum
  3  from   dual
  4  connect by rownum <= 10000;

10000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 

Let’s use a scalar subselect to do an index lookup on t2 for every row in t1:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2339000913

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| SYS_C0078310 |     1 |    13 |     1   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1           | 10000 |   126K|     8   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T2"."COL1"=:B1)

You can see that the cost of the scalar subquery is 1 per execution and it’s not accounted for at the top level.

Let’s force a full table scan of the row-by-row lookup:

SQL> explain plan for
  2  select t1.col1
  3  ,      (select /*+ full(t2) */ t2.col1 from t2 where t2.col1 = t1.col1)
  4  from   t1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 637946564

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |
|   2 |  TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T2"."COL1"=:B1)

Obviously a much more expensive operation but, again, not properly accounted for in the overall costing.
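
One way to see how much work the scalar subselect really does (an illustrative sketch, not part of the original test) is to execute the statement with the gather_plan_statistics hint and then pull the actual row source statistics from the cursor cache with DBMS_XPLAN.DISPLAY_CURSOR; the Starts column then shows how many times the subquery plan was actually executed – scalar subquery caching permitting, up to once per row of T1:

select /*+ gather_plan_statistics */
       t1.col1
,      (select t2.col1 from t2 where t2.col1 = t1.col1)
from   t1;

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));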

Wouldn’t it be preferable that, as the optimizer has estimated the number of rows in the top-level select:

|   0 | SELECT STATEMENT  |      | 10000 |   126K|     8   (0)| 00:00:01 |

and has estimated the cost per execution of the scalar subselect:

|*  1 |  TABLE ACCESS FULL| T2   |     1 |    13 |     2   (0)| 00:00:01 |

the top-level cost included, to some degree, the cost of the scalar subselect per execution * the estimated number of executions?

For comparison, if we code a join roughly equivalent to the scalar subselect:

SQL> explain plan for
  2  select /*+ 
  3           full(t2) 
  4           use_nl(t2)
  5           */
  6         t1.col1
  7  ,      t2.col1
  8  from   t1
  9  ,      t2
 10  where t2.col1 (+) = t1.col1;

Explained.

SQL> 
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 2453408398

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   1 |  NESTED LOOPS OUTER|      | 10000 |   253K| 66919   (7)| 00:05:35 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   126K|     8   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    13 |     7  (15)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."COL1"(+)="T1"."COL1")
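
As a rough, illustrative piece of arithmetic only (using the figures from the plans above): the join version reports a cost of 66919, which is broadly 8 (the full scan of T1) plus roughly 7 (the cost shown for the full scan of T2 at this stage) for each of the 10000 driving rows. Costing the scalar subselect the same way would give something in the region of 8 + (10000 * 2) = 20008 for the hinted full-scan version, rather than the 8 actually reported.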

Also see:
http://jonathanlewis.wordpress.com/2007/10/12/scalar-subqueries/

http://oracle-randolf.blogspot.com/2010/01/when-your-projection-is-not-cost-free.html

http://blog.sydoracle.com/2005/09/explain-plans-and-scalar-subqueries.html

Any thoughts on stats validation?

There is an application running on 9.2.0.8 and this application’s approach to object statistics is that they are gathered along with each quarterly release of code and do not change outside of the release schedule.

There are a number of other issues with some elements of this particular implementation, discussion of which is outside the scope of this post.

Now, as part of the upgrade to 11.2, it’s highly likely that we will move to the default built-in stats gathering job, albeit gathering in a weekend window rather than daily.

One of the biggest advantages of the current situation is that the stats are tested along with the release and deployed with the release (of course, plans can still change despite stats being fixed – or even because they are fixed – as time ticks ever onwards, sequences increase, etc.).

And so, as part of a possibly non-negotiable comfort blanket for the move to the default 11g stats job, there is a management requirement for some sort of validation routine – i.e. a set of sql statements broadly representing important or problem areas of the application – that will be run before and after gathering, compared on some sort of metric, and judgement passed on whether the stats are “good”.

There are a number of nifty features like pending stats that we can use here, but my initial thought was that this sounded like a fairly easy job for SQL Tuning Sets (STS) and the SQL Performance Analyzer (SPA).
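
As a hedged illustration of the pending stats mechanism (a minimal sketch – the table name is invented for the example): statistics can be gathered unpublished, trialled in a session, and then either published or thrown away:

exec dbms_stats.set_table_prefs(user, 'SOME_TABLE', 'PUBLISH', 'FALSE')

exec dbms_stats.gather_table_stats(user, 'SOME_TABLE')

alter session set optimizer_use_pending_statistics = true;

-- run the representative statements and compare, then either publish or discard:
exec dbms_stats.publish_pending_stats(user, 'SOME_TABLE')
exec dbms_stats.delete_pending_stats(user, 'SOME_TABLE')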

SPA requires the additional Real Application Testing (RAT) license and, now that our licensing has been negotiated, this isn’t something we’ve got.

So… hand-rolling a validation routine.

I would still see SQL Tuning Sets holding the sql statements and bind sets as part of a bespoke solution (STS are available with the Tuning Pack license, which is on top of EE, both of which we have).
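
For the STS part, a minimal sketch of capturing candidate statements from the cursor cache over a period of time (the set name and timings are invented for the example):

begin
  dbms_sqltune.create_sqlset(sqlset_name => 'STATS_VALIDATION_SET');
  -- poll the cursor cache every 5 minutes for an hour, merging what it finds
  dbms_sqltune.capture_cursor_cache_sqlset(
    sqlset_name     => 'STATS_VALIDATION_SET',
    time_limit      => 3600,
    repeat_interval => 300);
end;
/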

I could picture a simple approach that would execute these statements in parallel sessions, probably via DBMS_SQL, fetch all the rows, record the session stats before/after, pick a metric – e.g. buffer gets, elapsed time, or some sort of ratio between those sorts of things – and then do some sort of comparison: yes, that run was broadly the same or better, or at least not a catastrophic degradation – these stats are ok.

(Of course, being able to picture a possible approach does not mean that I advocate it)
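
A minimal sketch of the before/after capture that approach relies on (the statistic names are just illustrative choices; the real routine would snapshot these values, run and fully fetch the candidate statement, then re-query and diff):

select sn.name, ms.value
from   v$mystat   ms
,      v$statname sn
where  ms.statistic# = sn.statistic#
and    sn.name in ('session logical reads'
                  ,'physical reads'
                  ,'CPU used by this session');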

But I just wondered whether this is something anyone else has done?

I’d be very interested in hearing more if you have.

Edit:
I’ve tried to reword this to avoid some confusion.
Alex – I’ve deleted your comments and my replies because a) they reduce the chance of getting the input the article was aiming for b) they were based on nested recursive misunderstandings and c) much of what you were saying is preaching to the choir.

Prod DBA 2.0 says no**

Dominic Delmolino has made a welcome return to regular blogging and it’s good to see that I’m not the only one forgetting the basics.

In the body and comments of Dominic’s*** first few posts back, the subject of database development is very much in focus.

So, DD got me thinking and here is my stream of random ramblings – they jump about a bit, it must be a Friday.

Production DBAs have no place on most development teams!

When was the last time you saw a production DBA in a scrum standup? (Genuine question – do you do that?).

Today’s production DBA is a busy man whether he’s 0.x (seen a lot of these), 1.0 (met a handful), 2.0 (heard tales of a few) :)

Some might say they’re too busy coming up with insane “policies” and standards.

At the very least, even if he has automated everything that he can automate(*), with the vast estate of DBs that a small, modern production DBA team *should* be able to support, the ebb and flow of urgent production issues means that he’s ill-suited to the predictable resource availability that modern agile project development demands.

That vast estate of DBs that a small, modern DBA team *should* be able to support also means that the knowledge the production DBA has of any one specific application or schema tends to be very limited.

However, for most enterprise-level data-centric applications (caveats and clarities due to previous sweeping statements), the data model is still so important and it’s just not being done right in so many places.

Here is a link post to a selection of other writings on the subject.

What Dominic D says strikes a chord with me. (But I’ve got a vested interest. The role he describes is essentially what I do or what I try to find.)

Most agile development teams (caveat again – most agile development teams developing enterprise-level data-centric applications) need access to a database development specialist.

Someone who knows SQL, knows PLSQL (in the case of Oracle), knows performance and knows the features and functionality of a database ideally to at least the same level expected of a production DBA (albeit with different peaks and troughs in specific areas) but without the irregular demands on their time from urgent priorities.

Most of them need this to be able to build data models that scale.

And to keep everyone on track with the data model and the big data picture – this can quickly get lost in translation especially with commodity developers (plug and play).

Most of them need this to write or at least QA SQL and PLSQL that will scale.

Most of them need this to strike a balance between new features and stability.

Most of them need this as a regular interface to the production DBAs as a communication conduit and filter, and to uphold the requisite standards and policies for development to move into production.

And absolutely this database specialist should take full responsibility for what goes into production – it’s all about control and responsibility.

Two other thoughts occur (and I’m not sure I’d finished on the previous thought but it’s gone now, flushed).

Firstly, and I commented this on DD’s blog, there is a belief that all developers are equal and that you can plug and play any developer into any project without any consequence. I don’t believe in this approach at all. I’ve never seen it work effectively apart from in the presentation layer – i.e. coding input fields and buttons, etc – and even then I’m far from convinced.

What I am definitely convinced about is that this in no way works for any layer that requires significant business knowledge and goddamit does that include the database.

On the subject of the second of the two thoughts mentioned above, the phone rang before I could write it down and so that thought has now disappeared. If I remember, I’ll update this. Until then my mind’s gone blank….

Update:
Oh yes, the second thought. Development DBA is sometimes the starting point for a junior DBA – en route to the production DBA. This is a nonsense given the control, responsibility and knowledge required. As mentioned they need to know at least as much as the typical production DBA.


*I’m not a production DBA. Never have been. And although I’ve thought about it, not sure I want to be. Development’s where it’s at for me. But how many DBA teams have much automated? And automated well? Everywhere I’ve been recently, the DBAs are firefighting constantly. Or they’re refreshing environments. Or they’re just completely overloaded with stuff that sounds like it should be automated. I mentioned this to someone else recently and they commented that it was because a large percentage of the production DBA team were contractors and had no vested interest in automating. I don’t know if that’s true. Any thoughts? There should be no greater compliment than going into a company and improving it to such an extent that the manual intervention you were providing was no longer required. But I can see the conflict.

**Prod DBA 2.0 says no -> Not sure how many people outside the UK are familiar with Little Britain. Think it was way overhyped myself, followed by way overplayed. Personally (as comedy appeal always is), although there have been exceptions like the Armstrong and Miller fighter pilot sketch, there’s not much sketch comedy that has regularly and consistently tickled my fancy in the last decade since the Fast Show.

*** Confusing – am I referring to Dominic Delmolino or have I taken to the convention of referring to myself in the third person? Did you know that the latter is known as Illeism? I didn’t.
Stylistic device or a form of narcissism?
My CV is in the third person and I’ve never been even 83.2% happy with that approach.
But I’ve never liked it when I’ve rewritten it with a personal pronoun – “I” – nor with the modern trend of no pronoun whatsoever, e.g. “Worked on an oil rig. Implemented XYZ. Put out lots of fires.”
It stems from my time at a consultancy – that was their style.
Then again, I don’t agree with the narcissism interpretation (e.g. a bit like celebrities and some sports persons).
If anything, for me at least, it’s completely opposite – a form of anti-narcissism, a self-demotion which I think of as being particularly British.
It can be easier to say “X achieved this” than “I achieved this” and maybe there’s a comforting distance between this CV character “Dominic Brooks” and myself?
Don’t know if there’s an established phrase of “take the blame but share the glory” but maybe it’s something like that …

Database Agility

A collation, for my benefit, of the thoughts of some others in this area:

Top 3 Oracle Features of the Decade

Yesterday I made an early break for the annual New Year resolutions post.

However, there’s another end-of-year post that’s up for consideration in this particular year and, rather than looking forward to the coming year, it’s a look back over the past ten years, 2000-2010.

So, I ask you: what are your top 3 Oracle RDBMS features of the past decade?

Oracle 8i was 1999 so we’re really talking about:

  1. 9iR1 in 2001
  2. 9iR2 in 2002
  3. 10gR1 in 2003
  4. 10gR2 in 2005
  5. 11gR1 in 2007
  6. 11gR2 in 2009

A Top 3 is very, very challenging when you think of some of the enhancements which have come along.

Obviously it depends on your own, personal perspective – naturally, mine are going to have a Development bias.

Each version fixes a significant number of bugs from the previous release, but I seem to remember 8i being a rush job for the buzzwords ahead of the new millennium (the early patches at least), 9iR1 a hatchet job and 9iR2 being a relative bastion of stability.

Then again it’s also a long time ago now and my memories may be unreliable.

The New Features Guides of 9i (links above) list, amongst others:

  • LogMiner improvements, DataGuard, RAC, Flashback Query, some online redefinition & reorganisation features, VPD enhancements, Automatic Undo Management, dynamic memory management, spfiles, RMAN improvements, native XML functionality and XMLDB, the MERGE statement, TIMESTAMPs and INTERVALs, the CASE statement, External Tables, associative arrays/index-by tables indexed by VARCHAR2, Streams, CDC, Index Skip Scans, ANSI SQL, OMF, multiple blocksizes, dynamic sampling, table compression, subquery factoring, pipelined table functions, etc.

A list of new features from 10g (or my list at least) is shorter, but even so it seems a bigger hitter in terms of weighty marketing-savvy acronyms and features:

  • ADDM, ASH, ASM, AWR, Automatic SQL Tuning, DataPump, Job Scheduler, SQL Access Advisor, HTMLDB, Online Table Redefinition, Oracle XE, DBMS_XPLAN…

11g has always felt like 10gR3. Nevertheless, there have still been some impressive features therein:

  • Adaptive Cursor Sharing, Result Cache, Database Resident Connection Pooling, Invisible Indexes, SecureFiles, Binary XML…

And then recently Exadata and the Oracle Database Machine (although I think it’s too soon to make any judgement on these and if they have a big impact then it will be mostly in the next decade).

For the summaries above, I’ve missed out a lot (probably accidentally – let me know – but I was rushing towards the end).

But I’ve focused on the initial introductions of headline new features.

However, I would argue that the gradual evolution of some of the main features and functionality has had some of the biggest impact as we are now at the end of the decade compared to the end of 1999. In addition, some of the tweaks and internals have also had a big impact whilst avoiding the headlines – mutexes for example.

Think about how the CBO has moved on over the past ten years.

Or how HTMLDB has morphed and moved on to APEX.

Or how parallel capabilities have developed. Or partitioning.

Maybe you like your GUIs like OEM.

Or the evolution of IDEs has been revolutionary for you e.g. SQL Developer, or the explosion of functionality in third party tools like Toad or Quest Code Tester for Oracle.

Or the combination of several features – external tables plus pipelined functions are pretty cool for loading in data, for example. Or partitioning plus parallel.

All of these are up for consideration.

What’s not up for consideration is stuff that predates the decade. There are several features which I thought of while doing this and which turned out to be introduced in 8i or even before. Man, tempus fugit…

So, my Top 3 is based on what I do or use in Oracle most days (or even what I don’t have to do anymore) because of features introduced or evolved over the past decade. The evolution of collection functionality in SQL and PLSQL came close. A bit further behind were some of the XML capabilities. A whole host of ineligible features from older versions were scrubbed out. However, in no particular order, my vote goes to:

  • Analytics – It’s amazing how many times I end up using analytic functions. As Tom Kyte says “Analytics rock, Analytics roll”. Countless multipasses of data have been avoided with these babies. (Analytics ruled ineligible as an 8i introduction, dammit)
  • The WITH clause aka Subquery Factoring – Most of my sql, if it has any sort of complexity, ends up using this feature; fantastic for breaking up and understanding a complex bit of sql that you’ve never seen before; great at encouraging you to think in SETS (a small illustrative sketch follows this list).
  • AWR / ASH / Wait model evolution – Shame that AWR & ASH are licensed separately, though there are alternatives for earlier Oracle versions or if you don’t want to fork out the big bucks. But so useful for diagnosing performance problems, particularly retrospectively and/or for identifying session problems which might previously have drowned in a system-level Statspack.
  • DBMS_XPLAN – Another feature that I use day in, day out. So easy to get explain plans, actual plans from memory or from AWR. Invaluable!
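
As a small illustrative sketch (not from the original post – the table and column names are made up) of why subquery factoring and DBMS_XPLAN get used so much together: factor the complex piece out, run it, then pull the actual plan straight from the cursor cache:

with recent_orders as
 (select customer_id, amount
  from   orders
  where  order_date >= date '2010-01-01')
select   customer_id
,        sum(amount) total_amount
from     recent_orders
group by customer_id;

select * from table(dbms_xplan.display_cursor);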

I would imagine that if you were a production DBA then the evolution of RMAN and Flashback has been fantastically useful and time saving.

These are not things that I’ve tended to use much in my role.

But if something’s revolutionised your decade, let me know.

Data modelling in the modern world

I first read Robyn Sands’s article on the importance of a good data model back when it was published in May.

However, the comments have since moved on and taken on a life of their own – a sure sign of a great post – with some excellent points and poignant observations on the frequent neglect of the data model in the agile world.

If you’ve not read it or have not kept updated with the comments, then I urge you to.
