_replace_virtual_columns = false

A follow-up to my post yesterday.

A quick recap on the situation:

A sql statement such as this:

FROM my_schema.table1 tab1

is raising an error such as this:

ORA-00904: "MY_SCHEMA"."TAB1"."SYS_NC00062$": invalid identifier

where SYS_NC00062$ is not a column directly specified in the sql but a virtual column related to a function-based index.

This matches a bug which is currently under development and not yet published (Bug 7026584).

The explanation is that the 11g optimizer uses a new mechanism to work with function-based indexes at parse time, a mechanism which is not yet fully robust.

The problem comes when virtual column processing is done from qkacol where it clones the predicates and later tries to resolve the copied columns in the predicate with the supplied fro. The root cause here, the unnesting driver generates an alias for table and adds it for a correlated column. [sic]

In addition to the 10046 and 10053 trace files, a 904 errorstack trace file provided the information required:

alter system set events='904 trace name errorstack';
alter system set events='904 trace name errorstack off';

Oracle’s currently recommended workaround for that bug (at the time of writing) is to set the undocumented parameter _replace_virtual_columns to false.
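For reference, setting a hidden parameter like this looks something like the following (a sketch only – as with any undocumented parameter, it should only be applied on the advice of Oracle Support):

```sql
-- Hidden parameters must be double-quoted because of the leading underscore.
-- At session level for testing:
ALTER SESSION SET "_replace_virtual_columns" = FALSE;

-- Or instance-wide:
ALTER SYSTEM SET "_replace_virtual_columns" = FALSE;
```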

I’m still awaiting an explanation on that cost_io virtual column adjustment.

Interesting cost_io in 10053 trace

184124858368.00 to be precise.

I’ve got a problem with a set of sql statements in production. Fortunately the problem is reproducible in a non-prod environment.

These statements all reference one particular virtual column in a composite function-based index.

The sql statements all look something like this:

FROM my_schema.table1 tab1

And they are raising the following error:

ORA-00904: "MY_SCHEMA"."TAB1"."SYS_NC00062$": invalid identifier

This “SYS_NC00062$” is a hidden/virtual column on this table that is part of a function-based index. I can see it here at least:

select *
from   sys.col$
where  obj# in (select object_id
                from   dba_objects
                where  object_name = 'TABLE1'
                and    owner       = 'MY_SCHEMA')
order by col#;
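A friendlier alternative to querying SYS.COL$ directly is DBA_TAB_COLS, which (unlike DBA_TAB_COLUMNS) includes hidden columns, so the SYS_NC… columns behind function-based indexes show up there. Something like:

```sql
-- Hidden/virtual columns backing function-based indexes
-- appear in DBA_TAB_COLS but not in DBA_TAB_COLUMNS.
select column_name, data_type, hidden_column, virtual_column
from   dba_tab_cols
where  owner         = 'MY_SCHEMA'
and    table_name    = 'TABLE1'
and    hidden_column = 'YES'
order by column_id;
```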

This article isn’t about this error specifically. Not yet at least but I might blog about it later once I know some more.

However, in the meantime, I’ve just started gathering details, running some extended trace files (10046, 10053), etc with a view to finding some extra information and then probably raising an issue on metalink.

I’ve not even started analysing this info yet but as I zipped through the 10053 file, I noted something odd.

Note that in this 10053 trace file, the original reported SYS_NC00062$ has now become SYS_NC00066$ because I dropped and recreated the FBI to see if that helped – it didn’t.

Looking at the cost_io numbers, you get various costs for various access methods – e.g. a 16.00, a 3.00, a 15632.00, a 53.00 – but then holy moly:

***** Virtual column Adjustment ******
Column name SYS_NC00066$
cost_cpu 300.00
cost_io 17976931348623157081452742373170435679807056752584499
***** End virtual column Adjustment ******

I’ve literally done nothing yet, but I saw that number and thought I’d put it down quickly in an article.

But wow – that’s a big adjustment! What’s that about?

I picked the wrong day to leave my Jonathan Lewis CBO book at home…

Here’s the 10053 trace file.

Using function result_cache for timed caching*

*Or “can I use result_cache to cache for 1 minute/day/hour/day/week/month”

There is already a plethora of excellent resources on the new 11g result_cache functionality, notably:

Including a couple of interesting angles on things here:

Now that we’re on 11g and EE (having upgraded from 9i SE), one of my first tasks is to identify new features to implement from a performance perspective.

Having read some of the above resources, it won’t surprise you to know that you can’t just point anything at the result_cache and expect a dramatic improvement – quite the contrary sometimes. As Adrian Billington notes in his article, if you’re already doing session memory caching using pl/sql structures, you’re unlikely to see any saving in time – the opposite in fact – but the advantages of the result cache are “cross-session availability, managed SGA memory usage and protected data integrity”. Of course, depending on what you’re doing, you could even choose to use pl/sql caching on top of the result cache.

So, if you’ve got a result_cache target in mind – make sure you test it.

One of the things I’m currently looking at is whether to implement a function result_cache lookup for time sensitive information. The sort of thing I mean is information that changes daily or hourly or weekly independently of the underlying data changing.

For example, fund prices and valuation dates. A fund might have a scheduled set of valuation dates from which a certain price applies. And on any given day, you might be in a different valuation period.

Exactly the sort of thing you might cache using whatever caching method you might choose. And exactly the sort of thing that you could use function result_cache functionality for.

If the underlying data changes, the cache will be refreshed (having used the RELIES_ON clause).
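A sketch of how that RELIES_ON clause might look here (fund_prices is a hypothetical table standing in for the real underlying data):

```sql
-- RELIES_ON (11gR1 syntax) declares the dependency explicitly:
-- if fund_prices changes, cached results are invalidated.
CREATE OR REPLACE FUNCTION f_get_fund_price (
   i_fund_id        IN NUMBER,
   i_valuation_date IN DATE
)
RETURN NUMBER
RESULT_CACHE RELIES_ON (fund_prices)
AS
   l_price NUMBER;
BEGIN
   SELECT price
   INTO   l_price
   FROM   fund_prices
   WHERE  fund_id        = i_fund_id
   AND    valuation_date = i_valuation_date;
   RETURN l_price;
END f_get_fund_price;
/
```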

If the timing point changes (the day/week/hour/minute/whatever), a new set of data is cached.

Here is a simple example of what I mean.

- Create a function that will cache by the minute. We will do this by creating a function with a single parameter that defaults to the current date truncated to the current minute and by calling it without specifying a parameter:

CREATE OR REPLACE FUNCTION f_test_minute_result_cache (
   i_minute IN DATE DEFAULT TRUNC(SYSDATE, 'MI')
)
RETURN DATE
RESULT_CACHE
AS
BEGIN
   -- Return the actual time of execution so that we can see
   -- whether the result came from the cache or not.
   RETURN SYSDATE;
END f_test_minute_result_cache;
/

Then call it:

select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') now
      ,to_char(trunc(sysdate,'MI'),'DD-MON-YYYY HH24:MI:SS') this_minute
      ,to_char(f_test_minute_result_cache,'DD-MON-YYYY HH24:MI:SS') cached_result
from   dual;

NOW                  THIS_MINUTE          CACHED_RESULT
-------------------- -------------------- --------------------
17-JUN-2008 10:16:02 17-JUN-2008 10:16:00 17-JUN-2008 10:16:02

Elapsed: 00:00:00.00

Call it again, in the same minute, but a few seconds later:

select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') now
      ,to_char(trunc(sysdate,'MI'),'DD-MON-YYYY HH24:MI:SS') this_minute
      ,to_char(f_test_minute_result_cache,'DD-MON-YYYY HH24:MI:SS') cached_result
from   dual;

NOW                  THIS_MINUTE          CACHED_RESULT
-------------------- -------------------- --------------------
17-JUN-2008 10:16:06 17-JUN-2008 10:16:00 17-JUN-2008 10:16:02

Elapsed: 00:00:00.00

You can see from “now” that time has inevitably moved on but, as the default IN parameter has not changed, the result has come from the cache.

Then, run the statement again a bit later:

select to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') now
      ,to_char(trunc(sysdate,'MI'),'DD-MON-YYYY HH24:MI:SS') this_minute
      ,to_char(f_test_minute_result_cache,'DD-MON-YYYY HH24:MI:SS') cached_result
from   dual;

NOW                  THIS_MINUTE          CACHED_RESULT
-------------------- -------------------- --------------------
17-JUN-2008 10:55:29 17-JUN-2008 10:55:00 17-JUN-2008 10:55:29

Elapsed: 00:00:00.00

Time has moved on, the default IN parameter has moved on, and a new result is generated and cached.

So, this just shows that you could also use result_cache in this way. It doesn’t necessarily mean that you should, though.
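Incidentally, if you want to see what has actually been cached, the V$RESULT_CACHE_OBJECTS view is useful, along these lines:

```sql
-- Each cached invocation (one per distinct parameter value)
-- appears as a separate 'Result' row against the function's
-- 'Dependency' row.
select type, status, name, row_count
from   v$result_cache_objects
where  name like '%F_TEST_MINUTE_RESULT_CACHE%'
order by creation_timestamp;
```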

Bug in 11g AWR report

Post 11g upgrade, one of the first things I noticed was in the AWR reports.

In the Instance Efficiency Percentages, the “Parse CPU to Parse Elapsd %” was a less than impressive “0.00”. I’ve become accustomed to this being low in reports here, but not that low ;-).

This has been confirmed as a new bug, but the support analyst was not hopeful of it being fixed at any point soon (periods of not months but years were mentioned, and 12g was mooted).

It’s not a big deal. The formula for this ratio is an unsurprising 100 * (parse time cpu / parse time elapsed), figures which can be found elsewhere in AWR (and statspack of course) in the Instance Activity Stats.
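The same figure can be derived directly from the underlying statistics, something like this (cumulative since instance startup rather than per AWR snapshot period, so indicative only):

```sql
-- 'parse time cpu' and 'parse time elapsed' are both in centiseconds,
-- so the units cancel out in the ratio.
select round(100 * cpu.value / ela.value, 2) as parse_cpu_to_elapsed_pct
from   v$sysstat cpu
      ,v$sysstat ela
where  cpu.name = 'parse time cpu'
and    ela.name = 'parse time elapsed';
```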

So, the figure that should have been reported was 85% – still less than impressive.

I’m slightly surprised that something so obvious slipped through the net but then again maybe not if it’s unlikely that more than 1% of Oracle sites are running 11g.

11g upgrade still good

Still looking good.

CPU on the new box is hovering under the 20% mark having been habitually > 80% on the old kit. But it could be a quiet time for the business so we will have to compare the metrics that we capture as a proxy for “business activity”.

Best news is that we were getting a significant number of “ORA-01801: date format too long for internal buffer” errors every day causing the client-server application to crash.

These have been due to cursors being shared when they shouldn’t be, for example due to NLS mismatches.

As hoped (and promised to the business), these have disappeared with the upgrade.

we have 11g lift off

Production database was upgraded to 11g over the weekend: 9i -> 11g, SE -> EE, Windows -> Linux.

This is the first application in this client’s company to upgrade.

Seems to have gone well given that I was not called over the weekend.

As ever though, the only true test is going live and I would be very, very surprised if there were no surprises!

So, in the absence of any ringing hotlines and before I start on something new later this morning, I’m just going to be monitoring active sessions and also seeing what rises to the top by looking at sql with greater than average buffer_gets (I know roughly what’s normally there so it will be interesting to see if anything changes).

Fine-grained access to network services

(or my database emails have stopped working)


Here’s the background:
We’re in the latter stages of preparing for a production upgrade to 11g (2 weeks to go).

Best practice would say that we should have a stable database by now and a very high confidence factor, but the truth is that the production upgrade coincides with a new release of functionality, and every now and then it is “discovered” that some core functionality is not working. (For example, 1200 lines of code changed from using the function XSLPROCESSOR.VALUEOF, which has a 4k node size limit, to using the procedure version, which doesn’t. Or actually creating a wrapper with the function signature that calls the procedure, etc, etc.)

Back to the matter at hand:

Emails failing with

ORA-24247: network access denied by access control list (ACL)
ORA-06512: at "SYS.UTL_TCP", line 17
ORA-06512: at "SYS.UTL_TCP", line 246
ORA-06512: at "SYS.UTL_SMTP", line 115

This is part of the nattily named “Fine-grained Access to External Network Services”, a new feature in 11g. Essentially, access to services over the network using packages like UTL_SMTP, UTL_HTTP, UTL_TCP, etc, which was previously unrestricted, is now denied by default unless an access control list (ACL) permits it.

Have some links:

So, the problem itself was pretty easy to distill:

DECLARE
   l_email clob := 'test email';
BEGIN
   -- send_email is a stand-in name; the original wrapper procedure
   -- (which calls UTL_SMTP under the covers) is not shown in full.
   send_email (i_sender     => 'me@myemail.com',
               i_recipients => 'me@myemail.com',
               i_subject    => 'test',
               i_mime_type  => 'text/html',
               i_priority   => NULL,
               i_message    => l_email);
END;
/

which gave, as mentioned:

ORA-24247: network access denied by access control list (ACL)
ORA-06512: at "SYS.UTL_TCP", line 17
ORA-06512: at "SYS.UTL_TCP", line 246
ORA-06512: at "SYS.UTL_SMTP", line 115

Solution is to create an ACL:

acl => 'my_mailserver_acl.xml');
WHEN others THEN
acl => 'my_mailserver_acl.xml',
description => 'Connect Access To Mail Server',
principal => 'UTILITY',
is_grant => TRUE,
privilege => 'connect',
start_date => SYSTIMESTAMP,
end_date => NULL);
acl => 'my_mailserver_acl.xml',
host => 'my.mailserver.com',
lower_port => 25);

This failed for me initially because:

function-based index XDB.XDB$ACL_XIDX is disabled

After rebuild of index (note to self – don’t be lazy and use Toad because it expands the syntax to include defaults that aren’t always valid), email test works fine.
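Once the ACL is in place, the grant can be verified through the DBA_NETWORK_ACLS and DBA_NETWORK_ACL_PRIVILEGES views, something like:

```sql
-- Which hosts are covered by which ACL, and who has what privilege.
select a.host, a.lower_port, a.upper_port, a.acl
      ,p.principal, p.privilege, p.is_grant
from   dba_network_acls           a
      ,dba_network_acl_privileges p
where  a.acl = p.acl;
```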

Job done, move along – nothing to see here.

Upgrading to 11g

I feel very fortunate that my current client is planning to upgrade a significant production system to Oracle 11g very shortly – you can’t get to know something like this properly until you use it in anger and work through the problems in a time-constrained environment. And I get the impression that there are very few production 11g sites out there. So it will be a good experience and it will look good for me.

Some say that 11g is no more than 10gR3; nevertheless, it’s a bold upgrade decision – we’ll be going from 9i Standard Edition to 11g Enterprise Edition and switching from Windows to Linux (with a delayed phase of RAC thrown in).

It’s a bit like going from a Ford Focus to a Ferrari (but I’ll keep my thoughts on the futility of certain aspects of this to the RAC rant).

The upgrade is scheduled for March or April and, so far, things have looked relatively pain free. In terms of general performance, not unexpectedly, Oracle 11g on Linux seems significantly faster than 9i on Windows.

The changes we have had to make so far seem mostly to concern changes in XML behaviour (change that actually mostly happened in 10gR2, but we’ve skipped that generation).
For example:

  • In 9i, if you tried to create an XML element with a space in the tag name using XMLELEMENT, the space would be replaced with “_x0020_”. In 11g, the space is preserved as a space. I’m not convinced by this change in behaviour as an element surely cannot have a space in its name. Anyway, in those unwise places where spaces have been used, these tags will have to change in both our client and database code.
  • There has been a change in behaviour regarding dates within XMLELEMENTs. In 9i, a date without an explicit format would pick up the default NLS_DATE_FORMAT of the session. In 11g, the date format will be YYYY-MM-DD. So, to preserve our 9i behaviour, we need to TO_CHAR our dates in XML so that they pick up the session format.
  • When using XMLFOREST in 9i, empty columns would result in no tag being present in the resultant XML. Although deliberate, I think there’s something dodgy about this behaviour anyway. However, it’s changed slightly now we’re in 11g. Not null columns will always produce a tag, even if empty (consider outer joins to tables with not null columns). Empty nullable columns continue to have their tags omitted. I’m not sure how we will address this in our particular situation (empty tags cause us issues if the client entity is not a nullable object, but if the tag is missing apparently the whole enigma is avoided. Somehow!). The solution certainly won’t involve changing not null columns to nullable, but, to limit changes in both client and database, it could involve using XSL to strip out empty tags.
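The date behaviour in particular is easy to demonstrate. A sketch of the sort of change we are having to make (the element name and date format here are illustrative):

```sql
-- In 11g a bare DATE in XMLELEMENT is rendered as YYYY-MM-DD,
-- so to preserve the 9i session-format behaviour we TO_CHAR
-- the date explicitly.
select xmlelement("valuation_date",
                  to_char(sysdate, 'DD-MON-YYYY')) as xml_out
from   dual;
```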

I anticipate further issues to come out of the wash however I expect the majority of these to only reveal themselves post-production upgrade. There are bound to be some statements that degrade significantly but it might not be until the business do their daily thing day-in day-out that the most significant problems for them float to the surface (as opposed to things that are most obvious from automated load testing which I am always doubtful regarding the efficacy thereof).

Fighting off RAC

As mentioned, my current client is upgrading from 9i Standard Edition on Windows to 11g Enterprise Edition on Linux.

The timeframe for this upgrade is quite aggressive. While normal development continues at a blistering pace, automated regression testing began part-time in January for an upgrade originally in March but now re-arranged for April.

Also in January, I was told that it had been decided that we would be going live on a 2-node RAC cluster. Alarm bells rang, sirens went off, I was bewildered as to how such a decision could be made.

This was a decision that management had apparently made in conjunction with the DBAs, with RAC sold as a black box database clustering solution with no question marks over its suitability.

There’s no architecture department/function and in terms of Oracle and databases, it’s a role I have previously filled elsewhere. I immediately went into overdrive, briefing against the wisdom of going to RAC before it was too late.

For us, there are numerous issues that I see with RAC:

  • The first is the upgrade timeline. The proposal, in January, was that in March we would be changing hardware, changing OS, changing database version, changing database edition and now also moving to RAC. There are plenty of references out there that suggest RAC alone can take a few months to stabilise and to nail your proper configuration. For a business that was committed to a March (even April now) deadline, this seemed to be to be a big risk.
  • Secondly, I am far from convinced about the suitability of RAC for our system. It’s one thing for senior management to upgrade the business criticality of a system and give the green light for some spending to upgrade the platform. It’s another thing to upgrade your Ford Focus to a Ferrari only to stay stuck in the same commuting traffic. That’s my analogy of our system – lots of bad SQL, lots of statements that cope with 1001 different inputs, far too many lookups in functions called from SQL. At the end of the day there’s no getting away from the fact that our system has something like an average of 8 concurrent requests. Historically, it has been maxed out or close to being maxed out on CPU because the SQL is bad.
  • More importantly, I remain very concerned about the suitability of RAC for the profile of our system. I think of RAC as suitable for high-end OLTP systems or for multiple systems, with non-OLTP systems restricting themselves to one of the nodes (unless non-conflicting activity can be determined somehow and allocated to relevant nodes). I would probably categorise our single system as a hybrid system with a very small element of OLTP and a large element of reporting. Furthermore, a large proportion of that reporting revolves around the top X% of a single table. The RAC architecture is such that repeated requests for the same blocks will cause significant waits as the instances coordinate and pass the blocks back and forth.
  • The bottom line is that RAC is complex and there’s no getting away from the fact complex is bad. Complexity eventually comes back round and bites you. And complexity usually costs more.

For now, my briefing has been a partial success, although really the only point that had any effective leverage was that the upgrade date was under threat. A 2-node RAC solution is currently on hold for the initial upgrade. The idea being that we upgrade to 11g on the new hardware and then we expand to a 2-node cluster once that initial upgrade phase is stable.

But, and here’s the stickler, the upgrade is going to be to a 1-node RAC “cluster”. I can understand that there is a once-in-a-blue-moon opportunity to go to the new hardware, but I don’t like the smell of a 1-node RAC cluster at all. For a start, there surely has to be some overhead to the RAC installation. But we’re not even testing that. But what’s more is the cost. There is a significant licensing overhead to RAC. It’s an overhead that surely means that going to a 2-node cluster is a formality regardless. I’m just glad that it’s not my name on the cheque. It’s one of those things where I just hope that someone down the line doesn’t say that we’re moving off Oracle because it’s so expensive.

A couple of other folks’ thoughts on issues with RAC:

