Interesting cost_io in 10053 trace

17976931348623157081452742373170435679807056752584499
6598917476803157260780028538760589558632766878171540458953514
3824642343213268894641827684675467035375169860499105765512820
7624549009038932894407586850845513394230458323690322294816580
8559332123348274797826204144723168738177180919299881250404026
184124858368.00 to be precise.

I’ve got a problem with a set of sql statements in production. Fortunately the problem is reproducible in a non-prod environment.

These statements all reference one particular virtual column in a composite function-based index.

The sql statements all look something like this:


SELECT col1
FROM my_schema.table1 tab1
WHERE ...
AND NVL(col2,'UNASSIGNED') = SYS_CONTEXT('MY_CONTEXT','MY_COL2_VALUE')
...

And they are raising the following error:


ORA-00904: "MY_SCHEMA"."TAB1"."SYS_NC00062$": invalid identifier

This “SYS_NC00062$” is a hidden/virtual column on this table that is part of a function-based index. I can see it here at least:


select *
from sys.col$
where obj# in (select object_id
               from dba_objects
               where object_name = 'TABLE1'
               and owner = 'MY_SCHEMA')
order by col#;

This article isn’t about that error specifically. Not yet, at least – I might blog about it later once I know more.

However, in the meantime, I’ve just started gathering details – running some extended trace files (10046, 10053), etc. – with a view to finding some extra information and then probably raising an issue on Metalink.

I’ve not even started analysing this info yet but as I zipped through the 10053 file, I noted something odd.

Note that in this 10053 trace file, the original reported SYS_NC00062$ has now become SYS_NC00066$ because I dropped and recreated the FBI to see if that helped – it didn’t.

Looking at the cost_io numbers, you get various costs for various access methods – e.g. a 16.00, a 3.00, a 15632.00, a 53.00 – but then, holy moly:


***** Virtual column Adjustment ******
Column name SYS_NC00066$
cost_cpu 300.00
cost_io 17976931348623157081452742373170435679807056752584499
6598917476803157260780028538760589558632766878171540458953514
3824642343213268894641827684675467035375169860499105765512820
7624549009038932894407586850845513394230458323690322294816580
8559332123348274797826204144723168738177180919299881250404026
184124858368.00
***** End virtual column Adjustment ******

I’ve literally done nothing else yet, but I saw that number and thought I’d put it down quickly in an article.

But wow – that’s a big adjustment! What’s that about?
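For what it’s worth, that figure doesn’t look arbitrary: the digits match the decimal expansion of the largest finite IEEE-754 double (DBL_MAX, roughly 1.7976931348623157e308), which suggests the adjustment has simply hit the maximum representable double rather than being a meaningful IO cost. A quick check, outside Oracle (this snippet is mine, not from the trace):

```python
import sys

# The largest finite IEEE-754 double; printing it with two decimal
# places reproduces the full integer expansion seen in the 10053 trace.
expanded = f"{sys.float_info.max:.2f}"

print(expanded[:17])   # leading digits: 17976931348623157
print(expanded[-15:])  # trailing digits: 184124858368.00
```

If that’s right, the “virtual column adjustment” is the CBO emitting DBL_MAX – an overflow or sentinel value – which would be consistent with something being broken around that virtual column.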

I picked the wrong day to leave my Jonathan Lewis CBO book at home…

Here’s the 10053 trace file.

Are my Stored Outlines being used? Why not?

I seem to be blogging today about loose ends from previous blog articles that I’ve not tidied up 

Frustrations with Stored Outlines today.

Yesterday I mentioned how Stored Outlines offered a solution to fixing a SQL performance problem within an application generating SQL via an ORM layer (i.e. little or no control over the SQL concerned).

It all seemed so simple. Create outlines on a “good” database, export them, import them and enable the usage of them on the “bad” database.

The first attempt was fine with an outline against one bit of SQL. It did exactly what it said on the tin. However subsequent attempts have been less than perfect, much to my disappointment because it encourages “the fear” and vindicates the position of sceptics who would resist this approach. I am 99% sure that there is a simple example of user error at work here (where user=me).

This morning I had some inconsistencies in usage of the outlines and I put it down to TOAD. I’m pretty sure it was TOAD, in that two apparently identical SQL statements (copied and pasted), one run from SQL*Plus and one from Toad, had different hash_values in V$SQL. I put it down to some sort of wrapper or something else internal to how TOAD itself works. Anyway, moving on from Toad.

There are three main “tools” that I have been using to figure things out (apart from Toad and SQL*Plus) – V$SQL, extended session tracing and the OUTLN.OL$ table.

V$SQL is cool because it has a column OUTLINE_CATEGORY which is some sort of indication of whether the SQL used a Stored Outline.
And OUTLN.OL$ is good because it shows me what outlines are available for what SQL in what outline categories.

But what I’m struggling with is this: if the usage of Stored Outlines is enabled (via alter session set use_stored_outlines=<category_name>), if the outlines/outline categories are there in the OUTLN tables, and if the hash_value of a SQL statement in V$SQL and in the extended trace file matches that in the OUTLN.OL$ table, why would the outline not be used?

I dunno. Not yet at least. Further tracing and investigation required tomorrow. I know that they are not being used now (since my first couple of outline extractions) primarily because of performance of the SQL on the “bad” database. It’s bad. Plus I can see from V$SQL and the trace that it has not been used.

I am currently not sure whether tracing event 10053 shows any information about Stored Outlines. I will see tomorrow; I can’t investigate further at the moment. However, from the tracing I did earlier, I came across the fact that you can’t do a 10053 trace against SQL which is embedded in PL/SQL. Not in 8i at least.

Which then makes my investigation slightly more complicated / artificial because the SQL has got ":1" bind variable placeholders in it. So I can’t run it via SQL*Plus because you can’t create a variable called 1. And I can’t change the application code. And whilst I have a PL/SQL harness that does allow this, I then have the 10053 trace issue. News of tomorrow’s success to follow later, I hope.

Application Hanging

Today a problem with the application “hanging”.

The background is that there has been some development in an application against database A.
Database A is now out of action due to a disk controller failure or something akin to that, and so the application is now pointing at database B.
When pointing at database A, the application was performing acceptably, not so when pointing at database B where it is “hanging”.

So, straightaway, in response to one of the questions we should ask ourselves when there is a sudden problem – what has changed? – we know that there has been a change of database, and the immediate follow-up is: what might the differences be between database A and database B?

Fortunately, one of the differences is already known to me and, given the symptoms, is jumping up and down, waving its hands and shouting ‘Look at me’.

Both database A and database B are relatively recent copies of the production database.
All 8.1.7 by the way.
However, the production database has a bad mix: some tables have statistics (some of them stale) and some have none, which can lead the CBO to make inappropriate decisions because of the difference between the default values it uses for objects without statistics and the reality of the data distribution.

Database A had had a statistics update subsequent to the refresh.
Database B had not.

Therefore it is highly likely that the symptom of the application hanging is a less than optimal plan on the SQL concerned.

For the application operation that is hanging, quite a few bits of SQL might be issued.
There might be more than one bit of SQL that is performing unacceptably but in any event, we need to work through each suboptimal performer as we find it.

The application mostly uses ORM to generate its lovely queries (although in this case the development that has occurred is to optimise some of the iterative, slow-by-slow dynamic SQL that is generated).

Therefore the best mechanism is to trace the database session concerned, replicate the “hanging” behaviour and see what it is doing. What I am expecting to find is that, on “hanging”, the last piece of SQL in the trace file will be doing more IO than it needs to.

So, a trigger is required which sets the extended trace event for a specific user:


CREATE OR REPLACE TRIGGER xxxxxx
AFTER LOGON ON DATABASE
DECLARE
BEGIN
IF USER = 'yyyyy'
THEN
--
EXECUTE IMMEDIATE 'ALTER SESSION SET events ''10046 trace name context forever, level 12''';
--
END IF;
END;
/

The next step is to reproduce the “hanging” behaviour via the application.
Sure enough this generates a trace file on the server which is doing many, many “db file sequential reads”.
More than normal? (where normal is the behaviour when it performs acceptably)
Don’t know for sure yet, but it’s a good bet.

The SQL in question is something like this:


SELECT ....
FROM News t0
, (SELECT DISTINCT T1.newsid
FROM nolindex T2
, IMPORTERS T3
, NewsIndex T1
WHERE T3.destindexid = :1
AND T1.sectionid = T2.indexid
AND T2.indexid = T3.sourceindexid) v1
WHERE t0.ID = v1.newsid

The trace file is incomplete because, due to all the IO, it reached its size limit. Also the application was killed because we had the information we needed.

On database B (some stats but not all), running the SQL with a SET AUTOTRACE TRACE EXPLAIN reveals the following:


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=216293 Card=59764261584 Bytes=196445127826608)
1 0 MERGE JOIN (Cost=216293 Card=59764261584 Bytes=196445127826608)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=826 Card=1515434 Bytes=4961530916)
3 2 INDEX (FULL SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=26 Card=1515434)
4 1 SORT (JOIN) (Cost=50339 Card=3943706 Bytes=51268178)
5 4 VIEW (Cost=165128 Card=3943706 Bytes=51268178)
6 5 SORT (UNIQUE) (Cost=165128 Card=3943706 Bytes=256340890)
7 6 NESTED LOOPS (Cost=24 Card=3943706 Bytes=256340890)
8 7 MERGE JOIN (CARTESIAN) (Cost=24 Card=2589433 Bytes=134650516)
9 8 TABLE ACCESS (BY INDEX ROWID) OF 'IMPORTERS' (Cost=2 Card=11 Bytes=286)
10 9 INDEX (RANGE SCAN) OF 'DESTINDEXID_IDX' (NON-UNIQUE) (Cost=1 Card=11)
11 8 SORT (JOIN) (Cost=22 Card=235403 Bytes=6120478)
12 11 INDEX (FAST FULL SCAN) OF 'NEWSINDEXSECTLANGNEWSINDEX' (NON-UNIQUE) (Cost=2 Card=235403 Bytes=6120478)
13 7 INDEX (UNIQUE SCAN) OF 'NOLINDEX_PK' (UNIQUE)

There are some big numbers in there.

Next step is to run against database C which should be similar to database A before it died – i.e. refresh from production with statistics on more objects:


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=206 Card=89 Bytes=26700)
1 0 NESTED LOOPS (Cost=206 Card=89 Bytes=26700)
2 1 VIEW (Cost=28 Card=89 Bytes=1157)
3 2 SORT (UNIQUE) (Cost=28 Card=89 Bytes=2314)
4 3 NESTED LOOPS (Cost=26 Card=89 Bytes=2314)
5 4 NESTED LOOPS (Cost=2 Card=8 Bytes=120)
6 5 TABLE ACCESS (FULL) OF 'IMPORTERS' (Cost=2 Card=8 Bytes=80)
7 5 INDEX (UNIQUE SCAN) OF 'NOLINDEX_PK' (UNIQUE)
8 4 INDEX (RANGE SCAN) OF 'NEWSINDEXSECTLANGNEWSINDEX' (NON-UNIQUE) (Cost=3 Card=225099 Bytes=2476089)
9 1 TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=2 Card=4695495 Bytes=1347607065)
10 9 INDEX (UNIQUE SCAN) OF 'PKNEWSTABLE' (UNIQUE) (Cost=1 Card=4695495)

Some much smaller numbers and completely different access paths.

The numbers themselves are not an issue – it’s the different access paths that are the problem.
What I mean is that if, via hints for example, we forced the second access path on the poorly performing database, explain plan would still have much bigger numbers.

And so, for possible resolutions, we have a number of options, at least four, possibly more:
Option numero uno: Gather appropriate statistics on all objects so that the CBO has a decent chance.
Option numero the next one: Get rid of the partial statistics so that we have none and go back to using RBO.
Option three: Hint the SQL in the application.
Option four: Gather a stored outline for the query.

Options one and two can be discounted due to organisational “fear” of change. I would say that option one – all objects with appropriate statistics – is much preferable to option two. However, neither is a go-er due to fear that this might have a wider detrimental effect. (Incidentally, option one was attempted previously but the deployment was backed out because, according to rumour, the performance of something somewhere degraded)
Option three might be ok. The downside from an application perspective is that to put the hints in requires an application deployment, an application stop and start, and to take it out requires the same. As part of a wider strategy for this particular application, its suitability is questionable because it’s ORM generated code and therefore it’s probably not possible to control the hinting.
That leaves option four – capture a stored outline for the query on the “good” database, export, import to the “bad” database(s) and enable usage thereof. Might require some shared pool flushage but easily turn-on-and-offable, can be repeated easily for other problematic queries.

And just for completeness, if we were going for option four, the steps would be to first create the stored outline on the “good” database:


CREATE OR REPLACE OUTLINE <outline_name>
FOR CATEGORY <category_name>
ON
<sql_statement>;

(Note that the text of the SQL statement must exactly match the text that will be executed by the application.)

Next, export the outline from the “good database”, e.g.


exp userid=outln/ query="where category = '<category_name>'" tables=(ol$,ol$hints)

Then, import into the “bad” database, e.g.


imp userid=outln/ full=y ignore=yes

Finally, maybe use a trigger to enable the usage of the stored outline for the problem application user:


CREATE OR REPLACE TRIGGER xxxxxxxxx
AFTER LOGON ON DATABASE
DECLARE
BEGIN
IF USER = 'yyyyyyyyyyyyy'
THEN
--
EXECUTE IMMEDIATE 'ALTER SESSION SET USE_STORED_OUTLINES = <category_name>';
--
END IF;
EXCEPTION WHEN OTHERS THEN NULL;
END;
/

Varying IN lists

In his column in the current edition of Oracle Magazine, Tom Kyte wrote about the recurring issue of varying IN lists and provided some solutions.

This ties in nicely with a recommendation I had already made at my current client where there is a significant usage of IN lists within dynamic SQL issued by applications on top of the 8.1.7.4 DB. So, if we were using bind variables for these IN lists, then if there were 1 -> n arguments in the IN lists, we would probably find n similar queries in the shared pool, each with a different number of arguments in that IN list.

However, these dynamic varying IN lists are not using bind variables.

As a result, there is a fair amount of SQL ageing out of the shared pool as the sequence-generated ids in these IN lists creep upwards, with every day bringing a new range of ids and, not surprisingly, we are doing too much hard parsing.

So, there is a two-pronged recommendation, firstly to change the CURSOR_SHARING to FORCE in order to do some auto-binding for certain users (to be set using an AFTER LOGON trigger) and, secondly, to use a better approach for IN lists, as detailed in the link above.

The advantages should hopefully be a reduction in hard parsing and, within the application, the end of the limit on the number of arguments in the dynamic string-concatenated IN list.
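To illustrate the hard-parsing point: with string-concatenated literals, every new range of ids produces a textually distinct statement (and hence a distinct cursor in the shared pool), whereas the single-bind form is one shared statement. A rough sketch of the effect, with hypothetical table and id values:

```python
# Each day's new ids produce brand-new SQL text: no sharing, all hard parses.
def literal_sql(ids):
    return f"select slug, id from news where sectionid in ({', '.join(map(str, ids))})"

daily_batches = [[42290 + d, 42291 + d] for d in range(100)]
literal_texts = {literal_sql(batch) for batch in daily_batches}

# With a single bind for the whole list, the statement text never changes:
bound_sql = ("select slug, id from news where sectionid in "
             "(select * from table(cast(f_numinlist(:1) as tt_number)))")
bound_texts = {bound_sql for _ in daily_batches}

print(len(literal_texts))  # 100 distinct statements, each needing a hard parse
print(len(bound_texts))    # 1 shared statement
```

Cursor matching is on the SQL text, so one shared text means one parse plus many cheap executes, which is the whole point of the recommendation.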

I have been using an approach very similar to the 8i approach that Tom describes in his column.

An example of the transition, using a two-argument IN list, is from:

select n.slug, n.id
from news n
where n.status=15
and n.sectionid in ('42299, 42295')

to

select n.slug, n.id
from news n
where n.status=15
and n.sectionid in
(
  select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
)

and, in the above example, '42299, 42295' should be bound as a single bind (this is for an application with varying numbers of arguments and, as Tom points out in his article, if there is a finite number of arguments, then the best approach might be to bind each item individually).
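For reference, the parsing that an f_numinlist-style function performs is straightforward: split the bound string on commas and convert each token to a number (the PL/SQL version returns it as a collection type, tt_number here). The actual function isn’t shown in this article, so here is a Python sketch of the same logic, with hypothetical names:

```python
def num_in_list(csv):
    """Split a comma-separated string of ids into numbers - the job an
    f_numinlist-style PL/SQL function would do server-side."""
    return [int(token) for token in csv.split(",") if token.strip()]

print(num_in_list("42299, 42295"))  # [42299, 42295]
```

The PL/SQL equivalent would loop over the string with INSTR/SUBSTR, extending and filling a tt_number collection, but the tokenising logic is the same.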

However, on implementing this approach, there was a very definite performance impact.
An explain plan for the original code looked like this:

SQL> select n.slug, n.id
  2 from news n
  3 where n.status=15
  4 and n.sectionid in ('42299, 42295')
  5 /
Execution Plan
----------------------------------------------------------
  0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=48 Bytes=1488)
  1 0 INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=48 Bytes=1488)

An explain plan for the revised statement revealed the following:

SQL> l
  1 select n.slug, n.id
  2 from news n
  3 where n.status=15
  4 and n.sectionid in
  5 (
  6 select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
  7* )
SQL> /
Execution Plan
----------------------------------------------------------
  0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5365 Card=192277 Bytes=8460188)
  1 0 HASH JOIN (Cost=5365 Card=192277 Bytes=8460188)
  2 1 VIEW OF 'VW_NSO_1' (Cost=14 Card=4072 Bytes=52936)
  3 2 SORT (UNIQUE) (Cost=14 Card=4072)
  4 3 COLLECTION ITERATOR (PICKLER FETCH)
  5 1 INDEX (FAST FULL SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=841 Card=306123 Bytes=9489813)

My initial thoughts were that a hash join was probably not the most efficient mechanism. I don’t want to digress into a discussion on joining mechanisms partly because my knowledge is less than comprehensive and also because there are far, far better sources out there. I refer the reader to the Oracle documentation – 8i, 9iR2, and 10gR2 – and, of course, to Jonathan Lewis’s book on the CBO.
I also suspected that the problem was probably related to the number of rows that the CBO thought were in the dynamic IN-list collection. The reason is right there in the autotrace, but a quick trace of the CBO decision-making using event 10053 confirmed the following:

Table stats Table: KOKBF$ Alias: KOKBF$
TOTAL :: (NOT ANALYZED) CDN: 4072 NBLKS: 100 TABLE_SCAN_CST: 2 AVG_ROW_LEN: 100
Column: KOKBC$ Col#: 1 Table: KOKBF$ Alias: KOKBF$
NO STATISTICS (using defaults)
NDV: 127 NULLS: 0 DENS: 7.8585e-03
***************************************
SINGLE TABLE ACCESS PATH
TABLE: KOKBF$ ORIG CDN: 4072 CMPTD CDN: 4072
Access path: tsc Resc: 2 Resp: 2
BEST_CST: 2.00 PATH: 2 Degree: 1

Basically, the CBO is using a default of 4072 rows for the collection generated from the IN list. I have seen a similar problem with GLOBAL TEMPORARY TABLES, which was helped by using a DYNAMIC_SAMPLING hint. Unfortunately, I believe that hint was introduced in 9i, and certainly any attempts to use it for the above problem on my version were less than fruitful. The same goes for the CARDINALITY hint.

Anyway, I took the easy route to the solution. I quickly created a normal heap table with 2 rows in it to see how the CBO would react in those circumstances:


SQL> create table domtest
2 (col1 number);

Table created.


SQL>
SQL> insert into domtest values (42299);

1 row created.


SQL>
SQL> insert into domtest values (42295);

1 row created.


SQL>
SQL> begin
2 dbms_stats.gather_table_stats(USER,'DOMTEST');
3 end;
4 /

PL/SQL procedure successfully completed.


SQL>
SQL> set autotrace trace explain
SQL>
SQL> select n.slug, n.id
2 from news n
3 where n.status=15
4 and n.sectionid in
5 (
6 select * from domtest
7 );

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=94 Bytes=4136)
1 0 NESTED LOOPS (Cost=11 Card=94 Bytes=4136)
2 1 VIEW OF 'VW_NSO_1' (Cost=3 Card=2 Bytes=26)
3 2 SORT (UNIQUE) (Cost=3 Card=2 Bytes=10)
4 3 TABLE ACCESS (FULL) OF 'DOMTEST' (Cost=1 Card=2 Bytes=10)
5 1 INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=306123 Bytes=9489813)

Under these circumstances, a nested loop seemed more in line with what I was expecting previously. So, I decided to try a nested loop hint, the syntax being


/*+ ordered use_nl (--table / alias--) */

which says that the specified table should be used as the inner table of the nested loop – so this should drive off the entries in the IN-list array.


SQL> select /*+ ordered use_nl (n) */
2 n.slug, n.id
3 from news n
4 where n.status=15
5 and n.sectionid in
6 (
7 select * from table ( cast ( f_numinlist('42299, 42295') as tt_number))
8 )
9 /


Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=16302 Card=192277 Bytes=8460188)
1 0 NESTED LOOPS (Cost=16302 Card=192277 Bytes=8460188)
2 1 VIEW OF 'VW_NSO_1' (Cost=14 Card=4072 Bytes=52936)
3 2 SORT (UNIQUE) (Cost=14 Card=4072)
4 3 COLLECTION ITERATOR (PICKLER FETCH)
5 1 INDEX (RANGE SCAN) OF 'I_NEWS_001' (NON-UNIQUE) (Cost=4 Card=306123 Bytes=9489813)

The numbers are still a little off because the CBO is still using a default of 4072 rows, but it runs like a dream.

Event 10053 and flushing

At various points today, I have convinced myself that various daft things were happening with Oracle, my hints and/or SQL*Plus when the real cause was just me being really daft.

I’ve been looking at a couple of problem queries today and using extended tracing of the CBO using event 10053.

However, every now and then my tracing would cease to provide anything meaningful.

The reason was that the CBO wasn’t doing its calculations because I was repeatedly running the same queries, and much of the time the execution plans were already in the shared pool.

So I should have been flushing the shared pool, changing the string literals in the SQL, or adding a space here or there to force a hard parse.
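The whitespace trick works because cursor matching is done on the exact SQL text: even one extra space produces a different text (and a different hash), so the statement misses in the library cache and gets hard parsed – which is what triggers the 10053 output. A toy illustration (Oracle’s internal hash isn’t MD5; this just shows exact-text matching):

```python
import hashlib

sql_a = "select count(*) from news"
sql_b = "select count(*) from  news"  # one extra space: different text

# Any hash over the raw text distinguishes the two; the library cache
# similarly treats them as two separate statements to be parsed.
hash_a = hashlib.md5(sql_a.encode()).hexdigest()
hash_b = hashlib.md5(sql_b.encode()).hexdigest()

print(hash_a != hash_b)  # True
```

Hence changing a literal, adding a space, or flushing the shared pool are all equivalent ways of forcing the optimizer to go through its full costing again.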

Doh!

PGA dump

In a number of previous blogs, I serialised some problems, and thoughts on those problems, that I had had creating an Oracle Text Context index where some five million rows took six days to index.

So, the basic process is that each index creation process (between 1 and n depending on the parallelism – in my case, 4 but just for index creation purposes) indexes a bunch of documents until it reaches the memory limit and then flushes those to disk.
The logs of the parallel index creation show that the in-memory indexing slows down over time but the flushing to disk remains fairly constant.

I originally posted a question on the OTN Oracle Text forum to try to get some ideas and one suggested possibility was that this might be hitting a memory leak bug which could be proven by taking PGA dumps and validating that some sub-heap is growing uncontrollably.

My immediate thoughts were that I hadn’t done this before (and I still haven’t done it against a renewed attempt at the index creation) and that I wasn’t sure of the mechanism, but that it might well involve the ORADEBUG command.

A search on a well-known search engine suggested a couple of helpful links:
http://www.evdbt.com/Oradebug_Modrakovic.pdf
http://julian.dyke.users.btopenworld.com/Oracle/Diagnostics/Tools/ORADEBUG/ORADEBUG.html

So, it turns out that the command I need is probably:

ORADEBUG DUMP HEAPDUMP 1

which should dump the PGA.

Note that this command can be run via SQL*Plus but not via Toad, for example.
So, typically, I would need to log on to SQL*Plus as an appropriately privileged user and identify the process(es) into which I wish to “inject” the debug process using the SETORAPID or SETOSPID subcommands.
Then, I would use ORADEBUG DUMP HEAPDUMP 1 to do a PGA dump and at a later interval do it again, probably several times, and then inspect the trace file.

Handily, the command ORADEBUG TRACEFILE_NAME tells you the name of the trace file produced.

Here is an example PGA dump. Interesting stuff. And once I’ve tried to figure out how to read this, I might be in a position to do this against a renewed indexing attempt.

Driver palavers – Missing data bind variable: Java -> Web Objects -> JDBC -> Oracle 8i

Another day, another interesting investigation.

Today, we had to investigate a strange hanging of the application on executing a certain bit of SQL.

On initial inspection, there didn’t seem much to go wrong in this application-generated SQL:

SELECT t0.languageID, t0.CREATED, t0.DOB, t0.PUB, t0.STORYSOID, t0.sectionID, t0.ID, t0.lastModified, t0.CURRENTOWNERID, t0.lifeSpan, t0.OrigPubTime, t0.newsID, t0.STATUS, t0.pubTime, t0.ORIGINATORID, t0.origid, t0.EDITSOID, t0.dobuserid, t0.description
FROM News t0
WHERE (t0.lastModified > :1
AND t0.STATUS = :2)
ORDER BY t0.lastModified DESC

The first step that occurred to me was to do a quick explain plan (version 8i), substituting in some sensible values:

1 SELECT t0.languageID……
2 FROM News t0
3 WHERE (t0.lastModified > TO_DATE('01/01/2007','DD/MM/YYYY')
4 AND t0.STATUS = 3)
5* ORDER BY t0.lastModified DESC
SQL> /

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=1236)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'NEWS' (Cost=3 Card=1 Bytes=1236)
2 1 INDEX (RANGE SCAN DESCENDING) OF 'NEWSLASTMODIFIEDINDEX' (NON-UNIQUE) (Cost=2 Card=1)

Incidentally, I did not realise that you couldn’t set up a SQL*Plus variable with a date datatype.

Anyway, the above plan did not seem unreasonable.

Next step was to have a quick look at the SGA to see if that could reveal much. So, the shared pool was flushed and some example SQL run via the application and via SQL*Plus. Sure enough – there was the statement. However, the strange thing was the high level of consistent gets.

A light bulb moment – sure, the application might look like it was hanging if it was doing a lot of IO. However, the value being passed in for the date was such that no rows should have been found. And we had a quick discussion about the mapping of Oracle dates, Java dates, Oracle Timestamps and Java Timestamps, and recent version changes.

Curious. Initial thoughts revolved around date conversion. Maybe something was wrong with the date format and the value passed in such that we were matching too many rows.

A quick step back. This application is Java with WebObjects and JDBC Thin driver – a recent 10g version, I believe.

So, I decided to cut to the chase and do an extended trace on event 10046 to see if that could reveal more.

First step was to create a LOGON trigger to capture the binds and waits:


CREATE OR REPLACE TRIGGER trig_ald_set_session_trace
AFTER LOGON ON DATABASE
DECLARE
BEGIN
--
IF USER='USER1'
THEN
--
EXECUTE IMMEDIATE 'ALTER SESSION SET events ''10046 trace name context forever, level 12''';
--
END IF;
--
EXCEPTION WHEN OTHERS THEN NULL;
END;
/

Eventually, after unsuccessfully testing this a couple of times, the ever-present danger of the EXCEPTION WHEN OTHERS THEN NULL clause (not one that I normally use, honest guv) was revealed: the DBA-type user that I was using did not have the ALTER SESSION privilege granted directly.

So, once rectified, the trigger was enabled and we were able to run the application to see what would happen:

PARSING IN CURSOR #1 len=314 dep=0 uid=48 oct=3 lid=48 tim=1008224631 hv=3758786868 ad='90be9478'
SELECT t0.languageID, t0.CREATED, t0.DOB, t0.PUB, t0.STORYSOID, t0.sectionID, t0.ID, t0.lastModified, t0.CURRENTOWNERID, t0.lifeSpan, t0.OrigPubTime, t0.newsID, t0.STATUS, t0.pubTime, t0.ORIGINATORID, t0.origid, t0.EDITSOID, t0.dobuserid, t0.description FROM News t0 WHERE (t0.lastModified = :1 AND t0.STATUS = :2)
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=0,tim=1008224631
BINDS #1:
bind 0: dty=180 mxl=11(11) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0 size=36 offset=0
bfp=01a6fac8 bln=11 avl=07 flg=05
value=
Dump of memory from 0x1A6FAC8 to 0x1A6FACF
1A6FAC0 786B0119 0B290900 [xk...)..]
bind 1: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0 size=0 offset=12
bfp=01a6fad4 bln=22 avl=02 flg=01
value=11
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=1008224632
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 3 p1=1952673792 p2=1 p3=0
WAIT #1: nam='file open' ela= 0 p1=0 p2=0 p3=0
WAIT #1: nam='db file sequential read' ela= 0 p1=4 p2=5343 p3=1
WAIT #1: nam='db file sequential read' ela= 1 p1=4 p2=17552 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=4 p2=85979 p3=1
...... lots more db file sequential reads .......

I thought this bit was a little strange:

BINDS #1:
bind 0: ……
value=

My initial thoughts were that this should raise an ORA-01008: not all variables bound. However, clearly we were past that stage.

So, at the moment my best reading between the lines revolves around the optimizer doing some sort of predicate pruning given that bind variable :1 has been lost – hence the full table scan on the table concerned and an awful lot of rows being fetched.

The application developer rebuilt the app using the 8i drivers and, sure enough, no lost bind variable. Also interestingly, the trace of the 8i driver session revealed something that was absent from the previous traced session – the checking of NLS settings. (Note, too, that the bind metadata differs between the traces: the problem trace shows dty=180 for bind 0, which looks like the TIMESTAMP type code, whereas the 8i-driver trace below shows dty=12, Oracle’s internal code for DATE.)

PARSING IN CURSOR #1 len=43 dep=0 uid=48 oct=42 lid=48 tim=1008340170 hv=1567373371 ad='904ffb90'
ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
END OF STMT
....
PARSING IN CURSOR #1 len=43 dep=0 uid=48 oct=42 lid=48 tim=1008340170 hv=3073757550 ad='904f8ef8'
ALTER SESSION SET NLS_TERRITORY = 'AMERICA'
END OF STMT
....
PARSING IN CURSOR #1 len=76 dep=0 uid=48 oct=3 lid=48 tim=1008340170 hv=3617023014 ad='904f7e04'
SELECT VALUE FROM NLS_INSTANCE_PARAMETERS WHERE PARAMETER ='NLS_DATE_FORMAT'
END OF STMT
....
PARSING IN CURSOR #1 len=314 dep=0 uid=48 oct=3 lid=48 tim=1008340180 hv=3758786868 ad='90be9478'
SELECT t0.languageID, t0.CREATED, t0.DOB, t0.PUB, t0.STORYSOID, t0.sectionID, t0.ID, t0.lastModified, t0.CURRENTOWNERID, t0.lifeSpan, t0.OrigPubTime, t0.newsID, t0.STATUS, t0.pubTime, t0.ORIGINATORID, t0.origid, t0.EDITSOID, t0.dobuserid, t0.description FROM News t0 WHERE (t0.lastModified = :1 AND t0.STATUS = :2)
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=1008340180
BINDS #1:
bind 0: dty=12 mxl=07(49) mal=00 scl=00 pre=00 oacflg=01 oacfl2=0 size=32 offset=0
bfp=01a59f40 bln=07 avl=07 flg=05
value="1/25/2007 10:59:23"
bind 1: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=01 oacfl2=0 size=0 offset=8
bfp=01a59f48 bln=22 avl=02 flg=01
value=11
EXEC #1:c=2,e=1,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=4,tim=1008340181
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
FETCH #1:c=0,e=0,p=0,cr=4,cu=0,mis=0,r=0,dep=0,og=4,tim=1008340182
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 1 p1=1952673792 p2=1 p3=0
XCTEND rlbk=0, rd_only=1
WAIT #0: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #0: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
STAT #1 id=1 cnt=0 pid=0 pos=0 obj=2498 op='TABLE ACCESS BY INDEX ROWID NEWS '
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=2737 op='INDEX RANGE SCAN '
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 3 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0

So, there could be something environmental that’s not correct – for example local settings, driver config, etc – or I suppose it could be nothing but different behaviour between the drivers. Gut feeling is that I wouldn’t rule out the former.

So, for me, a number of searches to do starting at Metalink, then Jonathan Lewis’s sites and a quick flick through his book, then Google.

Although this is now no longer a pressing problem, this is something we will need to look into again some day soon en route to 10g upgrade. For example, is this an incompatibility between the application and the 10g drivers, or an issue between the 10g drivers and an 8i database etc. Plus, of course, it could be to do with the behaviour of dates and timestamps between Java and Oracle and the compatibility between, and changes to, aforementioned datatypes between different versions of said technologies.
