Another day, another interesting investigation.
Today, we had to investigate a strange hanging of the application on executing a certain bit of SQL.
On initial inspection, there didn’t seem much to go wrong in this application-generated SQL:
SELECT t0.languageID, t0.CREATED, t0.DOB, t0.PUB, t0.STORYSOID, t0.sectionID, t0.ID, t0.lastModified, t0.CURRENTOWNERID, t0.lifeSpan, t0.OrigPubTime, t0.newsID, t0.STATUS, t0.pubTime, t0.ORIGINATORID, t0.origid, t0.EDITSOID, t0.dobuserid, t0.description
FROM News t0
WHERE (t0.lastModified > :1
AND t0.STATUS = :2)
ORDER BY t0.lastModified DESC
To me, the first step that occurred was to do a quick explain plan (version 8i) substituting in some sensible variables:
1 SELECT t0.languageID……
2 FROM News t0
3 WHERE (t0.lastModified > TO_DATE(’01/01/2007′,’DD/MM/YYYY’)
4 AND t0.STATUS = 3)
5* ORDER BY t0.lastModified DESC
SQL> /
Execution Plan
———————————————————-
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=1 Bytes=1236)
1 0 TABLE ACCESS (BY INDEX ROWID) OF ‘NEWS’ (Cost=3 Card=1 Bytes=1236)
2 1 INDEX (RANGE SCAN DESCENDING) OF ‘NEWSLASTMODIFIEDINDEX’ (NON-UNIQUE) (Cost=2 Card=1)
Incidentally, I did not realise that you couldn’t set up a SQL*Plus variable with a date datatype.
Anyway, the above plan did not seem unreasonable.
Next step was to have a quick look at the SGA to see if that could reveal much. So, the shared pool was flushed and some example SQL run via the application and via SQL*Plus . Sure enough – there was the statement. However, the strange thing was the high level of consistent gets.
A light bulb moment – sure the application might look like it was hanging if it was doing a lot of IO. However, the value being passed in for the date was such that no rows would be found. And we had a quick discussion about the mapping of Oracle dates, Java dates, Oracle Timestamps and Java Timestamps and recent version changes.
Curious. Initial thoughts revolved around date conversion. Maybe something was wrong with the date format and the value passed in such that we were matching too many rows.
A quick step back. This application is Java with WebObjects and JDBC Thin driver – a recent 10g version, I believe.
So, I decided to cut to the chase and do an extended trace on event 10046 to see if that could reveal more.
First step was to create a LOGON trigger to capture the binds and waits:
CREATE OR REPLACE TRIGGER trig_ald_set_session_trace
AFTER LOGON ON DATABASE
DECLARE
BEGIN
--
IF USER='USER1'
THEN
--
EXECUTE IMMEDIATE 'ALTER SESSION SET events ''10046 trace name context forever, level 12''';
--
END IF;
--
EXCEPTION WHEN OTHERS THEN NULL;
END;
/
Eventually, after unsuccessfully testing this a couple of times, the constant potential danger of the EXCEPTION WHEN OTHERS THEN NULL clause (not one that I normally use, honest guv) was revealed as the DBA type user that I was using did not have direct ALTER SESSION privilege.
So, once rectified, the trigger was enabled and we were able to run the application to see what would happen:
PARSING IN CURSOR #1 len=314 dep=0 uid=48 oct=3 lid=48 tim=1008224631 hv=3758786868 ad='90be9478'
SELECT t0.languageID, t0.CREATED, t0.DOB, t0.PUB, t0.STORYSOID, t0.sectionID, t0.ID, t0.lastModified, t0.CURRENTOWNERID, t0.lifeSpan, t0.OrigPubTime, t0.newsID, t0.STATUS, t0.pubTime, t0.ORIGINATORID, t0.origid, t0.EDITSOID, t0.dobuserid, t0.description FROM News t0 WHERE (t0.lastModified = :1 AND t0.STATUS = :2)
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=0,tim=1008224631
BINDS #1:
bind 0: dty=180 mxl=11(11) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0 size=36 offset=0
bfp=01a6fac8 bln=11 avl=07 flg=05
value=
Dump of memory from 0x1A6FAC8 to 0x1A6FACF
1A6FAC0 786B0119 0B290900 [xk...)..]
bind 1: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=03 oacfl2=0 size=0 offset=12
bfp=01a6fad4 bln=22 avl=02 flg=01
value=11
EXEC #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=1008224632
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 3 p1=1952673792 p2=1 p3=0
WAIT #1: nam='file open' ela= 0 p1=0 p2=0 p3=0
WAIT #1: nam='db file sequential read' ela= 0 p1=4 p2=5343 p3=1
WAIT #1: nam='db file sequential read' ela= 1 p1=4 p2=17552 p3=1
WAIT #1: nam='db file sequential read' ela= 0 p1=4 p2=85979 p3=1
...... lots more db file sequential reads .......
I thought this bit was a little strange:
BINDS #1:
bind 0: ……
value=
My initial thoughts were that this should raise an ORA-1008: Not all variables bound. However, clearly we were past that stage.
So, at the moment my best reading between the lines revolves around the optimizer doing some sort of predicate pruning given that bind variable :1 has been lost – hence the full table scan on the table concerned and an awful lot of rows being fetched.
The application developer rebuilt the app using the 8i driversand sure enough, no lost bind variable. Also interestingly, the trace of the 8i driver session revealed something that was absent from the previous traced session – the checking of NLS settings.
PARSING IN CURSOR #1 len=43 dep=0 uid=48 oct=42 lid=48 tim=1008340170 hv=1567373371 ad='904ffb90'
ALTER SESSION SET NLS_LANGUAGE = 'AMERICAN'
END OF STMT
....
PARSING IN CURSOR #1 len=43 dep=0 uid=48 oct=42 lid=48 tim=1008340170 hv=3073757550 ad='904f8ef8'
ALTER SESSION SET NLS_TERRITORY = 'AMERICA'
END OF STMT
....
PARSING IN CURSOR #1 len=76 dep=0 uid=48 oct=3 lid=48 tim=1008340170 hv=3617023014 ad='904f7e04'
SELECT VALUE FROM NLS_INSTANCE_PARAMETERS WHERE PARAMETER ='NLS_DATE_FORMAT'
END OF STMT
....
PARSING IN CURSOR #1 len=314 dep=0 uid=48 oct=3 lid=48 tim=1008340180 hv=3758786868 ad='90be9478'
SELECT t0.languageID, t0.CREATED, t0.DOB, t0.PUB, t0.STORYSOID, t0.sectionID, t0.ID, t0.lastModified, t0.CURRENTOWNERID, t0.lifeSpan, t0.OrigPubTime, t0.newsID, t0.STATUS, t0.pubTime, t0.ORIGINATORID, t0.origid, t0.EDITSOID, t0.dobuserid, t0.description FROM News t0 WHERE (t0.lastModified = :1 AND t0.STATUS = :2)
END OF STMT
PARSE #1:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=1008340180
BINDS #1:
bind 0: dty=12 mxl=07(49) mal=00 scl=00 pre=00 oacflg=01 oacfl2=0 size=32 offset=0
bfp=01a59f40 bln=07 avl=07 flg=05
value="1/25/2007 10:59:23"
bind 1: dty=2 mxl=22(22) mal=00 scl=00 pre=00 oacflg=01 oacfl2=0 size=0 offset=8
bfp=01a59f48 bln=22 avl=02 flg=01
value=11
EXEC #1:c=2,e=1,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=4,tim=1008340181
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
FETCH #1:c=0,e=0,p=0,cr=4,cu=0,mis=0,r=0,dep=0,og=4,tim=1008340182
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 1 p1=1952673792 p2=1 p3=0
XCTEND rlbk=0, rd_only=1
WAIT #0: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #0: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
STAT #1 id=1 cnt=0 pid=0 pos=0 obj=2498 op='TABLE ACCESS BY INDEX ROWID NEWS '
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=2737 op='INDEX RANGE SCAN '
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 3 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 0 p1=1952673792 p2=1 p3=0
WAIT #1: nam='SQL*Net message from client' ela= 0 p1=1952673792 p2=1 p3=0
So, there could be something environmental that’s not correct – for example local settings, driver config, etc – or I suppose it could be nothing but different behaviour between the drivers. Gut feeling is that I wouldn’t rule out the former.
So, for me, a number of searches to do starting at Metalink, then Jonathan Lewis’s sites and a quick flick through his book, then Google.
Although this is now no longer a pressing problem, this is something we will need to look into again some day soon en route to 10g upgrade. For example, is this an incompatibility between the application and the 10g drivers, or an issue between the 10g drivers and an 8i database etc. Plus, of course, it could be to do with the behaviour of dates and timestamps between Java and Oracle and the compatibility between, and changes to, aforementioned datatypes between different versions of said technologies.
Recent Comments