Why did that report run slow? ASH says log file sequential read

“Dump Online Redo for Buffers in Pin History”

I’ve just been investigating why a certain report ran slow last night.

Upgrade 11.2.0.3 Days 1 and 2

AKA being knee deep in …

Go-live at the weekend from 9i Solaris to 11.2.0.3 Linux.

There were a couple of minor issues on arrival Monday morning, but the first critical situation started at 8:29 am Monday and is still ongoing as of Tuesday evening.

More details later but key points are:
1. Degradations of various durations, the longest so far about two hours.
2. Three deliberate bounces, one crash.
3. Unrelated to the above: ongoing ORA-00600s at a rate of a couple per hour, and ongoing ORA-07445s at a peak rate of about four per minute.

Technical considerations from above:
1. Library cache: mutex X contention and a change to DBMS_ALERT.REGISTER
If you make significant use of DBMS_ALERT then beware a signature change to DBMS_ALERT.REGISTER in 11.2: a new CLEANUP parameter, relating to the cleanup of orphaned pipes, whose default value is TRUE.

This new cleanup behaviour, introduced because of some other bug whose details I don’t have to hand right now, is apparently quite an expensive operation, and in practice for us it has resulted in critical hanging scenarios on “library cache: mutex X”, effectively bringing the db to its knees.

The key recursive SQL that seems to be at the heart of this mutex issue is:

SELECT DISTINCT SUBSTR(KGLNAOBJ,11) SID
FROM   X$KGLOB
WHERE  KGLHDNSP = 7
AND    KGLNAOBJ LIKE 'ORA$ALERT$%'
AND    BITAND(KGLHDFLG,128) != 0
UNION
SELECT DISTINCT SID
FROM   DBMS_ALERT_INFO

Changing calls to DBMS_ALERT.REGISTER to pass an explicit FALSE for cleanup is currently the advised approach, but it raises the question of how you should deal with the cleanup that, you would think, must be required if it was introduced as default behaviour.
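
For illustration, a minimal sketch of that workaround – the alert name here is a hypothetical placeholder; CLEANUP is the parameter added to the 11.2 signature:

BEGIN
  -- explicit cleanup => FALSE bypasses the expensive orphaned-pipe
  -- cleanup that the new default (TRUE) triggers on registration
  DBMS_ALERT.REGISTER(name => 'MY_ALERT', cleanup => FALSE);
END;
/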

At the moment, I am unconvinced by the declaration that this is expected behaviour of DBMS_ALERT.REGISTER. An expensive operation is not one that causes a massive chain of “library cache: mutex X” waits for two hours until the database is effectively down and has to be bounced. That smacks of a bug to me.

Observations from the issue: firstly, 11.2.0.3 seems to report mutex waits better than previous versions. But whilst all the blocking indicators (in ASH, V$SESSION and V$MUTEX_SLEEP_HISTORY) were clear about who they thought was the suspect – i.e. they all identified the same session – that session was itself waiting on the same mutex with a blocking session of UNKNOWN. And killing the session at the head of the chain just resulted in someone higher up the chain taking its place, with everyone else pointing the finger at the new #1.
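
For reference, a minimal sketch of the kind of diagnostic query involved – standard 11.2 V$SESSION columns, nothing specific to our system:

-- who is waiting on the mutex, and who do they blame?
SELECT sid, event, p1text, p1,
       blocking_session_status, blocking_session
FROM   v$session
WHERE  event = 'library cache: mutex X';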

2. ORA-00600s and ORA-07445s
We have multiple ORA-00600 and ORA-07445 errors, most of which seem to be related to the same area – application contexts and the introduction of the caching of instantiation objects (iobs) for application contexts in order to reduce library cache mutex contention. As mentioned, this has caused one crash of PMON, which inevitably took the instance down with it, plus a steady stream of application errors. Not all of the errors below are related to the same issue – those that are will “probably” be fixed by the same patch – but this is the current list of ongoing errors:

ORA-600: [kgldafr-bad-ext]
ORA-600: [kglbrk-bad-lock]
ORA-07445: [kglbrk()+45]
ORA-07445: [kglLockHandle()+23]
ORA-07445: [kksampp()+97]
ORA-07445: [qertbStart()+119]
ORA-07445: [keswxCurPushPlanMonitoring()+510]
ORA-07445: [ktspsrchsc_free()+321]

Just a brief summary for visibility, more details another time…

As I was coming in on Monday morning, I thought that maybe I was being too pessimistic with my expectations and that it might turn out to be a quiet day. Turns out I wasn’t pessimistic enough. And we spent months testing this…

Fun, fun…

ORA-600 ktagetg can’t be used for a DDL

Not much to say.

So far I’ve been unable to prove my fledgling theory via an isolated test case, but if someone else gets this error then this post might at least provide a possible resolution.

Background is that release scripts failed to create a materialized view at time T1 because a dependent object did not exist; at T2 the revised release scripts failed repeatedly with:

ORA-00600: internal error code, arguments: [ktagetg can't be used for a DDL.], [], [], [], [], [], [], []

No published matches via the Metalink ORA-600 lookup.

The resolution, in my particular circumstances, was to create a dummy materialized view with the same name but a much simpler definition, drop it, and then recreate via the original script. The create and drop evidently cleaned out something that wasn’t otherwise being cleaned up.
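
For illustration, a sketch of that workaround – the MV name and dummy definition here are hypothetical placeholders, not the real objects:

-- 1. create a dummy MV with the same name but a trivial definition
CREATE MATERIALIZED VIEW problem_mv AS SELECT dummy FROM dual;
-- 2. drop it again
DROP MATERIALIZED VIEW problem_mv;
-- 3. then re-run the original CREATE MATERIALIZED VIEW script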

Also on Metalink there is a useful article (175982.1) with information on error lookups by mnemonic: ktagetg – as in “ORA-00600: internal error code, arguments: [ktagetg can't be used for a DDL.]” – stands for “Kernel Transaction Access Get Generic DML lock”, and the kta module provides “support for DML locks and temporary tables associated with table access”.
