Why did that report run slow? ASH says log file sequential read

“Dump Online Redo for Buffers in Pin History”

I’ve just been investigating why a certain report ran slow last night.
Read more of this post

Upgrade 11.2.0.3 Days 1 and 2

AKA being knee deep in …

Go-live at the weekend from 9i Solaris to 11.2.0.3 Linux.

There were a couple of minor issues on arrival Monday morning but first critical situation started @ 8:29 am Monday morning and is still ongoing Tuesday evening.

More details later but key points are:
1. Degradation of various durations, longest so far about two hours.
2. Three deliberate bounces, one crash.
3. Unrelated to above, ongoing ORA-00600s at a rate of a couple per hour, ongoing ORA-07445s at a peak rate of about 4 per minute.

Technical considerations from above:
1. Library Cache : Mutex X Contention and change to DBMS_ALERT.REGISTER
If you make significant use of DBMS_ALERT then beware a signature change in 11.2 to DBMS_ALERT.REGISTER which relates to cleanup of orphaned pipes, the default value of which is TRUE.

This new cleanup behaviour, introduced due to some other bug that I don’t have detail of right now, is apparently quite an expensive operation and in practice for us, this has resulted in critical hanging scenarios on “library cache: mutex x” effectively bringing the db to its knees.

The key recursive SQL that seems to be at the heart of this mutex issue is:

SELECT DISTINCT SUBSTR(KGLNAOBJ,11) SID FROM X$KGLOB WHERE KGLHDNSP = 7 AND KGLNAOBJ LIKE 'ORA$ALERT$%' AND BITAND(KGLHDFLG,128)!=0 UNION SELECT DISTINCT SID FROM DBMS_ALERT_INFO

Changing calls to DBMS_ALERT.REGISTER to make an explicit FALSE value for cleanup is currently the advised approach but raises the question of how you should deal with the cleanup that you’d think must be required if it was introduced as default behaviour.

At the moment, I am unconvinced by the declaration that this is expected behaviour of DBMS_ALERT.REGISTER. An expensive operation is not one that causes a massive chain of “library cache:mutex x” waits for 2 hours until the database is effectively down and has to be bounced. That smacks of bug to me.

Observations from the issue – firstly that 11.2.0.3 seems to be reporting mutex waits better than previous versions but whilst all blocking indicators (in ASH, V$SESSION and V$MUTEX_SLEEP_HISTORY) were clear about who they thought was the suspect – i.e. they all identified the same session – that session itself was waiting on the same mutex with a blocking_session of UNKNOWN. And killing that session at the head of the chain just resulted in someone higher up the chain taking their place and everyone else pointing the finger at the new #1.

2. ORA-00600 and ORA-07445s
We have multiple ORA-00600 and ORA-07445 errors, most of which seem to be related to the same area – application contexts and the introduction of the caching of instantiation objects (iobs) for application contexts in order to reduce library cache mutex contention. As mentioned this has caused one crash from PMON which inevitably crashed the instance and a steady stream of application errors. Not all the below are related to the same issue – those that are will “probably” be fixed by the same patch – but this is the current list of ongoing:

ORA-600: [kgldafr-bad-ext]
ORA-600: [kglbrk-bad-lock]
ORA-07445: [kglbrk()+45]
ORA-07445: [kglLockHandle()+23]
ORA-07445: [kksampp()+97]
ORA-07445: [qertbStart()+119]
ORA-07445: [keswxCurPushPlanMonitoring()+510]
ORA-07445: [ktspsrchsc_free()+321]

Just a brief summary for visibility, more details another time…

As I was coming in on Monday morning, I thought that maybe I was being too pessimistic with my expectations and that it might turn out to be a quiet day. Tuns out I wasn’t pessimistic enough. And we spent months testing this…

Fun, fun…

Beware the compound trigger

In 11g, Oracle introduced a new feature in the compound trigger.

Through many years and versions, the solution to the mutating table / ora-04091 problem was often a clumsy combination of packaged global, before statement trigger, after row trigger and after statement trigger.

However, in 11g, the introduction of the compound trigger provided a much more compact solution, where you could declare your global, your statement level logic and any row level code in a single trigger.

Examples from the Oracle documentation:

Trouble is, under certain simple circumstances, they don’t work.

Take this illustration abstracted from a proposed VPD labelling solution on 11.1.0.6.

The illustration is a little artificial because all distractions have been removed but it’s taken from a real world example.

Essentially, three updates via a package cause an ORA-07445 but three direct update statements do not. Take a look:


dominic@DOM11G>prompt ---------- cleanup previous creates
---------- cleanup previous creates
dominic@DOM11G>drop table compound_trigger_test_tab1;

Table dropped.

Elapsed: 00:00:01.02
dominic@DOM11G>drop package compound_trigger_test_pkg;

Package dropped.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>prompt ---------- create table
---------- create table
dominic@DOM11G>
dominic@DOM11G>create table compound_trigger_test_tab1
  2  (col1 number
  3  ,col2 number);

Table created.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>prompt ---------- insert data
---------- insert data
dominic@DOM11G>
dominic@DOM11G>insert into compound_trigger_test_tab1 values (1,1);

1 row created.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>
dominic@DOM11G>prompt ---------- create compound trigger
---------- create compound trigger
dominic@DOM11G>
dominic@DOM11G>CREATE OR REPLACE TRIGGER compound_trigger_test_tri1
  2  FOR INSERT OR UPDATE ON compound_trigger_test_tab1
  3  REFERENCING NEW AS NEW OLD AS OLD
  4  COMPOUND TRIGGER
  5   --
  6   TYPE r_ctt IS RECORD 
  7   (col1    NUMBER
  8   ,col2    NUMBER);
  9   --
 10   TYPE t_ctt IS TABLE OF r_ctt
 11   INDEX BY PLS_INTEGER;
 12   --
 13   vt_ctt t_ctt;
 14   --
 15  BEFORE STATEMENT
 16  IS
 17  BEGIN
 18     --
 19     DBMS_OUTPUT.PUT_LINE('compound_trigger_test_tri1::before statement');
 20     NULL;
 21     --
 22  END BEFORE STATEMENT;
 23  --
 24  BEFORE EACH ROW
 25  IS
 26  BEGIN
 27      --
 28      DBMS_OUTPUT.PUT_LINE('compound_trigger_test_tri1::before row');
 29      NULL;
 30      --
 31  END BEFORE EACH ROW;
 32  --
 33  AFTER EACH ROW
 34  IS
 35  BEGIN
 36     --
 37     DBMS_OUTPUT.PUT_LINE('compound_trigger_test_tri1::after row');
 38     NULL;
 39     --
 40  END AFTER EACH ROW;
 41  --
 42  AFTER STATEMENT
 43  IS
 44    i PLS_INTEGER;
 45  BEGIN
 46     --
 47     DBMS_OUTPUT.PUT_LINE('compound_trigger_test_tri1::after statement');
 48     i := vt_ctt.FIRST;
 49     --
 50     WHILE (i IS NOT NULL)
 51     LOOP
 52         --
 53         NULL;
 54         --
 55         i := vt_ctt.NEXT(i);
 56         --
 57     END LOOP;
 58     --
 59  END AFTER STATEMENT;
 60  --
 61  END compound_trigger_test_tri1;
 62  /

Trigger created.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>show errors
No errors.
dominic@DOM11G>
dominic@DOM11G>
dominic@DOM11G>prompt ---------- create package header
---------- create package header
dominic@DOM11G>
dominic@DOM11G>CREATE OR REPLACE PACKAGE compound_trigger_test_pkg
  2  AS
  3    --
  4    PROCEDURE p_update_tab1 (
  5      i_col1                     in     compound_trigger_test_tab1.col1%TYPE
  6    );
  7    --
  8  END;
  9  /

Package created.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>
dominic@DOM11G>prompt ---------- create package body
---------- create package body
dominic@DOM11G>
dominic@DOM11G>CREATE OR REPLACE PACKAGE BODY compound_trigger_test_pkg
  2  AS
  3    --
  4    PROCEDURE p_update_tab1 (
  5      i_col1                     in     compound_trigger_test_tab1.col1%TYPE
  6    )
  7    AS
  8    BEGIN
  9       --
 10       DBMS_OUTPUT.PUT_LINE('compound_trigger_test_pkg.p_update_tab1');
 11       UPDATE compound_trigger_test_tab1
 12       SET    col2 = col2
 13       WHERE  col1 = i_col1;
 14       --
 15    END p_update_tab1;
 16  END;
 17  /

Package body created.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>
dominic@DOM11G>prompt ---------- run update via package test 1 - should be ok
---------- run update via package test 1 - should be ok
dominic@DOM11G>exec compound_trigger_test_pkg.p_update_tab1(1);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>prompt ---------- run update via package 2 - should be ok
---------- run update via package 2 - should be ok
dominic@DOM11G>exec compound_trigger_test_pkg.p_update_tab1(2);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>prompt ---------- run update via package 3 - should go bang
---------- run update via package 3 - should go bang
dominic@DOM11G>exec compound_trigger_test_pkg.p_update_tab1(3);
BEGIN compound_trigger_test_pkg.p_update_tab1(3); END;

*
ERROR at line 1:
ORA-03113: end-of-file on communication channel


Elapsed: 00:00:00.05
dominic@DOM11G>conn dominic@dom11g
Enter password: *******
Connected.
dominic@DOM11G>UPDATE compound_trigger_test_tab1
  2  SET    col2 = col2
  3  WHERE  col1 = 1;

1 row updated.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>UPDATE compound_trigger_test_tab1
  2  SET    col2 = col2
  3  WHERE  col1 = 2;

0 rows updated.

Elapsed: 00:00:00.00
dominic@DOM11G>
dominic@DOM11G>UPDATE compound_trigger_test_tab1
  2  SET    col2 = col2
  3  WHERE  col1 = 3;

0 rows updated.

Elapsed: 00:00:00.00
dominic@DOM11G>

Alert log and trace file reveal the cause to be an ORA-07445:


Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x448] [PC:0x1B6A3B8, kkxtexe()+988]
Errors in file /app/oracle/diag/rdbms/dom11g/DOM11G/trace/dom11G_ora_4860.trc  (incident=845912):
ORA-07445: exception encountered: core dump [kkxtexe()+988] [SIGSEGV] [ADDR:0x448] [PC:0x1B6A3B8] [Address not mapped to object] []
ORA-21779: duration not active

So, for the moment, I would not go near them.
I’ve got an SR raised (it’s all I seem to doing these days is find bugs in 11.1.0.6 (more on the incompatibility between VPD and Materialized Views another time) and 11.1.0.7)
It’s back to multiple row and statement level triggers.

Follow

Get every new post delivered to your Inbox.

Join 70 other followers