Not waiting and not on SQL*Net Message from client

I’m in the early stages of investigating a performance degradation on 11.1.0.6, just tracing the session at the moment and waiting for it to complete.

This is distilled from a POC for an application persisting generic messages off a message bus.

At the moment, there’s no requirement to query/inspect the messages themselves so they’re being passed down as a BLOB.

I’ve reproduced the problem with a standalone piece of PL/SQL which removes any distractions from the rest of the application tech stack. This will be useful to raise an SR if necessary.

Performance degradation for a collection of 1000 blobs goes from roughly 1-2 seconds on a 9.2.0.8 corporate DB or on 10.2.0.4 on my laptop to currently over 1 hour on 11.1.0.6 corporate DB.

Obviously there are a lot of variations between environments and versions that might account for reasonable differences in performance.
I’ve not yet started investigating any of these differences in settings, parameters, etc but there’s something very bad about this sort of degradation.

Current session wait says that I’ve been waiting on SQL*Net message from client for pretty much all this time. So, a good illustration that this is wait is not always an idle event.

Edit 1
I’ve been a bit dim on this one. I’m NOT waiting on SQL*Net message from client. But I am in a strange place somewhere.

I was being lazy, using Toad to monitor and not looking at the information properly – stupid, stupid, stupid (not that Toad doesn’t display the requisite information properly, just that when you look at things from different perspectives sometimes you miss something that you would have seen when looking at it in a different way).

My session is active. But my wait state is WAITED_KNOWN_TIME so that wait was the last known wait, WAIT_TIME is > 0, SECONDS_IN_WAIT is ticking up to 20000.

Session statistics show that session is doing stuff.

Need to look at this from OS upwards – oradebug, pstack, etc. Or get someone to do it because I don’t have access 😦
End Edit 1

I’m currently gathering the evidence. It’s a bit slow because I don’t have full access to the 11g environment.

SQL to reproduce is below.

DROP TABLE sr_payload;
DROP TYPE tt_sr_payload;
DROP TYPE to_sr_payload;

CREATE TABLE sr_payload
(id         RAW(16) NOT NULL
,lastupdate DATE    NOT NULL
,payload    BLOB    NOT NULL)
LOB (payload) 
STORE AS sr_payload_payload
 (NOCACHE);
 
ALTER TABLE sr_payload ADD CONSTRAINT pk_sr_payload PRIMARY KEY (id);

CREATE OR REPLACE TYPE to_sr_payload AS OBJECT
(id         RAW(16)
,lastupdate DATE
,payload    BLOB);
/

CREATE OR REPLACE TYPE tt_sr_payload AS TABLE OF to_sr_payload;
/

CREATE OR REPLACE PACKAGE pkg_payload_sr
AS
  --
  PROCEDURE p_upsert_payload_mtc (
    i_payload                  IN     tt_sr_payload
  );
  -- 
  PROCEDURE p_upsert_payload_itc (
    i_payload                  IN     tt_sr_payload
  );
  --
END;
/

CREATE OR REPLACE PACKAGE BODY pkg_payload_sr
AS
  --
  PROCEDURE p_upsert_payload_mtc (
    i_payload                  IN     tt_sr_payload
  )
  AS
  BEGIN
     --
     MERGE
     INTO  sr_payload p
     USING (SELECT id         id
            ,      lastupdate lastupdate
            ,      payload    payload
            FROM   TABLE ( CAST ( i_payload AS tt_sr_payload) ) )x
     ON    (p.id = x.id)
     WHEN MATCHED THEN
            UPDATE
            SET    payload    = empty_blob()
            ,      lastupdate = SYSDATE
     WHEN NOT MATCHED THEN
          INSERT
          (      id
          ,      lastupdate
          ,      payload)
          VALUES
          (      x.id
          ,      x.lastupdate
          ,      x.payload);
     --
  END p_upsert_payload_mtc;
  -- 
  PROCEDURE p_upsert_payload_itc (
    i_payload                  IN     tt_sr_payload
  )
  AS
  BEGIN
     --
     INSERT
     INTO   sr_payload
     (      id
     ,      lastupdate
     ,      payload)
     SELECT x.id         id
     ,      x.lastupdate lastupdate
     ,      x.payload    payload
     FROM   TABLE ( CAST ( i_payload AS tt_sr_payload) ) x
     WHERE  NOT EXISTS (SELECT 1
                        FROM   sr_payload p
                        WHERE  p.id = x.id);  
     --
  END p_upsert_payload_itc;
  --
END pkg_payload_sr;
/

DECLARE  
 l_start NUMBER;  
 l_stop  NUMBER;  
 l_pl    tt_sr_payload := tt_sr_payload(); 
BEGIN  
  FOR i IN 1 .. 1000 
  LOOP
      l_pl.EXTEND();      
      l_pl(l_pl.LAST) := to_sr_payload(SYS_GUID,SYSDATE,TO_BLOB('12312312312'));  
  END LOOP;
  dbms_output.put_line('  ');
  l_start := dbms_utility.get_time(); 
  pkg_payload_sr.p_upsert_payload_itc(l_pl); 
  l_stop  := dbms_utility.get_time(); 
  dbms_output.put_line('Start pl: '||l_start); 
  dbms_output.put_line('Stop  pl: '||l_stop); 
  COMMIT;
END;
/

DROP PACKAGE pkg_payload_sr;
DROP TABLE sr_payload;
DROP TYPE tt_sr_payload;
drop type to_sr_payload;

Edit 2
Some results from ps and pstack** shows:

XXXXXXX$ ps -o user,pid,s,pcpu,time,etime,wchan,comm -p 27974

USER       PID S %CPU     TIME     ELAPSED WCHAN  COMMAND
oracle   27974 R 99.2 05:36:56    05:39:35 -      oracle

XXXXXXXX$ pstack 27974
#0  0x000000000718ab58 in kolrgrfc ()
#1  0x00000000019fa69f in kolrrdl ()
#2  0x00000000019fa7ce in kolradc ()
#3  0x0000000000e0df3f in __PGOSF350_kokegPromoteLob ()
#4  0x0000000000e0decb in kokegApplyOp ()
#5  0x0000000000e0de8e in kokegOpnGarbagePromote ()
#6  0x0000000000e0dc14 in kokegGarbageCollectBinds ()
#7  0x000000000705edb3 in kxsiis ()
#8  0x00000000070d0074 in __PGOSF445_qerltcNoKdtBufferedInsRowCBK ()
#9  0x000000000151924b in qerltcLoadStateMachine ()
#10 0x000000000151b64f in __PGOSF455_qerltcInsertSelectRop ()
#11 0x0000000003668205 in qerstRowP ()
#12 0x0000000003668205 in qerstRowP ()
#13 0x00000000014cd178 in qerjoFetch ()
#14 0x00000000036683f6 in qerstFetch ()
#15 0x0000000006f60b39 in rwsfcd ()
#16 0x00000000036683f6 in qerstFetch ()
#17 0x00000000070ce5d2 in qerltcFetch ()
#18 0x00000000036683f6 in qerstFetch ()
#19 0x000000000702a1db in insexe ()
#20 0x00000000070dc89a in opiexe ()
#21 0x00000000070e4513 in opipls ()
#22 0x0000000006fc979a in opiodr ()
#23 0x00000000070671b9 in __PGOSF123_rpidrus ()
#24 0x0000000007189474 in skgmstack ()
#25 0x0000000007067525 in rpidru ()
#26 0x00000000070667c0 in rpiswu2 ()
#27 0x0000000007065de3 in rpidrv ()
#28 0x000000000704e8c3 in psddr0 ()
#29 0x000000000704e594 in psdnal ()
#30 0x00000000071ce501 in pevm_EXECC ()
#31 0x00000000071c74ca in pfrinstr_EXECC ()
#32 0x00000000071c6721 in pfrrun_no_tool ()
#33 0x00000000071c51bd in pfrrun ()
#34 0x00000000071cb2ba in plsql_run ()
#35 0x00000000071c1f54 in peicnt ()
#36 0x00000000071c11fa in kkxexe ()
#37 0x00000000070dee11 in opiexe ()
#38 0x000000000154dbc1 in kpoal8 ()
#39 0x0000000006fc979a in opiodr ()
#40 0x000000000713cd4b in ttcpip ()
#41 0x0000000000e1630f in opitsk ()
#42 0x0000000000e18c50 in opiino ()
#43 0x0000000006fc979a in opiodr ()
#44 0x0000000000e12244 in opidrv ()
#45 0x00000000011b393a in sou2o ()
#46 0x0000000000920183 in opimai_real ()
#47 0x00000000011b7871 in ssthrdmain ()
#48 0x00000000009200af in main ()

Reading from bottom to top and using Metalink note 175982.1, the kol functions deal with “support for object Lob buffering , object lob evaluation and object Language/runtime functions for Opaque types” – sounds about right for my test case.

* I’m not saying that sending down a collection of 1000 blobs is necessarily a good thing but I’m just using it to compare good and bad.

** This is just a test case on a dev box. Note what Tanel has to say particularly about using pstack on Linux on production.

5 Responses to Not waiting and not on SQL*Net Message from client

  1. coskan says:

    Hi Dominic,

    I just wonder if this was 32bit or 64 bit machine because I could not manage to find 64 bit pstack for linux maybe you have the solution for me 🙂

  2. dombrooks says:

    Hi Coskan,

    64 bit RHEL5.1
    I’ve not got access to the box so I can’t verify where this pstack in the output above is pointing to.

    I would ask but I’m waiting on other information off them first.

    Cheers,
    Dominic

  3. dombrooks says:

    Incidentally, Oracle support have requested the following:

    With the problem query in progress please generate the traces below:

    1. Two system state dumps:

    sqlplus “/ as sysdba”
    alter session set max_dump_file_size=’UNLIMITED’;
    alter session set events ‘immediate trace name SYSTEMSTATE level 266’;
    wait 2 minutes
    alter session set events ‘immediate trace name SYSTEMSTATE level 266’;
    exit

    2. Three error stack traces:

    SQL>sqlplus “/as sysdba”
    oradebug setospid
    oradebug unlimit
    oradebug dump errorstack 3
    wait 1 minute
    oradebug dump errorstack 3
    wait 1 minute
    oradebug dump errorstack 3
    exit

    3. Five to ten minutes truss-like trace of the process using strace.

    But sounds like it will be tomorrow before I can have this information.

  4. Pingback: Call that support « OraStory

  5. dombrooks says:

    Oracle support “confirmed” as some sort of bug in 11.1.0.6 and some 10.2, allegedly fixed (from their empirical observations) in 11.2.

Leave a comment