Not waiting and not on SQL*Net Message from client
July 9, 2010 5 Comments
I’m in the early stages of investigating a performance degradation on 188.8.131.52, just tracing the session at the moment and waiting for it to complete.
This is distilled from a POC for an application persisting generic messages off a message bus.
At the moment, there’s no requirement to query/inspect the messages themselves so they’re being passed down as a BLOB.
I’ve reproduced the problem with a standalone piece of PL/SQL which removes any distractions from the rest of the application tech stack. This will be useful to raise an SR if necessary.
Performance degradation for a collection of 1000 blobs goes from roughly 1-2 seconds on a 184.108.40.206 corporate DB or on 10.2.0.4 on my laptop to currently over 1 hour on 220.127.116.11 corporate DB.
Obviously there are a lot of variations between environments and versions that might account for reasonable differences in performance.
I’ve not yet started investigating any of these differences in settings, parameters, etc but there’s something very bad about this sort of degradation.
Current session wait says that I’ve been waiting on SQL*Net message from client for pretty much all this time. So, a good illustration that this is wait is not always an idle event.
I’ve been a bit dim on this one. I’m NOT waiting on SQL*Net message from client. But I am in a strange place somewhere.
I was being lazy, using Toad to monitor and not looking at the information properly – stupid, stupid, stupid (not that Toad doesn’t display the requisite information properly, just that when you look at things from different perspectives sometimes you miss something that you would have seen when looking at it in a different way).
My session is active. But my wait state is WAITED_KNOWN_TIME so that wait was the last known wait, WAIT_TIME is > 0, SECONDS_IN_WAIT is ticking up to 20000.
Session statistics show that session is doing stuff.
Need to look at this from OS upwards – oradebug, pstack, etc. Or get someone to do it because I don’t have access :(
End Edit 1
I’m currently gathering the evidence. It’s a bit slow because I don’t have full access to the 11g environment.
SQL to reproduce is below.
DROP TABLE sr_payload; DROP TYPE tt_sr_payload; DROP TYPE to_sr_payload; CREATE TABLE sr_payload (id RAW(16) NOT NULL ,lastupdate DATE NOT NULL ,payload BLOB NOT NULL) LOB (payload) STORE AS sr_payload_payload (NOCACHE); ALTER TABLE sr_payload ADD CONSTRAINT pk_sr_payload PRIMARY KEY (id); CREATE OR REPLACE TYPE to_sr_payload AS OBJECT (id RAW(16) ,lastupdate DATE ,payload BLOB); / CREATE OR REPLACE TYPE tt_sr_payload AS TABLE OF to_sr_payload; / CREATE OR REPLACE PACKAGE pkg_payload_sr AS -- PROCEDURE p_upsert_payload_mtc ( i_payload IN tt_sr_payload ); -- PROCEDURE p_upsert_payload_itc ( i_payload IN tt_sr_payload ); -- END; / CREATE OR REPLACE PACKAGE BODY pkg_payload_sr AS -- PROCEDURE p_upsert_payload_mtc ( i_payload IN tt_sr_payload ) AS BEGIN -- MERGE INTO sr_payload p USING (SELECT id id , lastupdate lastupdate , payload payload FROM TABLE ( CAST ( i_payload AS tt_sr_payload) ) )x ON (p.id = x.id) WHEN MATCHED THEN UPDATE SET payload = empty_blob() , lastupdate = SYSDATE WHEN NOT MATCHED THEN INSERT ( id , lastupdate , payload) VALUES ( x.id , x.lastupdate , x.payload); -- END p_upsert_payload_mtc; -- PROCEDURE p_upsert_payload_itc ( i_payload IN tt_sr_payload ) AS BEGIN -- INSERT INTO sr_payload ( id , lastupdate , payload) SELECT x.id id , x.lastupdate lastupdate , x.payload payload FROM TABLE ( CAST ( i_payload AS tt_sr_payload) ) x WHERE NOT EXISTS (SELECT 1 FROM sr_payload p WHERE p.id = x.id); -- END p_upsert_payload_itc; -- END pkg_payload_sr; / DECLARE l_start NUMBER; l_stop NUMBER; l_pl tt_sr_payload := tt_sr_payload(); BEGIN FOR i IN 1 .. 1000 LOOP l_pl.EXTEND(); l_pl(l_pl.LAST) := to_sr_payload(SYS_GUID,SYSDATE,TO_BLOB('12312312312')); END LOOP; dbms_output.put_line(' '); l_start := dbms_utility.get_time(); pkg_payload_sr.p_upsert_payload_itc(l_pl); l_stop := dbms_utility.get_time(); dbms_output.put_line('Start pl: '||l_start); dbms_output.put_line('Stop pl: '||l_stop); COMMIT; END; / DROP PACKAGE pkg_payload_sr; DROP TABLE sr_payload; DROP TYPE tt_sr_payload; drop type to_sr_payload;
Some results from ps and pstack** shows:
XXXXXXX$ ps -o user,pid,s,pcpu,time,etime,wchan,comm -p 27974 USER PID S %CPU TIME ELAPSED WCHAN COMMAND oracle 27974 R 99.2 05:36:56 05:39:35 - oracle XXXXXXXX$ pstack 27974 #0 0x000000000718ab58 in kolrgrfc () #1 0x00000000019fa69f in kolrrdl () #2 0x00000000019fa7ce in kolradc () #3 0x0000000000e0df3f in __PGOSF350_kokegPromoteLob () #4 0x0000000000e0decb in kokegApplyOp () #5 0x0000000000e0de8e in kokegOpnGarbagePromote () #6 0x0000000000e0dc14 in kokegGarbageCollectBinds () #7 0x000000000705edb3 in kxsiis () #8 0x00000000070d0074 in __PGOSF445_qerltcNoKdtBufferedInsRowCBK () #9 0x000000000151924b in qerltcLoadStateMachine () #10 0x000000000151b64f in __PGOSF455_qerltcInsertSelectRop () #11 0x0000000003668205 in qerstRowP () #12 0x0000000003668205 in qerstRowP () #13 0x00000000014cd178 in qerjoFetch () #14 0x00000000036683f6 in qerstFetch () #15 0x0000000006f60b39 in rwsfcd () #16 0x00000000036683f6 in qerstFetch () #17 0x00000000070ce5d2 in qerltcFetch () #18 0x00000000036683f6 in qerstFetch () #19 0x000000000702a1db in insexe () #20 0x00000000070dc89a in opiexe () #21 0x00000000070e4513 in opipls () #22 0x0000000006fc979a in opiodr () #23 0x00000000070671b9 in __PGOSF123_rpidrus () #24 0x0000000007189474 in skgmstack () #25 0x0000000007067525 in rpidru () #26 0x00000000070667c0 in rpiswu2 () #27 0x0000000007065de3 in rpidrv () #28 0x000000000704e8c3 in psddr0 () #29 0x000000000704e594 in psdnal () #30 0x00000000071ce501 in pevm_EXECC () #31 0x00000000071c74ca in pfrinstr_EXECC () #32 0x00000000071c6721 in pfrrun_no_tool () #33 0x00000000071c51bd in pfrrun () #34 0x00000000071cb2ba in plsql_run () #35 0x00000000071c1f54 in peicnt () #36 0x00000000071c11fa in kkxexe () #37 0x00000000070dee11 in opiexe () #38 0x000000000154dbc1 in kpoal8 () #39 0x0000000006fc979a in opiodr () #40 0x000000000713cd4b in ttcpip () #41 0x0000000000e1630f in opitsk () #42 0x0000000000e18c50 in opiino () #43 0x0000000006fc979a in opiodr () #44 0x0000000000e12244 in opidrv () #45 0x00000000011b393a in sou2o () #46 0x0000000000920183 in opimai_real () #47 0x00000000011b7871 in ssthrdmain () #48 0x00000000009200af in main ()
Reading from bottom to top and using Metalink note 175982.1, the kol functions deal with "support for object Lob buffering , object lob evaluation and object Language/runtime functions for Opaque types" - sounds about right for my test case.
* I'm not saying that sending down a collection of 1000 blobs is necessarily a good thing but I'm just using it to compare good and bad.
** This is just a test case on a dev box. Note what Tanel has to say particularly about using pstack on Linux on production.