February 25, 2015 4 Comments
Today I helped resolve a problem quickly but to the detriment of my understanding of exactly what was going on and why.
And as part of the quicker resolution, I had to destroy the evidence which would have helped understand better.
So… now need to go back and figure it out if I have time to model it properly, etc.
Here’s what little I know so far.
What happened was that there were a number of application sessions experiencing slowness and contention when executing a recursive stats history cleanup statement.
Verified via ASH that this recursive delete was somehow being called by app ETL code (TOP_LEVEL_SQL_ID, PLSQL_ENTRY_OBJECT_ID & USER_ID columns), four of them each running slow statement and also because of that nature of the statement below and being blocked by mode 6 TX locks from the leading execution.
Version is 126.96.36.199
Statement was sql id 9v9n97qj8z1dg:
delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histhead_history where savtime < :1 and rownum <= NVL(:2, rownum);
First up, resolution was quick and easy according to, by a quirk of coincidence, my last post:
We just got rid of all the statistics history using the magic PURGE_ALL truncate flag.
The history is of limited usefulness day-to-day anyway.
The slowness was holding up ETL jobs which were just trying to calculate stats for their own partition.
I was brought into the situation towards the end of the situation but here’s an illustration of the slowness:
SQL Text ------------------------------ delete /*+ dynamic_sampling(4) */ from sys.wri$_optstat_histhead_history where savtime < :1 and rownum <= NVL(:2, rownum) Global Information ------------------------------ Status : EXECUTING Instance ID : 1 SQL ID : 9v9n97qj8z1dg SQL Execution ID : 16777249 Execution Started : 02/24/2015 19:11:25 First Refresh Time : 02/24/2015 19:13:06 Last Refresh Time : 02/25/2015 09:10:35 Duration : 50351s Module/Action : JDBC Thin Client/- Program : JDBC Thin Client Binds ======================================================================================================================== | Name | Position | Type | Value | ======================================================================================================================== | :2 | 2 | NUMBER | 10000 | ======================================================================================================================== Global Stats =============================================================================================== | Elapsed | Cpu | IO | Application | Concurrency | Other | Buffer | Read | Read | | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Waits(s) | Gets | Reqs | Bytes | =============================================================================================== | 50359 | 35199 | 0.16 | 14669 | 254 | 237 | 1G | 63 | 504KB | =============================================================================================== SQL Plan Monitoring Details (Plan Hash Value=2348801730) ======================================================================================================================================================================================================= | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Activity | Activity Detail | | | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | (%) | (# samples) | ======================================================================================================================================================================================================= | 0 | DELETE STATEMENT | | | | 50262 | +101 | 1423 | 1 | | | | | | 1 | DELETE | WRI$_OPTSTAT_HISTHEAD_HISTORY | | | 50361 | +2 | 1423 | 1 | | | 29.74 | enq: TX - row lock contention (14630) | | | | | | | | | | | | | | buffer busy waits (231) | | | | | | | | | | | | | | Cpu (82) | | | | | | | | | | | | | | resmgr:cpu quantum (5) | | 2 | COUNT | | | | 50262 | +101 | 1423 | 14M | | | 0.00 | Cpu (1) | | 3 | FILTER | | | | 50266 | +97 | 1423 | 14M | | | 30.14 | Cpu (15146) | | 4 | TABLE ACCESS BY INDEX ROWID | WRI$_OPTSTAT_HISTHEAD_HISTORY | 6M | 23218 | 50253 | +100 | 1423 | 4G | 22 | 176KB | 23.17 | buffer busy waits (18) | | | | | | | | | | | | | | Cpu (11627) | | -> 5 | INDEX RANGE SCAN | I_WRI$_OPTSTAT_HH_ST | 990K | 5827 | 50264 | +101 | 1423 | 4G | 41 | 328KB | 16.94 | buffer busy waits (3) | | | | | | | | | | | | | | Cpu (8332) | | | | | | | | | | | | | | resmgr:cpu quantum (180) | ======================================================================================================================================================================================================= 3 - filter(ROWNUM<=NVL(:2,ROWNUM)) 5 - access("WRI$_OPTSTAT_HISTHEAD_HISTORY"."SYS_NC00024$"<SYS_EXTRACT_UTC(:1))
So, first thought was that:
1. I thought that MMON was responsible for purging old stats… clearly not in this case as there were multiple concurrent application connections purging old data as part of their ETL process.
2. The DELETE is deleting any old history older than a parameterised date, the first 10000 rows thereof. There is no connection to the object on whose statistics the application ETL jobs were working on.
3. I would expect a COUNT STOPKEY operation but then the NVL predicate avoids this optimisation.
4. UPDATE: The real time sql monitoring output is also reporting the metrics for 1423 executions of this statement. The clue is in the 14M rows reported in the COUNT + FILTER operations. 1423 * 10000 rows = 14M. But I requested the output for what I thought was a single execution id – 16777249 – strange… bulk/batched delete ?
More questions than answers…