Some observations on a system’s usage of TEMP.

There’s no conclusion or revelation here – just some notes.

The graph below shows the average read times against the temp tablespace, taken from the one-hour snap intervals captured across a week’s worth of statspack reports (obviously system-wide averages showing system-wide trends).

To summarise: the average read time varies between 0 ms and 350 ms, with a typical high of 50 to 100 ms seen between midnight and 3am.

In other words, a significant variation over time.

These measurements seem to be fairly representative of this system (although the big peak is a bit of an outlier).

I’ve got a rule-of-thumb figure in my head for general IO average read times of 4 to 5 ms for a middle-of-the-road hybrid system before it’s worthy of further mention. That figure might be outdated, but so is this system – it’s 9i and the whole stack is similarly aged, if not older.

I’ve seen systems with IO problems before, but I can’t remember ever observing this sort of variation against TEMP. I’m pretty sure it’s not normal, but I don’t have a vast array of notes or figures from systems past to compare against.

You don’t see the same trends in graphs of other IO statistics against the TEMP tablespace, nor against the other tablespaces.

Most of the time these peaks coincide with batch work (but not single session batch work – concurrent processes chugging away on various bits of batch work).

The system is operating with automatic workarea sizing and, to my knowledge, none of the batch work overrides the workarea sizing manually. But this is definitely something to check.
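One quick way to check during the batch window is to look at the live workareas themselves. A sketch along these lines (against the standard V$SQL_WORKAREA_ACTIVE view; the column selection and ordering are just illustrative) would show whether any of the concurrent workareas are running under MANUAL policy or going one-pass/multi-pass to TEMP:

```sql
-- Active workareas right now: sizing policy, pass count and temp usage.
-- Run during the batch window as a suitably privileged user.
select sid,
       operation_type,
       policy,               -- AUTO vs MANUAL workarea sizing
       number_passes,        -- 0 = optimal, 1 = one-pass, >1 = multi-pass
       round(actual_mem_used/1024/1024) mem_mb,
       round(tempseg_size/1024/1024)    temp_mb,
       tablespace
from   v$sql_workarea_active
order by number_passes desc, tempseg_size desc;
```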

So, further investigation is required into this temp space usage – primarily hash joins, sorting, parallel operations and temporary table usage (not forgetting the temporary segments created by implicit and explicit (i.e. unhinted/hinted) materialized subqueries).
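On 9i the obvious starting point for “who is using TEMP and for what” is V$SORT_USAGE (renamed V$TEMPSEG_USAGE in later versions); its SEGTYPE column distinguishes sort, hash and temporary-table segments. Something like this, as a sketch:

```sql
-- Current temp segment usage broken down by user and segment type.
-- SEGTYPE values include SORT, HASH, DATA/INDEX (global temporary
-- tables) and LOB_DATA/LOB_INDEX.
select su.username,
       su.segtype,
       su.tablespace,
       sum(su.blocks) blocks,
       count(*)       segments
from   v$sort_usage su
group by su.username, su.segtype, su.tablespace
order by sum(su.blocks) desc;
```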

V$PGA_TARGET_ADVICE_HISTOGRAM suggests that a 20% increase in PGA_AGGREGATE_TARGET would make a decent difference to overall MULTIPASS operations, but the few SQL statements I’ve looked at individually really need manual overrides – a 20% increase in the aggregate target wouldn’t make much difference to them.
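For reference, the sort of query behind that observation – a sketch against the standard advice view, summing the per-workarea-size buckets up to one row per candidate target (factor 1 being the current setting):

```sql
-- Estimated optimal / one-pass / multi-pass executions at each
-- candidate PGA_AGGREGATE_TARGET size.
select pga_target_for_estimate/1024/1024 target_mb,
       pga_target_factor,
       sum(estd_optimal_executions)      optimal,
       sum(estd_onepass_executions)      onepass,
       sum(estd_multipasses_executions)  multipass
from   v$pga_target_advice_histogram
group by pga_target_for_estimate, pga_target_factor
order by pga_target_factor;
```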

On the subject of hash joins, I’ve never paid much attention to a 10104 trace file before, but combined with the old 10046 trace (and maybe even a 10032 as well, for any sorting) it gives you a really informative picture of what’s going on with a SQL statement. Really very good.
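For anyone who hasn’t set these before, the events are enabled per session in the usual way – the levels below are the commonly used ones, not the only options:

```sql
-- 10104: hash-join internals; 10046 level 8: waits; 10032: sort stats.
alter session set events '10104 trace name context forever, level 10';
alter session set events '10046 trace name context forever, level 8';
alter session set events '10032 trace name context forever, level 10';

-- ... run the statement of interest, then switch the events off, e.g.:
alter session set events '10046 trace name context off';
```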

The sort of thing that you see when the hash join starts spilling to disk (following “kxhfWrite: hash-join is spilling to disk”) is this kind of variation in the direct path write times to the temp tablespace – ela is in microseconds from 9i onwards, and for this event p1, p2 and p3 are the file number, starting block number and block count (trace file taken from a non-prod environment):

WAIT #4: nam='direct path write' ela= 2811 p1=201 p2=14089 p3=127
WAIT #4: nam='direct path write' ela= 4730 p1=201 p2=14216 p3=127
WAIT #4: nam='direct path write' ela= 9 p1=201 p2=14343 p3=127
WAIT #4: nam='direct path write' ela= 7920 p1=201 p2=14470 p3=127
WAIT #4: nam='direct path write' ela= 2639 p1=201 p2=14597 p3=127
WAIT #4: nam='direct path write' ela= 9 p1=201 p2=14724 p3=127
WAIT #4: nam='direct path write' ela= 7669 p1=201 p2=14851 p3=127
WAIT #4: nam='direct path write' ela= 22 p1=201 p2=14978 p3=127
WAIT #4: nam='direct path write' ela= 9 p1=201 p2=15105 p3=127
WAIT #4: nam='direct path write' ela= 9310 p1=201 p2=15232 p3=127
WAIT #4: nam='direct path write' ela= 15 p1=201 p2=2569 p3=127
WAIT #4: nam='direct path write' ela= 10512 p1=201 p2=2696 p3=127
WAIT #4: nam='direct path write' ela= 3496 p1=201 p2=2823 p3=127
WAIT #4: nam='direct path write' ela= 6402 p1=201 p2=2950 p3=127
WAIT #4: nam='direct path write' ela= 6681 p1=201 p2=3077 p3=127
WAIT #4: nam='direct path write' ela= 20 p1=201 p2=3204 p3=127
WAIT #4: nam='direct path write' ela= 10 p1=201 p2=3331 p3=127
WAIT #4: nam='direct path write' ela= 3804 p1=201 p2=3458 p3=127
WAIT #4: nam='direct path write' ela= 11 p1=201 p2=3585 p3=127
WAIT #4: nam='direct path write' ela= 6729 p1=201 p2=3712 p3=127
WAIT #4: nam='direct path write' ela= 7817 p1=201 p2=37256 p3=127
WAIT #4: nam='direct path write' ela= 2498 p1=201 p2=37383 p3=127
WAIT #4: nam='direct path write' ela= 4227 p1=201 p2=37510 p3=127
WAIT #4: nam='direct path write' ela= 21 p1=201 p2=37637 p3=127
WAIT #4: nam='direct path write' ela= 1 p1=201 p2=37764 p3=127
WAIT #4: nam='direct path write' ela= 7 p1=201 p2=37891 p3=127
WAIT #4: nam='direct path write' ela= 9087 p1=201 p2=38018 p3=127
WAIT #4: nam='direct path write' ela= 10798 p1=201 p2=38145 p3=127
WAIT #4: nam='direct path write' ela= 19696 p1=201 p2=38272 p3=127
WAIT #4: nam='direct path write' ela= 16614 p1=201 p2=33289 p3=127

The other thing that’s interesting in a couple of the 10104 traces is a big imbalance in the distribution of data across the hash table partitions. For example:

Partition:0 rows:1057762 clusters:271 slots:141 kept=0
Partition:1 rows:137 clusters:1 slots:1 kept=0
Partition:2 rows:195 clusters:1 slots:1 kept=0
Partition:3 rows:1522 clusters:1 slots:1 kept=0
Partition:4 rows:6789 clusters:2 slots:2 kept=0
Partition:5 rows:408 clusters:1 slots:1 kept=0
Partition:6 rows:461 clusters:1 slots:1 kept=0
Partition:7 rows:180 clusters:1 slots:1 kept=0

And also this illustration of some serious skew in the buckets:

### Hash table overall statistics ###
Total buckets: 2097152 Empty buckets: 2097117 Non-empty buckets: 35
Total number of rows: 562177
Maximum number of rows in a bucket: 560131
