Temp Usage

Some observations on a system’s usage of TEMP.
There’s no conclusion or revelation here – just some notes.

The graph below shows the average read times against the temp tablespace from the one-hour snap intervals captured in a week’s worth of statspack reports (obviously system-wide averages showing system-wide trends).
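For reference, the figures behind the graph came from something along these lines against the statspack repository – a sketch only, assuming the perfstat schema has the usual 9i STATS$TEMPSTATXS and STATS$SNAPSHOT tables with cumulative READTIM (centiseconds) and PHYRDS columns, so it’s the deltas between consecutive snaps that matter:

-- Sketch: average read time (ms) against TEMP per snap interval,
-- calculated from deltas of the cumulative statspack temp file statistics.
-- (Negative deltas after an instance restart would need filtering out.)
select s.snap_id,
       s.snap_time,
       t.filename,
       round(10 * (t.readtim - lag(t.readtim) over (partition by t.filename order by s.snap_id))
                / nullif(t.phyrds - lag(t.phyrds) over (partition by t.filename order by s.snap_id), 0), 1)
         as avg_read_ms
from   perfstat.stats$tempstatxs t,
       perfstat.stats$snapshot   s
where  t.snap_id         = s.snap_id
and    t.dbid            = s.dbid
and    t.instance_number = s.instance_number
order by t.filename, s.snap_id;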

To summarise: a variation in average read time of between 0 ms and 350 ms, with a typical high of 50 to 100 ms seen around midnight to 3am.

In other words, a significant variation over time.

These measurements seem to be fairly representative of this system (although the big peak is a bit of an outlier).

I’ve got a rule of thumb figure in my head for general IO average read times of 4 to 5 ms for a middle of the road hybrid system before it’s worthy of further mention. That figure might be outdated but so’s this system – it’s 9i and the whole stack is similarly aged if not older.

I’ve seen systems with IO problems before. I can’t remember ever observing such variations against TEMP. I’m pretty sure it’s not normal but I don’t have a vast array of notes or figures from systems past.

You don’t see the same trends in graphs of other IO stats against the TEMP tablespace, nor against the other tablespaces.

Most of the time these peaks coincide with batch work (but not single session batch work – concurrent processes chugging away on various bits of batch work).

The system is operating with automatic workarea sizing and, to my knowledge, none of the batch work overrides the work area sizing manually. But this is definitely a thing to look at.
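A quick way to check that – a sketch, nothing definitive – is to look at the instance parameters and at what any active work areas report for their sizing policy; V$SQL_WORKAREA_ACTIVE also shows one-pass/multi-pass activity and any spill to TEMP:

-- Instance-level work area settings
select name, value
from   v$parameter
where  name in ('workarea_size_policy', 'pga_aggregate_target',
                'sort_area_size', 'hash_area_size');

-- Work areas executing right now: policy, passes and any temp segment in use
select sid,
       operation_type,
       policy,
       number_passes,
       round(actual_mem_used/1024/1024) as actual_mb,
       round(tempseg_size/1024/1024)    as tempseg_mb
from   v$sql_workarea_active
order by nvl(tempseg_size, 0) desc;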

So, further investigation is required into this temp space usage – primarily hash joins, sorting, parallel operations and temporary table usage (not forgetting the implicit temporary tables created by materialized subqueries, whether implicit or explicit, i.e. unhinted or hinted).
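As a starting point, what’s actually in TEMP at any moment can be broken down by segment type – SORT, HASH, DATA (temporary tables) and so on – with something like this against V$SORT_USAGE (the 9i name; later versions call it V$TEMPSEG_USAGE). The 8k block size below is an assumption about this system:

-- Current TEMP usage broken down by user and segment type
select u.username,
       u.tablespace,
       u.segtype,
       count(*)                              as segments,
       round(sum(u.blocks) * 8192 / 1048576) as approx_mb   -- assumes 8k blocks
from   v$sort_usage u
group by u.username, u.tablespace, u.segtype
order by sum(u.blocks) desc;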

V$PGA_TARGET_ADVICE_HISTOGRAM suggests that a 20% increase in PGA_AGGREGATE_TARGET would make a decent difference to overall MULTIPASS operations, but the few SQL statements I’ve looked at individually really need manual overrides; a 20% increase in the aggregate target wouldn’t make much difference to them.
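For anyone wanting to look at the same thing, this is roughly the shape of query I used – a sketch summarising the histogram per target factor, where factor 1 is the current PGA_AGGREGATE_TARGET and 1.2 the 20% increase:

-- Estimated optimal / one-pass / multi-pass executions at different PGA targets
select pga_target_factor,
       round(pga_target_for_estimate/1048576)  as target_mb,
       sum(estd_optimal_executions)            as estd_optimal,
       sum(estd_onepass_executions)            as estd_onepass,
       sum(estd_multipasses_executions)        as estd_multipass
from   v$pga_target_advice_histogram
group by pga_target_factor, round(pga_target_for_estimate/1048576)
order by pga_target_factor;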

On the subject of hash joins, I’ve never paid much attention to a 10104 trace file before, but combined with the old 10046 trace (and maybe a 10032 as well for any sorting) it gives a really informative picture of what’s going on with a SQL statement. Really very good.
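For reference, the traces were enabled together in the session running the statement – the levels below are just the ones I tend to use, and the usual caveats about setting events apply:

-- 10046 level 8: SQL trace including wait events (direct path read/write etc.)
alter session set events '10046 trace name context forever, level 8';
-- 10104: hash join statistics (partitioning, spilling to disk, bucket counts)
alter session set events '10104 trace name context forever, level 10';
-- 10032: sort statistics
alter session set events '10032 trace name context forever, level 10';

-- ... run the statement of interest, then switch the events off ...

alter session set events '10046 trace name context off';
alter session set events '10104 trace name context off';
alter session set events '10032 trace name context off';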

The sort of thing that you see when the hash join starts spilling to disk (following “kxhfWrite: hash-join is spilling to disk”) is this kind of variation in the direct path write times to the temp tablespace (trace file taken from non-prod environment):

WAIT #4: nam='direct path write' ela= 2811 p1=201 p2=14089 p3=127
WAIT #4: nam='direct path write' ela= 4730 p1=201 p2=14216 p3=127
WAIT #4: nam='direct path write' ela= 9 p1=201 p2=14343 p3=127
WAIT #4: nam='direct path write' ela= 7920 p1=201 p2=14470 p3=127
WAIT #4: nam='direct path write' ela= 2639 p1=201 p2=14597 p3=127
WAIT #4: nam='direct path write' ela= 9 p1=201 p2=14724 p3=127
WAIT #4: nam='direct path write' ela= 7669 p1=201 p2=14851 p3=127
WAIT #4: nam='direct path write' ela= 22 p1=201 p2=14978 p3=127
WAIT #4: nam='direct path write' ela= 9 p1=201 p2=15105 p3=127
WAIT #4: nam='direct path write' ela= 9310 p1=201 p2=15232 p3=127
WAIT #4: nam='direct path write' ela= 15 p1=201 p2=2569 p3=127
WAIT #4: nam='direct path write' ela= 10512 p1=201 p2=2696 p3=127
WAIT #4: nam='direct path write' ela= 3496 p1=201 p2=2823 p3=127
WAIT #4: nam='direct path write' ela= 6402 p1=201 p2=2950 p3=127
WAIT #4: nam='direct path write' ela= 6681 p1=201 p2=3077 p3=127
WAIT #4: nam='direct path write' ela= 20 p1=201 p2=3204 p3=127
WAIT #4: nam='direct path write' ela= 10 p1=201 p2=3331 p3=127
WAIT #4: nam='direct path write' ela= 3804 p1=201 p2=3458 p3=127
WAIT #4: nam='direct path write' ela= 11 p1=201 p2=3585 p3=127
WAIT #4: nam='direct path write' ela= 6729 p1=201 p2=3712 p3=127
WAIT #4: nam='direct path write' ela= 7817 p1=201 p2=37256 p3=127
WAIT #4: nam='direct path write' ela= 2498 p1=201 p2=37383 p3=127
WAIT #4: nam='direct path write' ela= 4227 p1=201 p2=37510 p3=127
WAIT #4: nam='direct path write' ela= 21 p1=201 p2=37637 p3=127
WAIT #4: nam='direct path write' ela= 1 p1=201 p2=37764 p3=127
WAIT #4: nam='direct path write' ela= 7 p1=201 p2=37891 p3=127
WAIT #4: nam='direct path write' ela= 9087 p1=201 p2=38018 p3=127
WAIT #4: nam='direct path write' ela= 10798 p1=201 p2=38145 p3=127
WAIT #4: nam='direct path write' ela= 19696 p1=201 p2=38272 p3=127
WAIT #4: nam='direct path write' ela= 16614 p1=201 p2=33289 p3=127

The other thing that’s interesting in a couple of the 10104 traces is a big imbalance in the distribution of data in the hash table. For example:

Partition:0    rows:1057762    clusters:271    slots:141    kept=0
Partition:1    rows:137        clusters:1      slots:1      kept=0
Partition:2    rows:195        clusters:1      slots:1      kept=0
Partition:3    rows:1522       clusters:1      slots:1      kept=0
Partition:4    rows:6789       clusters:2      slots:2      kept=0
Partition:5    rows:408        clusters:1      slots:1      kept=0
Partition:6    rows:461        clusters:1      slots:1      kept=0
Partition:7    rows:180        clusters:1      slots:1      kept=0

And also this illustration of some serious skew in the buckets:

### Hash table overall statistics ###
Total buckets: 2097152 Empty buckets: 2097117 Non-empty buckets: 35
Total number of rows: 562177
Maximum number of rows in a bucket: 560131
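With 560,131 of the 562,177 rows landing in a single bucket, there must be one hugely popular value in the join key of the build input. The table and column names below are made up, but the sanity check on the data itself is just a group-by on that key:

-- Hypothetical names: check for a dominant value in the hash join key of the build input
select join_key_col,
       count(*) as row_count
from   build_input_table
group by join_key_col
order by count(*) desc;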

3 Responses to Temp Usage

  1. Hello Dominic,

    These measurements seem to be fairly representative of this system (although the big peak is a bit of an outlier).

    If you take a look at the spike at the beginning of the graph, then you might think differently. It was 28-AUG when the average read time was 250 ms, and on 04-SEP it was 350 ms, which is in the same ballpark. Both peaks happen on Saturday nights – that seems like a pattern to me. I think you should look closer at the previous and upcoming Saturdays on this system, and check the job schedule.
    It will be interesting to see the root cause of your problem; keep us posted on it.

    • dombrooks says:

      Hi Timur,

      Thanks for commenting.

      I might have dismissed that too quickly as an outlier. I’ll keep an eye out on subsequent weekends.

      I’m not so worried about peaks at the weekend. It’s possibly not as significant to the business as the (smaller) peaks which occur in mid-week batches when *stuff* needs to get done in a limited window. But it might be easiest to investigate when the peak is biggest.

      This is probably not really causing the business a problem at the moment but it is almost certainly representative of the lack of predictability of the timings of the key batch processes and accompanying SLAs etc – cause / symptom / cause / symptom…

      I probably have to set this to one side really. It’s not born from a specific business-prioritised tuning target. Just something that I happened to notice and then, following the white rabbit, I got into some trace files, etc.

      I wouldn’t be surprised if this became relevant in the not too distant future whilst looking at a next prioritised tuning target. Until then, chances are it won’t go anywhere.

      At the moment, it probably falls into the big bucket of stuff to look at if it’s still happening after an upgrade to 11g and a chance to sort out a load of settings and config that needs sorting (but without introducing all the change in one uncontrolled big bang).

      Cheers,
      Dominic

  2. Pingback: Did I know that? « OraStory
