Copy_table_stats

An observation on DBMS_STATS.COPY_TABLE_STATS in 11.2.0.3

Prior to 10.2.0.5, 11.1.0.7 (I think) and 11.2, COPY_TABLE_STATS copied the high/low value of the partition key from the source to the destination rather than use the implicit boundaries of that partition. Doug mentions it in his series on statistics on partitioned objects.

Looking at it in 11.2.0.3, this issue is still addressed in 11.2.0.3 but there’s a “but” if we copy at the SUBPARTITION level:

Setup:

SQL> create table t1
  2  (dt    DATE
  3  ,col1  VARCHAR2(10)
  4  ,stuff VARCHAR2(100))
  5  PARTITION BY RANGE (DT)
  6  SUBPARTITION BY LIST (COL1)
  7  SUBPARTITION TEMPLATE
  8  (SUBPARTITION A VALUES ('A'),
  9   SUBPARTITION B VALUES ('B'),
 10   SUBPARTITION C VALUES ('C'))
 11  (PARTITION T1_WK1 VALUES LESS THAN (TO_DATE('07-01-2012','DD-MM-YYYY')),
 12   PARTITION T1_WK2 VALUES LESS THAN (TO_DATE('14-01-2012','DD-MM-YYYY')));

Table created.

SQL> insert into t1
  2  values (TO_DATE('01-01-2012','DD-MM-YYYY'),'A',RPAD('X',100,'X'));

1 row created.

SQL> insert into t1
  2  values (TO_DATE('02-01-2012','DD-MM-YYYY'),'B',RPAD('X',100,'X'));

1 row created.

SQL> insert into t1
  2  values (TO_DATE('03-01-2012','DD-MM-YYYY'),'C',RPAD('X',100,'X'));

1 row created.

SQL> exec dbms_stats.gather_table_stats(USER,'T1','T1_WK1');

PL/SQL procedure successfully completed.

SQL> 

So, in one partition, we have 1 row per subpartition. The other partition is empty.

And we’ve gathered stats on one partition.
(Why just one partition? It’s not really important but I’ll touch on it at the end).

You should see the following statistics:

SQL> select global_stats
  2  ,      column_name
  3  ,      case
  4              when column_name = 'DT'   then display_raw(low_value,'DATE')
  5              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6         end lo
  7  ,      case
  8              when column_name = 'DT'   then display_raw(high_value,'DATE')
  9              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10         end hi
 11  from   user_tab_col_statistics
 12  where  table_name = 'T1'
 13  and    column_name IN ('DT','COL1');

GLO COLUMN_NAME          LO         HI
--- -------------------- ---------- ----------
YES DT                   01-JAN-12  03-JAN-12
YES COL1                 A          C

SQL> select global_stats
  2  ,      partition_name
  3  ,      column_name
  4  ,      case
  5              when column_name = 'DT'   then display_raw(low_value,'DATE')
  6              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  7         end lo
  8  ,      case
  9              when column_name = 'DT'   then display_raw(high_value,'DATE')
 10              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 11         end hi
 12  from   user_part_col_statistics
 13  where  table_name  = 'T1'
 14  and    column_name IN ('DT','COL1')
 15  order by partition_name, column_name;

GLO PARTITION_NAME       COLUMN_NAME          LO         HI
--- -------------------- -------------------- ---------- ----------
YES T1_WK1               COL1                 A          C
YES T1_WK1               DT                   01-JAN-12  03-JAN-12
NO  T1_WK2               COL1
NO  T1_WK2               DT

SQL> select subpartition_name
  2  ,      column_name
  3  ,      case
  4              when column_name = 'DT'   then display_raw(low_value,'DATE')
  5              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6         end lo
  7  ,      case
  8              when column_name = 'DT'   then display_raw(high_value,'DATE')
  9              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10         end hi
 11  from   user_subpart_col_statistics
 12  where  table_name  = 'T1'
 13  and    column_name IN ('DT','COL1')
 14  order by subpartition_name, column_name;

SUBPARTITION_NAME    COLUMN_NAME          LO         HI
-------------------- -------------------- ---------- ----------
T1_WK1_A             COL1                 A          A
T1_WK1_A             DT                   01-JAN-12  01-JAN-12
T1_WK1_B             COL1                 B          B
T1_WK1_B             DT                   02-JAN-12  02-JAN-12
T1_WK1_C             COL1                 C          C
T1_WK1_C             DT                   03-JAN-12  03-JAN-12
T1_WK2_A             COL1
T1_WK2_A             DT
T1_WK2_B             COL1
T1_WK2_B             DT
T1_WK2_C             COL1
T1_WK2_C             DT

12 rows selected.

SQL> 

Hopefully no surprises there.

BTW: display_raw

Now, if we use COPY_TABLE_STATS at a partition level:

SQL> exec dbms_stats.copy_table_stats(USER,'T1','T1_WK1','T1_WK2');

PL/SQL procedure successfully completed.

SQL>

the table level stats are unaffected as you might expect:

SQL> select global_stats
  2  ,      column_name
  3  ,      case
  4              when column_name = 'DT'   then display_raw(low_value,'DATE')
  5              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6         end lo
  7  ,      case
  8              when column_name = 'DT'   then display_raw(high_value,'DATE')
  9              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10         end hi
 11  from   user_tab_col_statistics
 12  where  table_name = 'T1'
 13  and    column_name IN ('DT','COL1');

GLO COLUMN_NAME          LO         HI
--- -------------------- ---------- ----------
YES DT                   01-JAN-12  03-JAN-12
YES COL1                 A          C

SQL> 

and, most importantly, the partition key high/low values are no longer just copied:

SQL> select global_stats
  2  ,      partition_name
  3  ,      column_name
  4  ,      case
  5              when column_name = 'DT'   then display_raw(low_value,'DATE')
  6              when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  7         end lo
  8  ,      case
  9              when column_name = 'DT'   then display_raw(high_value,'DATE')
 10              when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 11         end hi
 12  from   user_part_col_statistics
 13  where  table_name  = 'T1'
 14  and    column_name IN ('DT','COL1')
 15  order by partition_name, column_name;

GLO PARTITION_NAME       COLUMN_NAME          LO         HI
--- -------------------- -------------------- ---------- ----------
YES T1_WK1               COL1                 A          C
YES T1_WK1               DT                   01-JAN-12  03-JAN-12
YES T1_WK2               COL1                 A          C
YES T1_WK2               DT                   07-JAN-12  14-JAN-12

The statistics above show that the high/low of the destination partition match the partition boundaries which nearly makes sense but note the high value is the LESS THAN limit of the partition.
(Robin Moffat also mentioned this in a post)

Now copying the statistics at a PARTITION level, does nothing to the SUBPARTITION statistics, never has done.
That’s not changed nor was it expected to:

SQL>  select subpartition_name
  2   ,      column_name
  3   ,      case
  4               when column_name = 'DT'   then display_raw(low_value,'DATE')
  5               when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6          end lo
  7   ,      case
  8               when column_name = 'DT'   then display_raw(high_value,'DATE')
  9               when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10          end hi
 11   from   user_subpart_col_statistics
 12   where  table_name  = 'T1'
 13   and    column_name IN ('DT','COL1')
 14   order by subpartition_name, column_name;

SUBPARTITION_NAME    COLUMN_NAME          LO                   HI
-------------------- -------------------- -------------------- --------------------
T1_WK1_A             COL1                 A                    A
T1_WK1_A             DT                   01-JAN-12            01-JAN-12
T1_WK1_B             COL1                 B                    B
T1_WK1_B             DT                   02-JAN-12            02-JAN-12
T1_WK1_C             COL1                 C                    C
T1_WK1_C             DT                   03-JAN-12            03-JAN-12
T1_WK2_A             COL1
T1_WK2_A             DT
T1_WK2_B             COL1
T1_WK2_B             DT
T1_WK2_C             COL1
T1_WK2_C             DT

12 rows selected.

Perhaps then, if we value our subpartition statitics, we might think it’s a good idea to do our operations at a subpartition level instead.

What happens if we reset these statistics and do a COPY_TABLE_STATS at a subpartition level?

Reset:

SQL> exec dbms_stats.delete_table_stats(USER,'T1');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(USER,'T1','T1_WK1');

PL/SQL procedure successfully completed.

And COPY_TABLE_STATS on the subpartitions:

SQL>  begin
  2    dbms_stats.copy_table_stats(USER,'T1','T1_WK1_A','T1_WK2_A');
  3    dbms_stats.copy_table_stats(USER,'T1','T1_WK1_B','T1_WK2_B');
  4    dbms_stats.copy_table_stats(USER,'T1','T1_WK1_C','T1_WK2_C');
  5   end;
  6  /

PL/SQL procedure successfully completed.

The table level stats are still unaffected.

But at the subpartition level, the RANGE PARTITION KEY (DT) is back to just being copied, not set to the boundary of the PARTITION KEY:

SQL>  select subpartition_name
  2   ,      column_name
  3   ,      case
  4               when column_name = 'DT'   then display_raw(low_value,'DATE')
  5               when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  6          end lo
  7   ,      case
  8               when column_name = 'DT'   then display_raw(high_value,'DATE')
  9               when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 10          end hi
 11   from   user_subpart_col_statistics
 12   where  table_name  = 'T1'
 13   and    column_name IN ('DT','COL1')
 14   order by subpartition_name, column_name;

SUBPARTITION_NAME    COLUMN_NAME          LO                   HI
-------------------- -------------------- -------------------- --------------------
T1_WK1_A             COL1                 A                    A
T1_WK1_A             DT                   01-JAN-12            01-JAN-12
T1_WK1_B             COL1                 B                    B
T1_WK1_B             DT                   02-JAN-12            02-JAN-12
T1_WK1_C             COL1                 C                    C
T1_WK1_C             DT                   03-JAN-12            03-JAN-12
T1_WK2_A             COL1                 A                    A
T1_WK2_A             DT                   01-JAN-12            01-JAN-12
T1_WK2_B             COL1                 B                    B
T1_WK2_B             DT                   02-JAN-12            02-JAN-12
T1_WK2_C             COL1                 C                    C
T1_WK2_C             DT                   03-JAN-12            03-JAN-12

12 rows selected.

And, at the partition level, any aggregation of these incorrect values depends on whether we already have GLOBAL STATS or not (hence the initial stats gathering on just the partition).

If no global stats then the error is propagated up from the subpartition:

SQL>  select global_stats
  2   ,      partition_name
  3   ,      column_name
  4   ,      case
  5               when column_name = 'DT'   then display_raw(low_value,'DATE')
  6               when column_name = 'COL1' then display_raw(low_value,'VARCHAR2')
  7          end lo
  8   ,      case
  9               when column_name = 'DT'   then display_raw(high_value,'DATE')
 10               when column_name = 'COL1' then display_raw(high_value,'VARCHAR2')
 11          end hi
 12   from   user_part_col_statistics
 13   where  table_name  = 'T1'
 14   and    column_name IN ('DT','COL1')
 15   order by partition_name, column_name;

GLO PARTITION_NAME       COLUMN_NAME          LO                   HI
--- -------------------- -------------------- -------------------- --------------------
YES T1_WK1               COL1                 A                    C
YES T1_WK1               DT                   01-JAN-12            03-JAN-12
NO  T1_WK2               COL1                 A                    C
NO  T1_WK2               DT                   01-JAN-12            03-JAN-12

Or if instead of gathering statistics at the partition level initially (and at reset), we do a table level gather, then
the PARTITION stats are not aggregated, i.e.

GLO PARTITION_NAME       COLUMN_NAME          LO         HI
--- -------------------- -------------------- ---------- ----------
YES T1_WK1               COL1                 A          C
YES T1_WK1               DT                   01-JAN-12  03-JAN-12
YES T1_WK2               COL1
YES T1_WK2               DT

So, in summary, I just wanted to show that whilst the issue with COPY_TABLE_STATS and HIGH/LOW has been addressed at the PARTITION level, there still seem to be issues for the PARTITION KEY at the SUBPARTITION LEVEL.

Partitioned table statistics are more complicated than you might expect.

Here are some of the excellent resources that can guide you through:

Any thoughts on stats validation?

There is an application running on 9.2.0.8 and this application’s approach to object statistics is that they are gathered along with each quarterly release of code and do not change outside of the release schedule.

There are a number of other additional issues with some elements of this particular implementation, discussion of which are outside the scope of this.

Now, as part of the upgrade to 11.2, it’s highly likely that we will move to the default built-in stat proc, albeit gathering in a weekend window not daily.

One of the biggest advantages of the current situation is that the stats are tested along with the release and released with the release (of course, plans can still change depite stats being fixed – even because they are fixed due to time ticking ever onwards, sequences increasing, etc).

And so, as part of a possibly non-negotiable comfort blanket to move to the default 11g stats job, there is a management requirement to have some sort of validation routine – i.e. a set of sql statements broadly representing important or problem areas of the application – that will be run before/after and compared on some sort of metric and judgement passed on whether the stats are “good”.

There are a number of nifty features like pending stats that we can use here but my initial thoughts on this was that this sounded like a fairly easy job for SQL Tuning Sets (STS) and the SQL Performance Analyzer (SPA).

SPA requires the additional Real Application Testing (RAT) license and, now that our licensing has been negotiated, this isn’t something we’ve got.

So… hand-rolling a validation routine.

I would see SQL Tuning Sets with the sql statements and bind sets as still part of a bespoke solution
(STS are available with the Tuning Pack license which is on top of EE both of which we have).

I could picture a simple approach that would execute these statements in parallel sessions probably via DBMS_SQL, fetch all the rows, record the session stats before/after, pick a metric e.g. buffer gets, elapsed time, or some sort of ratio between those sorts of things and then and do some sort of metric comparison – yes, that run was broadly the same or better, or at least not a catastrophic degradation – these stats are ok.

(Of course, being able to picture a possible approach does not mean that I advocate it)

But I just wondered whether this is something anyone else has done?

I’d be very interested in hearing more if you have.

Edit:
I’ve tried to reword this to avoid some confusion.
Alex – I’ve deleted your comments and my replies because a) they reduce the chance of getting the input the article was aiming for b) they were based on nested recursive misunderstandings and c) much of what you were saying is preaching to the choir.

Follow

Get every new post delivered to your Inbox.

Join 62 other followers