Block Corruption II

As mentioned before, we’ve had a block corruption that hadn’t yet caused any problems in production but was causing issues for our 11g upgrade rehearsals with the datapump export/import stage.

Because it wasn’t causing any problems in production, the approach to addressing it was somewhat relaxed!

But what this issue did demonstrate was big concerns over the amount of redo that this system generates in a day – 500 GB – and how that volume of redo restricts the practicalities and ability to respond for a physical recovery of the problem, particularly as we only thing we knew was the recovery had happened at some point since 1st January 2012.

Fortunately we were able to identify that:

  1. Only one block in one partition of one table was affected

And we could say with a high degree of confidence that:

  1. That two rows in this one block were affected by the corruption and
  2. We could get the values from these two row from one of the uat databases and that the data hadn’t changed since.

In other words, we could resolve this corruption with a manual, logical approach.

For example, cross-referencing a block dump to see how the block is corrupted, I could get all the rows in this one block that haven’t been affected using something like this:

SELECT DBMS_ROWID.ROWID_ROW_NUMBER(rowid)
,      t.*
FROM   trade PARTITION (p_01) t
WHERE  rowid >= DBMS_ROWID.ROWID_CREATE (1,98381,24,1093595,0)
AND    rowid <  DBMS_ROWID.ROWID_CREATE (1,98381,24,1093596,0)
AND    rowid NOT IN ('AAAYBNAAYAAEK/bAAR','AAAYBNAAYAAEK/bAAT');

So, the approach for this manual fix was along the lines of:
1. Create an empty copy of the affected table.

CREATE TABLE repaired_table
TABLESPACE ....
AS
SELECT *
FROM   TRADE
WHERE  1 = 2;

2. Copy all the data apart from this one block into the new table

INSERT
INTO   repaired_table
SELECT * 
FROM   trade PARTITION (P_01) 
WHERE  rowid < DBMS_ROWID.ROWID_CREATE (1,98381,24,1093595,0)
UNION ALL
SELECT *
FROM   trade PARTITION (P_01)
WHERE  rowid >= DBMS_ROWID.ROWID_CREATE (1,98381,24,1093596,0);

3. Insert rows from block that were ok (double checked via block dump)

INSERT 
INTO   TRADE
SELECT t.*
FROM   trade PARTITION (p_01) t
WHERE  rowid >= DBMS_ROWID.ROWID_CREATE (1,98381,24,1093595,0)
AND    rowid <  DBMS_ROWID.ROWID_CREATE (1,98381,24,1093596,0)
AND    rowid NOT IN ('AAAYBNAAYAAEK/bAAR','AAAYBNAAYAAEK/bAAT');

4. Insert good versions of the corrupted rows extracted from an older UAT database.
5. Create mirror of local indexes on copy table

6. Partition exchange without validation including indexes

7. Rebuild global indexes on original table

8. Do stuff with stats if you needed to (we recovered stats from a stats backup table)

At this point, table REPAIRED_TRADE now has the corrupt block allocated.

Even if we drop the table, DBVERIFY still reports the corrupt block as it has not been formatted.

This is normal as the block will not be formatted until it is taken off the freelist and used by a new segment.

For completeness, we wanted to get this block formatted.

Because we did not drop table REPAIRED_TRADE we know where this block is.

So:

1. TRUNCATE TABLE repaired_trade REUSE STORAGE;

2. Drop the indexes on this table that we wanted for ease of partition exchange

3. Create trigger to tell us when we format the block in question:

CREATE OR REPLACE TRIGGER corrupt_trigger 
AFTER INSERT ON repaired_trade 
REFERENCING OLD AS p_old NEW AS new_p 
FOR EACH ROW 
DECLARE 
  e_corrupt EXCEPTION; 
BEGIN 
  IF  (DBMS_ROWID.ROWID_BLOCK_NUMBER(:new_p.rowid)=1093595)
  AND (DBMS_ROWID.ROWID_RELATIVE_FNO(:new_p.rowid)=24) 
  THEN 
      RAISE e_corrupt; 
  END IF; 
EXCEPTION 
  WHEN e_corrupt THEN 
     RAISE_APPLICATION_ERROR(-20000, 'Corrupt block has been formatted'); 
END; 
/

4. Reinsert data into this table from original TRADE table

INSERT
INTO   repaired_trade
SELECT * FROM trade;

5. Repeat 4 if necessary, i.e. until we get the exception then

6. DROP TABLE repaired_trade;

At this point, DBVERIFY will no longer report that there is a corrupt block.

About these ads

One Response to Block Corruption II

  1. Ora600Tom says:

    Thanks for sharing. Nice one.

    Thomas

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 68 other followers

%d bloggers like this: