Block Corruption II
March 5, 2012 1 Comment
As mentioned before, we’ve had a block corruption that hadn’t yet caused any problems in production but was causing issues for our 11g upgrade rehearsals with the datapump export/import stage.
Because it wasn’t causing any problems in production, the approach to addressing it was somewhat relaxed!
But what this issue did demonstrate was big concerns over the amount of redo that this system generates in a day – 500 GB – and how that volume of redo restricts the practicalities and ability to respond for a physical recovery of the problem, particularly as we only thing we knew was the recovery had happened at some point since 1st January 2012.
Fortunately we were able to identify that:
- Only one block in one partition of one table was affected
And we could say with a high degree of confidence that:
- That two rows in this one block were affected by the corruption and
- We could get the values from these two row from one of the uat databases and that the data hadn’t changed since.
In other words, we could resolve this corruption with a manual, logical approach.
For example, cross-referencing a block dump to see how the block is corrupted, I could get all the rows in this one block that haven’t been affected using something like this:
SELECT DBMS_ROWID.ROWID_ROW_NUMBER(rowid) , t.* FROM trade PARTITION (p_01) t WHERE rowid >= DBMS_ROWID.ROWID_CREATE (1,98381,24,1093595,0) AND rowid < DBMS_ROWID.ROWID_CREATE (1,98381,24,1093596,0) AND rowid NOT IN ('AAAYBNAAYAAEK/bAAR','AAAYBNAAYAAEK/bAAT');
So, the approach for this manual fix was along the lines of:
1. Create an empty copy of the affected table.
CREATE TABLE repaired_table TABLESPACE .... AS SELECT * FROM TRADE WHERE 1 = 2;
2. Copy all the data apart from this one block into the new table
INSERT INTO repaired_table SELECT * FROM trade PARTITION (P_01) WHERE rowid < DBMS_ROWID.ROWID_CREATE (1,98381,24,1093595,0) UNION ALL SELECT * FROM trade PARTITION (P_01) WHERE rowid >= DBMS_ROWID.ROWID_CREATE (1,98381,24,1093596,0);
3. Insert rows from block that were ok (double checked via block dump)
INSERT INTO TRADE SELECT t.* FROM trade PARTITION (p_01) t WHERE rowid >= DBMS_ROWID.ROWID_CREATE (1,98381,24,1093595,0) AND rowid < DBMS_ROWID.ROWID_CREATE (1,98381,24,1093596,0) AND rowid NOT IN ('AAAYBNAAYAAEK/bAAR','AAAYBNAAYAAEK/bAAT');
4. Insert good versions of the corrupted rows extracted from an older UAT database.
5. Create mirror of local indexes on copy table
6. Partition exchange without validation including indexes
7. Rebuild global indexes on original table
8. Do stuff with stats if you needed to (we recovered stats from a stats backup table)
At this point, table REPAIRED_TRADE now has the corrupt block allocated.
Even if we drop the table, DBVERIFY still reports the corrupt block as it has not been formatted.
This is normal as the block will not be formatted until it is taken off the freelist and used by a new segment.
For completeness, we wanted to get this block formatted.
Because we did not drop table REPAIRED_TRADE we know where this block is.
So:
1. TRUNCATE TABLE repaired_trade REUSE STORAGE;
2. Drop the indexes on this table that we wanted for ease of partition exchange
3. Create trigger to tell us when we format the block in question:
CREATE OR REPLACE TRIGGER corrupt_trigger AFTER INSERT ON repaired_trade REFERENCING OLD AS p_old NEW AS new_p FOR EACH ROW DECLARE e_corrupt EXCEPTION; BEGIN IF (DBMS_ROWID.ROWID_BLOCK_NUMBER(:new_p.rowid)=1093595) AND (DBMS_ROWID.ROWID_RELATIVE_FNO(:new_p.rowid)=24) THEN RAISE e_corrupt; END IF; EXCEPTION WHEN e_corrupt THEN RAISE_APPLICATION_ERROR(-20000, 'Corrupt block has been formatted'); END; /
4. Reinsert data into this table from original TRADE table
INSERT INTO repaired_trade SELECT * FROM trade;
5. Repeat 4 if necessary, i.e. until we get the exception then
6. DROP TABLE repaired_trade;
At this point, DBVERIFY will no longer report that there is a corrupt block.
Thanks for sharing. Nice one.
Thomas