Handling Block Corruptions in Oracle7 / 8 / 8i / 9i / 10g
Contents
Introduction
Overview of Steps to handle a Corruption
Corruption due to NOLOGGING or UNRECOVERABLE
(1) Determine the Extent of the Corruption Problem
(2) Replace or Move Away from Suspect Hardware
(3) Which Objects are Affected ?
Options for various Segment Types:
CACHE
CLUSTER
INDEX PARTITION
INDEX
LOBINDEX
LOBSEGMENT
ROLLBACK
TABLE PARTITION
TABLE
TEMPORARY
TYPE2 UNDO
Other Segment Types
No Segment
(4) Choosing a Recovery Option
(4A) Complete Recovery
Block Level Recovery ,
Datafile Recovery ,
Database Recovery ,
After Complete Recovery
(4B) Recreating Indexes
(4C) Salvaging Data from Tables
Methods of extracting data from a corrupt table AROUND a corrupt block
Methods of extracting data from a table with a corrupt LOBSEGMENT block
Extracting data from the corrupt block itself
(4D) Leaving the Corruption in Place
Warnings when Leaving a Corruption in Place
(4E) Last Options
Document History
All SQL statements here are for use in SQL*Plus (in 8.1 or higher)
or Server Manager (Oracle7 / 8.0) when connected as a SYSDBA user.
(Eg: "connect / as sysdba" or "connect internal"
Introduction
This article discusses how to handle one or more block corruptions
on an Oracle datafile and describes the main actions to take to deal
with them. Please read the complete article before taking any action.
This note does not cover memory corruption issues (typically
ORA-600 [17xxx] type errors).
Note: If the problem is an ORA-1578 on STARTUP then please
contact your local support center for advice referencing
Note 106638.1 - this note is not visible to customers
but the relevant steps from it can be supplied by an experienced
support analyst.
You may be referred to this article from many places for many forms of error - it is important that you have the following information for each corrupt block:
* An absolute FILE NUMBER of the file containing the corrupt block. Referred to as "&AFN" in this article.
* The file name of the file containing the corrupt block. Referred to as "&FILENAME" in this article. ( If you know the FILE NUMBER but not its name then V$DATAFILE can be used to get the file name: SELECT name FROM v$datafile WHERE file#=&AFN; If the file number does not appear in V$DATAFILE in Oracle8i AND &AFN is greater than the DB_FILES parameter value then it is probably a TEMPFILE. In this case the filename can be found using: SELECT name FROM v$tempfile WHERE file#=(&AFN - &DB_FILES_value); )
* The BLOCK NUMBER of the corrupt block in that file. Referred to as "&BL" in this article.
* The tablespace number and name containing the affected block. Referred to as "&TSN" (tablespace number) and "&TABLESPACE_NAME" in this article. If you do not know these then you can find them using: SELECT ts# "TSN" FROM v$datafile WHERE file#=&AFN; SELECT tablespace_name FROM dba_data_files WHERE file_id=&AFN;
* The block size of the tablespace where the corruption lies. Referred to as "&TS_BLOCK_SIZE" in this article. For Oracle 9i+, run the following query to determine the appropriate block size: SELECT block_size FROM dba_tablespaces WHERE tablespace_name = (SELECT tablespace_name FROM dba_data_files WHERE file_id=&AFN); For Oracle 7, 8.0 and 8.1: Every tablespace in the database has the same block size. For these versions, issue "SHOW PARAMETER DB_BLOCK_SIZE" and use this value as your &TS_BLOCK_SIZE.
Eg: For the ORA-1578 error:
ORA-01578: ORACLE data block corrupted (file # 7, block # 12698)
ORA-01110: data file 22: '/oracle1/oradata/V816/oradata/V816/users01.dbf'
then:
&AFNis "22" (from the ORA-1110 portion of the error)
&RFNis "7"(from the "file #" in the ORA-1578)
&BL is "12698"(from the "block #" in the ORA-1578)
&FILENAME is '/oracle1/oradata/V816/oradata/V816/users01.dbf'
&TSN etc.. should be determined from the above SQL
For other errors (ORA-600 , ORA-1498 etc...) the above values should
either be given to you by Oracle Support, or be given to you from the
article which covers the relevant error.
Overview of Steps to handle a Corruption
There are many possible causes of a block corruption including:
- Bad IO hardware / firmware
- OS problems
- Oracle problems
- Recovering through "UNRECOVERABLE" or "NOLOGGING" database actions
(in which case ORA-1578 is expected behaviour - see below)
The point in time when an Oracle error is raised may be much later than
when any corruption initially occurred.
As the root cause is not usually known at the time the corruption is
encountered, and as in most cases the key requirement is to get up
and running again, then the steps used tackle corruption problems in
this article are:
1) Determine the extent of the corruption problems
and also determine if the problems are permanent or transient.
If the problem is widespread or the errors move about
then focus on identifying the cause first (check hardware
etc..). This is important as there is no point recovering
a system if the underlying hardware is faulty.
2) Replace or move away from any faulty or suspect hardware.
3) Determine which database objects are affected.
4) Choose the most appropriate database recovery / data salvage
option.
For all steps above it is sensible to collect evidence and
document exactly what actions are being taken. The 'Evidence>>'
tags in this article list the information which should be collected
to assist with identifying the root cause of the problem.
Corruption due to NOLOGGING or UNRECOVERABLE
If a NOLOGGING (or UNRECOVERABLE) operation is performed on an
object and the datafile containing that object is subsequently
recovered then the data blocks affected by the NOLOGGING operation
are marked as corrupt and will signal an ORA-1578 error when
accessed.In Oracle8i an ORA-26040 is also signalled
("ORA-26040: Data block was loaded using the NOLOGGING option" )
which makes the cause fairly obvious, but earlier releases have no
additional error message. If a block is corrupt due to recovery
through a NOLOGGING operation then you can use this article from
Section 3 "Which Objects are Affected ?" onwards but note that:
(a) Recovery cannot retrieve the NOLOGGING data
(b) No data is salvagable from inside the block
(1) Determine the Extent of the Corruption Problem
Whenever a corruption error occurs note down the FULL error message/s
and look in the instance's alert log and trace files for any associated
errors. It is important to do this first to assess whether this is
a single block corruption, an error due to an UNRECOVERABLE operation
or a more severe issue.
It is a good idea to scan affected files (and any important files)
with DBVERIFY to check for other corruptions in order to determine
the extent of the problem.
For details of using DBVERIFY see Note 35512.1
Once you have determined a list of corrupt file/block combinations
then the steps below can be used to help determine what action
can be taken.
Evidence>>
- Record the original error in full, along with details of
the application which encountered the error.
- Save an extract from the alert log from a few hours before
the FIRST recorded problem up to the current point in time.
- Save any tracefiles mentioned in the alert log.
- Record any recent OS problems you have encountered.
- Note if you are using any special features - Eg: ASYNC IO,
fast write disk options etc..
- Record your current BACKUP position (Dates, Type etc...)
- Note if your database is in ARCHIVELOG mode or not
Eg: Issue "ARCHIVE LOG LIST" in SQL*Plus (or Server Manager)
(2) Replace or Move Away from Suspect Hardware
The vast majority of corruption problems are caused by faulty hardware.
If there is a hardware fault or a suspect component then it is sensible
to either repair the problem, or make disk space available on a
separate disk sub-system prior to proceeding with a recovery option.
You can move datafiles about using the following steps:
1. Make sure the file to be relocated is either OFFLINE or
the instance is in the MOUNT state (not open)
2. Physically restore (or copy) the datafile to its new location
eg: /newlocation/myfile.dbf
3. Tell Oracle the new location of the file.
eg: ALTER DATABASE RENAME FILE '/oldlocation/myfile.dbf'
TO '/newlocation/myfile.dbf';
(Note that you cannot RENAME a TEMPFILE - TEMPFILEs should
be dropped and recreated at the new location)
4. Online the relevant file / tablespace (if database is open)
IMPORTANT:If there are multiple errors (which are NOT due to NOLOGGING)
OR You have OS level errors against the affected file
OR The errors are transient and keep moving about
then there is little point proceeding until the underlying problem
has been addressed or space is available on alternative disks.
Get your hardware vendor to check the system over and contact
Oracle Support with details of all errors.
Please note: Whilst a failed hardware check is a good indication
that there is a hardware issue, a successful hardware check should not
be taken as proof that thereis no hardware related issue - it is very
common for hardware tests to report success when there really is some
underlying fault.
If using any special IO options such as direct IO , async IO or
similar it may be worth disabling them in order to eliminate such options
as a potential source of problems.
(3) Which Objects are Affected ?
It is best to determine which objects are affected BEFORE making any
decisions about how to recover - this is because the corruption/s may be
on object/s which can easily be re-created.
Eg: For a corruption on a 5 row lookup table it may be far quicker to
drop and recreate the table than to perform a recovery.
For each corruption collect the information in the following table.
The steps to do this are explained below.
*
Information to Record for each Corruption Original
Error
Absolute
File#
&AFN
Relative
File#
&RFN
Block#
&BL
Tablespace
Segment
Type
Segment
Owner.Name
Related
Objects
Recovery
Options
The notes below will help you fill in this table for each corruption.
"Original Error"
This is the error as initially reported.
Eg: ORA-1578 / ORA-1110 , ORA-600 with all arguments etc..
"Absolute File#", "Relative File#" and "Block#"
The File# and Block# should have been given to you either by the
error, by Oracle Support, or by the steps in an error article which
directed you to this article.
In Oracle8/8i/9i/10g:
The absolute and relative file numbers are often the
same but can differ (especially if the database has
been migrated from Oracle7). It is important to get
the correct numbers for &AFN and &RFN
or you may end up salvaging the wrong object !!
An ORA-1578 reports the RELATIVE file number, with the
ABSOLUTE file number given in the accompanying ORA-1110
error. For ORA-600 errors you should be told an absolute
file number.
The following query will show the absolute and relative
file numbers for datafiles in the database:
SELECT tablespace_name, file_id "AFN", relative_fno "RFN"
FROM dba_data_files;
In Oracle8i/9i/10g:
In addition to the notes above about Oracle8, Oracle8i onwards
can have TEMPFILES.The following query will show the
absolute and relative file numbers for tempfiles in the
database:
SELECT tablespace_name, file_id+value "AFN", relative_fno "RFN"
FROM dba_temp_files, v$parameter
WHERE name='db_files';
In Oracle7:Use the same file number for both the "Absolute File#"
and the "Relative File#"
"Segment Type", "Owner", "Name" and "Tablespace"
The following query will tell you the object TYPE , OWNER and NAME of
a segment given the absolute file number "&AFN" and block number "&BL" of the
corrupt block - the database must be open in order to use this query:
SELECT tablespace_name, segment_type, owner, segment_name
FROM dba_extents
WHERE file_id = &AFN
and &BL between block_id AND block_id + blocks - 1
;
If the block is in a TEMPFILE the above query will return no data.
For TEMPFILES the "Segment Type" will be "TEMPORARY".
"Related Objects" and Possible "Recovery Options" by SEGMENT_TYPE:
The related objects and recovery options which can be used depend on the
SEGMENT_TYPE. The additional queries and possible recovery options are
listed below for each of the most common segment types.
CACHE
CLUSTER
INDEX PARTITION
INDEX
LOBINDEX
LOBSEGMENT
ROLLBACK
TABLE PARTITION
TABLE
TEMPORARY
TYPE2 UNDO
Some other Segment Type
"no rows" from the query
CACHE
- If the segment type is CACHE recheck you have entered the SQL
and parameters correctly.
If you get the same result contact Oracle support with all
information you have.
Options:
The database is likely to require recovery.
{Continue}{Back to Segment List}
CLUSTER
- If the segment is a CLUSTER determine which tables it contains.
Eg:
SELECT owner, table_name
FROM dba_tables
WHERE owner='&OWNER'
AND cluster_name='&SEGMENT_NAME'
;
Options:
If the OWNER is "SYS" then contact Oracle support with all details.
The database is likely to require recovery.
For non dictionary clusters possible options include:
Recovery
ORSalvage data from all tables in the cluster
THEN Recreate the cluster and all its tables
As the cluster may contain a number of tables, it is best to
collect information for each table in the cluster before making a
decision.
{Collect TABLE information}{Back to Segment List}
INDEX PARTITION
- If the segment is an INDEX PARTITION note the NAME and OWNER
and then determine which partition is affected thus:
SELECT partition_name
FROM dba_extents
WHERE file_id = &AFN
AND &BL BETWEEN block_id AND block_id + blocks - 1
;
then continue below as if the segment was an INDEX segment.
Options:
Index partitions can be rebuilt using:
ALTER INDEX xxx REBUILD PARTITION ppp;
(take care with the REBUILD option as described in
"Recreating Indexes" below)
INDEX
- If the segment is an INDEX then if the OWNER is "SYS" contact
Oracle support with all details.
For a non-dictionary INDEX or INDEX PARTITIONs find out which table
the INDEX is on:
Eg:
SELECT table_owner, table_name
FROM dba_indexes
WHERE owner='&OWNER'
AND index_name='&SEGMENT_NAME'
;
and determine if the index supports a CONSTRAINT:
Eg:
SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE owner='&TABLE_OWNER'
AND constraint_name='&INDEX_NAME'
;
Possible values for CONSTRAINT_TYPE are:
P
The index supports a primary key constraint.
U
The index supports a unique constraint.
If the INDEX supports a PRIMARY KEY constraint (type "P"
then
check if the primary key is referenced by any foreign key constraints:
Eg:
SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE r_owner='&TABLE_OWNER'
AND r_constraint_name='&INDEX_NAME'
;
Options:
If the OWNER is "SYS" then contact Oracle support with all details.
The database is likely to require recovery.
For non dictionary indexes possible options include:
Recovery
ORRecreate the index (with any associated constraint
disables/enables)
(take care with the REBUILD option as described in
"Recreating Indexes" below)
{Continue}{Back to Segment List}
ROLLBACK
- If the segment is a ROLLBACK segment contact Oracle support as
rollback segment corruptions require special handling.
Options:
The database is likely to require recovery.
{Continue}{Back to Segment List}
TYPE2 UNDO
- TYPE2 UNDO is a system managed undo segment which is a special
form of rollback segment. Corruptions in these segments require
special handling.
Options:
The database is likely to require recovery.
{Continue}{Back to Segment List}
TABLE PARTITION
- If the segment is a TABLE PARTITION note the NAME and OWNER
and then determine which partition is affected thus:
SELECT partition_name
FROM dba_extents
WHERE file_id = &AFN
AND &BL BETWEEN block_id AND block_id + blocks - 1
;
then continue below as if the segment was a TABLE segment.
Options:
If all corruptions are in the same partition then one option
at this point is to EXCHANGE the corrupt partition with an
empty TABLE - this can allow the application to continue (without
access to the data in the corrupt partition) whilst any good
data can then be extracted from the table.
For other options see the TABLE options below.
TABLE
- If the OWNER is "SYS" then contact Oracle support with all details.
The database is likely to require recovery.
For a non-dictionary TABLE or TABLE PARTITIONs find out which
INDEXES exist on the TABLE:
Eg:
SELECT owner, index_name, index_type
FROM dba_indexes
WHERE table_owner='&OWNER'
AND table_name='&SEGMENT_NAME'
;
and determine if there is any PRIMARY key on the table:
Eg:
SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE owner='&OWNER'
AND table_name='&SEGMENT_NAME'
AND constraint_type='P'
;
If there is a primary key then check if this is referenced by any
foreign key constraints:
Eg:
SELECT owner, constraint_name, constraint_type, table_name
FROM dba_constraints
WHERE r_owner='&OWNER'
AND r_constraint_name='&CONSTRAINT_NAME'
;
Options:
If the OWNER is "SYS" then contact Oracle support with all details.
The database is likely to require recovery.
For non dictionary tables possible options include:
Recovery
ORSalvage data from the table (or partition)
THEN Recreate the table (or partition)
ORLeave the corruption in place
(eg: Use DBMS_REPAIR to mark the problem blocks to be skipped)
{Continue}{Back to Segment List}
|