Help: Oracle 10g RAC ASM disk group will not mount (ASM disk header information lost + ORA-15063)

Views: 11 | Replies: 9 | 2009-1-4 14:52:28
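I would like to know how to recover data when the header information of an ASM-managed disk is lost.
I tried using dd to write back a disk header file that I had backed up earlier. The header information on VOL2 and VOL4 now looks correct, but ASM still cannot mount disk group ORCL_DATA1. Running alter diskgroup ORCL_DATA1 mount fails with: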
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ORCL_DATA1"
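The problem appeared yesterday. This is Oracle 10g RAC with two nodes. Possibly an ASM bug left the database unable to access the VOL2 and VOL4 partitions, and as a result disk group ORCL_DATA1 could no longer be mounted.
The alert log shows: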
Tue Dec 30 18:11:00 2008
Reread from mirror side 'VOL2' returns corrupted data
Reread from mirror side 'VOL4' returns corrupted data
Using the kfed tool, I found that the disk header information on both partitions was corrupted.

To restore the disk headers, I ran the following on orcl1:
#dd if=/u01/app/oracle/asmdiskheader/VOL2 of=/dev/oracleasm/disks/VOL2 bs=4096 count=1
#dd if=/u01/app/oracle/asmdiskheader/VOL4 of=/dev/oracleasm/disks/VOL4 bs=4096 count=1
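Before overwriting a header region with dd as in the commands above, it may be worth saving whatever is currently on the device and then confirming that the write actually landed. The following is only a sketch under assumptions: the function names are made up for illustration, the 4096-byte block size simply mirrors the bs=4096 used above, and `cmp -n` assumes GNU cmp. Whether a single block is enough for ASM is exactly what is in question in this thread, so this checks the copy, not the correctness of the header.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Oracle tool.
# BLOCK_SIZE mirrors the bs=4096 used in the dd commands above.
BLOCK_SIZE=4096

# save_block DEVICE OUTFILE
# Copy the first block of DEVICE into OUTFILE so it can be put back later.
save_block() {
    dd if="$1" of="$2" bs="$BLOCK_SIZE" count=1 2>/dev/null
}

# restore_block BACKUP DEVICE
# Write the first block of BACKUP over the first block of DEVICE.
# conv=notrunc keeps the rest of a regular file intact when testing on files.
restore_block() {
    dd if="$1" of="$2" bs="$BLOCK_SIZE" count=1 conv=notrunc 2>/dev/null
}

# verify_block BACKUP DEVICE
# Exit status 0 only if the first block of both is byte-for-byte identical.
verify_block() {
    cmp -s -n "$BLOCK_SIZE" "$1" "$2"
}

# Example with the paths from this post (the /tmp file name is an assumption):
#   save_block    /dev/oracleasm/disks/VOL2 /tmp/VOL2.before
#   restore_block /u01/app/oracle/asmdiskheader/VOL2 /dev/oracleasm/disks/VOL2
#   verify_block  /u01/app/oracle/asmdiskheader/VOL2 /dev/oracleasm/disks/VOL2
```

Keeping the "before" copy means a failed attempt can at least be rolled back to the state the disk was in when the repair started.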
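I thought that would be enough. I then ran $ export ORACLE_SID=+ASM1; sqlplus / as sysdba and tried to mount disk group ORCL_DATA1, without success. Rebooting both machines at the same time did not help either. Some SQL query results from after the reboot follow: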
SQL> select group_number,disk_number,path,STATE,REDUNDANCY,TOTAL_MB,FREE_MB,NAME,FAILGROUP from v$asm_disk;

GROUP_NUMBER DISK_NUMBER PATH      STATE  REDUNDANCY TOTAL_MB FREE_MB NAME FAILGROUP
------------ ----------- --------- ------ ---------- -------- ------- ---- ---------
           0           0 ORCL:VOL2 NORMAL UNKNOWN     6588415       0
           0           1 ORCL:VOL4 NORMAL UNKNOWN     6588415       0
           1           0 ORCL:VOL1 NORMAL UNKNOWN     1021952 1021597 VOL1 VOL1
           1           1 ORCL:VOL3 NORMAL UNKNOWN     1023999 1023644 VOL3 VOL3
SQL> select group_number , name , state , type , offline_disks from v$asm_diskgroup;

GROUP_NUMBER NAME                STATE      TYPE   OFFLINE_DISKS
------------ ------------------- ---------- ------ -------------
           1 FLASH_RECOVERY_AREA MOUNTED    NORMAL             0
           0 ORCL_DATA1          DISMOUNTED                    0
The ASM alert log follows:
Wed Dec 31 14:00:00 2008
Instance terminated by LMON, pid = 30099
Wed Dec 31 14:10:28 2008
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 172.16.0.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 10.134.64.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 1
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/10.2.0/db_1/dbs/arch
Autotune of undo retention is turned off.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.1.0.
System parameters with non-default values:
large_pool_size = 12582912
spfile = /u02/oradata/orcl/dbs/spfile+ASM.ora
instance_type = asm
cluster_database = TRUE
instance_number = 2
remote_login_passwordfile= EXCLUSIVE
background_dump_dest = /u01/app/oracle/admin/+ASM/bdump
user_dump_dest = /u01/app/oracle/admin/+ASM/udump
core_dump_dest = /u01/app/oracle/admin/+ASM/cdump
asm_diskgroups = ORCL_DATA1, FLASH_RECOVERY_AREA
Cluster communication is configured to use the following interface(s) for this instance
172.16.0.2
Wed Dec 31 14:10:28 2008
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=8456
DIAG started with pid=3, OS id=8458
PSP0 started with pid=4, OS id=8460
LMON started with pid=5, OS id=8462
LMD0 started with pid=6, OS id=8464
LMS0 started with pid=7, OS id=8470
MMAN started with pid=8, OS id=8495
DBW0 started with pid=9, OS id=8497
LGWR started with pid=10, OS id=8499
CKPT started with pid=11, OS id=8501
SMON started with pid=12, OS id=8503
RBAL started with pid=13, OS id=8505
GMON started with pid=14, OS id=8507
Wed Dec 31 14:10:29 2008
lmon registered with NM - instance id 2 (internal mem no 1)
Wed Dec 31 14:10:29 2008
Reconfiguration started (old inc 0, new inc 1)
ASM instance
pseudo shared rm latch used
List of nodes:
1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Dec 31 14:10:29 2008
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Wed Dec 31 14:10:29 2008
LMS 0: 0 GCS shadows traversed, 0 replayed
Wed Dec 31 14:10:29 2008
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=15, OS id=8509
Wed Dec 31 14:10:30 2008
SQL> ALTER DISKGROUP ALL MOUNT
Wed Dec 31 14:10:30 2008
NOTE: cache registered group FLASH_RECOVERY_AREA number=1 incarn=0x998878d3
NOTE: cache registered group ORCL_DATA1 number=2 incarn=0x998878d4
Wed Dec 31 14:10:30 2008
Loaded ASM Library - Generic Linux, version 2.0.2 (KABI_V2) library for asmlib interface
Wed Dec 31 14:10:30 2008
NOTE: Hbeat: instance first (grp 1)
ERROR: no PST quorum in group 2: required 2, found 0
Wed Dec 31 14:10:30 2008
NOTE: cache dismounting group 2/0x998878D4 (ORCL_DATA1)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup ORCL_DATA1 was not mounted
Wed Dec 31 14:10:32 2008
Reconfiguration started (old inc 1, new inc 2)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Dec 31 14:10:33 2008
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed Dec 31 14:10:33 2008
LMS 0: 0 GCS shadows traversed, 0 replayed
Wed Dec 31 14:10:33 2008
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Wed Dec 31 14:10:35 2008
NOTE: start heartbeating (grp 1)
NOTE: cache opening disk 0 of grp 1: VOL1 label:VOL1
Wed Dec 31 14:10:35 2008
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 1: VOL3 label:VOL3
NOTE: F1X0 found on disk 1 fcn 0.0
NOTE: cache mounting (first) group 1/0x998878D3 (FLASH_RECOVERY_AREA)
* allocate domain 1, invalid = TRUE
kjbdomatt send to node 0
Wed Dec 31 14:10:35 2008
NOTE: attached to recovery domain 1
Wed Dec 31 14:10:35 2008
NOTE: starting recovery of thread=1 ckpt=11.139
NOTE: starting recovery of thread=2 ckpt=13.41
NOTE: advancing ckpt for thread=1 ckpt=11.139
NOTE: advancing ckpt for thread=2 ckpt=13.41
NOTE: cache recovered group 1 to fcn 0.2579
Wed Dec 31 14:10:35 2008
NOTE: opening chunk 1 at fcn 0.2579 ABA
NOTE: seq=12 blk=140
Wed Dec 31 14:10:35 2008
NOTE: cache mounting group 1/0x998878D3 (FLASH_RECOVERY_AREA) succeeded
SUCCESS: diskgroup FLASH_RECOVERY_AREA was mounted
Wed Dec 31 14:10:44 2008
NOTE: recovering COD for group 1/0x998878d3 (FLASH_RECOVERY_AREA)
SUCCESS: completed COD recovery for group 1/0x998878d3 (FLASH_RECOVERY_AREA)
Wed Dec 31 14:27:34 2008
SQL> alter diskgroup ORCL_DATA1 mount
Wed Dec 31 14:27:34 2008
NOTE: cache registered group ORCL_DATA1 number=2 incarn=0x9b1878dc
Wed Dec 31 14:27:34 2008
ERROR: no PST quorum in group 2: required 2, found 0
Wed Dec 31 14:27:34 2008
NOTE: cache dismounting group 2/0x9B1878DC (ORCL_DATA1)
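The key line in the log above is the "no PST quorum" error, which appears on both mount attempts with "required 2, found 0". As a rough sketch of how one might pull just those lines out of a long alert log, the function below prints the group number, the required count, and the found count for each occurrence. The function name is made up for illustration, and the alert log file name in the usage comment is an assumption based on the background_dump_dest shown above.

```shell
#!/bin/sh
# Sketch only: pst_quorum_errors is a hypothetical helper name.
# pst_quorum_errors LOGFILE
# For every "ERROR: no PST quorum" line in LOGFILE, print three fields:
#   <group number> <required> <found>
pst_quorum_errors() {
    sed -n 's/^ERROR: no PST quorum in group \([0-9][0-9]*\): required \([0-9][0-9]*\), found \([0-9][0-9]*\).*$/\1 \2 \3/p' "$1"
}

# Example (the file name under bdump is an assumption):
#   pst_quorum_errors /u01/app/oracle/admin/+ASM/bdump/alert_+ASM2.log
```

A "found" count that stays at 0 across attempts suggests that nothing done between the attempts changed what ASM sees on the disks of that group.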

千问 | 2009-1-4 14:52:28
Could anyone who knows how ASM mounts disk groups give me some pointers?

千问 | 2009-1-4 14:52:28
Originally posted by dalier149 on 2008-12-31 16:12
To restore the disk headers, I ran the following on orcl1:
#dd if=/u01/app/oracle/asmdiskheader/VOL2 of=/dev/oracleasm/disks/VOL2 bs=4096 count=1
#dd if=/u01/app/oracle/asmdiskheader/VOL4 of=/dev/oracleasm/disks/VOL4 bs=4096 count=1


Where did you find documentation saying that count=1 is enough to restore the header?



千问 | 2009-1-4 14:52:28
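Originally posted by wa0362 on 2009-1-1 20:04

Where did you find documentation saying that count=1 is enough to restore the header?

Then how should it be done? Could you be more specific? Thanks.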

千问 | 2009-1-4 14:52:28
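Originally posted by dalier149 on 2009-1-2 00:14

Then how should it be done? Could you be more specific? Thanks.

I don't know. I have never found any documentation on this.
That is why I asked you! Did you really just guess, without finding any documentation, that a single block would restore the disk header?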

千问 | 2009-1-4 14:52:28
So this is the ASM that Oracle recommends so strongly. Sometimes it pays to be conservative.

千问 | 2009-1-4 14:52:28
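It looks like a single 4096-byte block cannot restore the disk header after all. I had found this article online: http://space.itpub.net/175005/viewspace-402150. I was trying anything as a last resort. Across the more than ten RAC clusters we have deployed, ASM has lost disk headers several times before, and the cause is hard to pin down. It feels like the only fix is to upgrade to 10.2.0.4, but I would still like to make some progress on ASM disk recovery; I am not ready to give up.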
SQL> select path, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE from v$asm_disk;
PATH      MOUNT_STATUS HEADER_STATUS MODE_STATUS STATE
--------- ------------ ------------- ----------- ------
ORCL:VOL2 CLOSED       MEMBER        ONLINE      NORMAL
ORCL:VOL4 CLOSED       MEMBER        ONLINE      NORMAL
ORCL:VOL1 CACHED       MEMBER        ONLINE      NORMAL
ORCL:VOL3 CACHED       MEMBER        ONLINE      NORMAL

==============================================================================================================================
[oracle@bjljcsev-10 lib]$ kfed read /dev/oracleasm/disks/VOL2 aunum=0 blknum=1 | more
kfbh.endian:                 0 ; 0x000: 0x00
kfbh.hard:                   0 ; 0x001: 0x00
kfbh.type:                   0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                 0 ; 0x003: 0x00
kfbh.block.blk:              0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                  0 ; 0x00c: 0x00000000
kfbh.fcn.base:               0 ; 0x010: 0x00000000
kfbh.fcn.wrap:               0 ; 0x014: 0x00000000
kfbh.spare1:                 0 ; 0x018: 0x00000000
kfbh.spare2:                 0 ; 0x01c: 0x00000000
[oracle@bjljcsev-10 lib]$ kfed read /dev/oracleasm/disks/VOL4 aunum=0 blknum=1 | more
kfbh.endian:                 0 ; 0x000: 0x00
kfbh.hard:                   0 ; 0x001: 0x00
kfbh.type:                   0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                 0 ; 0x003: 0x00
kfbh.block.blk:              0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj:              0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check:                  0 ; 0x00c: 0x00000000
kfbh.fcn.base:               0 ; 0x010: 0x00000000
kfbh.fcn.wrap:               0 ; 0x014: 0x00000000
kfbh.spare1:                 0 ; 0x018: 0x00000000
kfbh.spare2:                 0 ; 0x01c: 0x00000000
[oracle@bjljcsev-10 lib]$ kfed read /dev/oracleasm/disks/VOL1 aunum=0 blknum=1 | more
kfbh.endian:                 1 ; 0x000: 0x01
kfbh.hard:                 130 ; 0x001: 0x82
kfbh.type:                   2 ; 0x002: KFBTYP_FREESPC
kfbh.datfmt:                 1 ; 0x003: 0x01
kfbh.block.blk:              1 ; 0x004: T=0 NUMB=0x1
kfbh.block.obj:     2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check:         2180809470 ; 0x00c: 0x81fc82fe
kfbh.fcn.base:               0 ; 0x010: 0x00000000
kfbh.fcn.wrap:               0 ; 0x014: 0x00000000
kfbh.spare1:                 0 ; 0x018: 0x00000000
kfbh.spare2:                 0 ; 0x01c: 0x00000000
kfdfsb.aunum:                0 ; 0x000: 0x00000000
kfdfsb.max:                254 ; 0x004: 0x00fe
kfdfsb.cnt:                254 ; 0x006: 0x00fe
kfdfse[0].total:           448 ; 0x008: 0x01c0
kfdfse[0].free:              1 ; 0x00a: 0x01
kfdfse[0].frag:              1 ; 0x00b: 0x01
kfdfse[1].total:           448 ; 0x00c: 0x01c0
kfdfse[1].free:              1 ; 0x00e: 0x01
kfdfse[1].frag:              1 ; 0x00f: 0x01
kfdfse[2].total:           448 ; 0x010: 0x01c0
kfdfse[2].free:              1 ; 0x012: 0x01
kfdfse[2].frag:              1 ; 0x013: 0x01
kfdfse[3].total:           448 ; 0x014: 0x01c0
kfdfse[3].free:              1 ; 0x016: 0x01
kfdfse[3].frag:              1 ; 0x017: 0x01
kfdfse[4].total:           448 ; 0x018: 0x01c0
kfdfse[4].free:              1 ; 0x01a: 0x01
kfdfse[4].frag:              1 ; 0x01b: 0x01
kfdfse[5].total:           448 ; 0x01c: 0x01c0
kfdfse[5].free:              1 ; 0x01e: 0x01
kfdfse[5].frag:              1 ; 0x01f: 0x01
kfdfse[6].total:           448 ; 0x020: 0x01c0
kfdfse[6].free:              1 ; 0x022: 0x01
kfdfse[6].frag:              1 ; 0x023: 0x01
--More--
==========================================================
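In the kfed output above, the reads of block 1 on VOL2 and VOL4 report KFBTYP_INVALID with every field zero, while the same block on VOL1 reports KFBTYP_FREESPC. To compare many disks or blocks without paging through the full dump, one could extract just the kfbh.type name. This is a sketch only: the function name is made up, and it does nothing more than parse kfed's text output from standard input. It accepts the field label and its value either on one line or split across two lines.

```shell
#!/bin/sh
# Sketch only: kfed_block_type is a hypothetical helper name.
# kfed_block_type
# Read kfed text output on stdin and print the type name found in the
# kfbh.type field, for example KFBTYP_INVALID or KFBTYP_FREESPC.
# Lines are joined first so that a label and value on separate lines still match.
kfed_block_type() {
    { tr '\n' ' '; echo; } |
        sed -n 's/.*kfbh\.type:[[:space:]]*[0-9][0-9]*[[:space:]]*;[[:space:]]*0x[0-9a-fA-F]*:[[:space:]]*\([A-Z_0-9]*\).*/\1/p'
}

# Example, using the kfed invocation from this post:
#   kfed read /dev/oracleasm/disks/VOL2 aunum=0 blknum=1 | kfed_block_type
```

Running it over a range of blknum values on a healthy disk and on a damaged one would show which blocks differ in type, which is one way to see how far the damage extends beyond the single block that was written back.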

千问 | 2009-1-4 14:52:28
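Originally posted by dalier149 on 2009-1-2 14:04
It looks like a single 4096-byte block cannot restore the disk header after all. I had found this article online: http://space.itpub.net/175005/viewspace-402150. I was trying anything as a last resort. Across the more than ten RAC clusters we have deployed, ASM has lost disk headers several times before, and the cause is hard to pin down. It feels like the only fix is to upgrade to 10.2.0.4, but I would still like to make some progress on ASM disk recovery; I am not ready to give up.

Are the raw devices backed by physical disks, logical volumes, or disk partitions?
[Last edited by wa0362 on 2009-1-2 14:13]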

千问 | 2009-1-4 14:52:28
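Originally posted by wa0362 on 2009-1-2 14:12

What is your operating system?
Are the raw devices backed by physical disks, logical volumes, or disk partitions?

AS4U4, using the oracleasm logical volumes VOL1, ..., VOL4.
At the time I deployed it by following this document: http://www.oracle.com/technology ... rac10gr2_iscsi.html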

千问 | 2009-1-4 14:52:28
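Originally posted by dalier149 on 2009-1-2 14:18
AS4U4, using the oracleasm logical volumes VOL1, ..., VOL4.
At the time I deployed it by following this document: http://www.oracle.com/technology ... rac10gr2_iscsi.html

So you did not follow the official documentation?
Each of your ASM disks (VOL1, VOL2, ...) is a partition of a disk, right? When you partitioned, did you try leaving some space free at the start of the disk?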