A RAC problem occurred at noon. Not sure whether it's a bug; still investigating.

Views: 11 | Replies: 4 | 2015-3-6 11:57:31
Linux AS 2.1 + Oracle 9.2.0.4 RAC.
OLTP.
Linux kernel: 2.4.9-e.40smp #1 SMP

Restarting the instances did bring things back to normal. The exact cause is still under investigation.

Node1 messages:

Wed May 17 12:00:26 2006
ARC1: Evaluating archive log 2 thread 1 sequence 19622
Wed May 17 12:00:26 2006
Current log# 3 seq# 19623 mem# 0: /ocfs_ctrl_redo/orcl/redo03.log
Current log# 3 seq# 19623 mem# 1: /ocfs_data/orcl/redo03b.log
Wed May 17 12:00:26 2006
ARC1: Beginning to archive log 2 thread 1 sequence 19622
Creating archive destination LOG_ARCHIVE_DEST_1: '/ocfs_arch1/orcl/1_19622.dbf'
ARC1: Completed archiving log 2 thread 1 sequence 19622
Wed May 17 12:11:03 2006
Waiting for clusterware split-brain resolution
Evicting instance 2 from cluster
Wed May 17 12:21:12 2006
Reconfiguration started
List of nodes: 0,
Wed May 17 12:21:12 2006
Reconfiguration started
List of nodes: 0,
Wed May 17 12:41:46 2006
Starting ORACLE instance (normal)
Wed May 17 12:41:46 2006
Global Enqueue Service Resources = 20158, pool = 8
Wed May 17 12:41:46 2006
Global Enqueue Service Enqueues = 32606
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
GES IPC: Receivers 3 Senders 3
GES IPC: Buffers Receive 1000 Send 2260 Reserve 1000
GES IPC: Msg Size Regular 396 Batch 2048
SCN scheme 2
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.4.0.
System parameters with non-default values:


-------------------------------------------------------------------------------------------------

Log messages on Node2:

Wed May 17 12:09:17 2006
Communications reconfiguration: instance 0
Wed May 17 12:11:03 2006
Waiting for clusterware split-brain resolution
Wed May 17 12:21:04 2006
Errors in file /u01/product/admin/orcl/bdump/orcl2_lmon_2458.trc:
ORA-29740: evicted by member 1, group incarnation 3
LMON: terminating instance due to error 29740
Wed May 17 12:21:06 2006
Errors in file /u01/product/admin/orcl/bdump/orcl2_smon_2472.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-29740: evicted by member , group incarnation
Instance terminated by LMON, pid = 2458
Wed May 17 12:41:46 2006
Starting ORACLE instance (normal)
Wed May 17 12:41:46 2006
Global Enqueue Service Resources = 20158, pool = 4
Wed May 17 12:41:46 2006
Global Enqueue Service Enqueues = 32606
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
GES IPC: Receivers 3 Senders 3
GES IPC: Buffers Receive 1000 Send 2260 Reserve 1000
GES IPC: Msg Size Regular 396 Batch 2048
SCN scheme 2
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.4.0.
System parameters with non-default values:

----------------------------------------------------------------------------------------------------


Trace file contents:

/u01/product/admin/orcl/bdump/orcl2_smon_2472.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /u01/product/oracle
System name: Linux
Node name: dell-node2
Release: 2.4.9-e.40smp
Version: #1 SMP Thu Apr 8 16:53:29 EDT 2004
Machine: i686
Instance name: orcl2
Redo thread mounted by this instance: 2
Oracle process number: 11
Unix process pid: 2472, image: oracle@dell-node2 (SMON)
*** SESSION ID:(12.1) 2006-05-17 12:18:18.028
*** 2006-05-17 12:18:18.028
kjctipccb: send timed out for msg 0x0x97406ad0 to (0 2), inc 2 type 32 waited 307 sec
kjctipccb: stat 3 dest_inc 2 sys_inc 2
------ Dumping SKGXP context ------
SKGXPCTX: 0xad7f470 ctx
admono 0x68c70cb4 admport:
SSKGXPT 0xad7f558 flags
info for network 0
socket no 8 IP 10.1.1.6 UDP 32831
sflags SSKGXPT_WRITE SSKGXPT_UP
info for network 1
socket no 0 IP 0.0.0.0 UDP 0
sflags SSKGXPT_DOWN
active 0 actcnt 1
context timestamp 0x80ccca75
no ports
sconno accono ertt state seq# sent async sync rtrans acks
0x538ba1ef 0x02579b42 3235173684509845090084509
0x538ba1f0 0x226d924e 3234678814025140250011593
0x538ba1f1 0x1999f15f 323470551429214292029712062
ach accono sconno admno state seq# rcv rtrans acks
*** 2006-05-17 12:21:06.322
KCL: caught error 29740 during cr lock op
*** 2006-05-17 12:21:06.323
SMON: following errors trapped and ignored:
ORA-00604: error occurred at recursive SQL level 1
ORA-29740: evicted by member , group incarnation

千问 | 2015-3-6 11:57:31
Found some material that looks similar; still digging into it.
http://www.itpub.net/showthread. ... 928&pagenumber=

Reason 2: An instance death was detected. This can happen if:
a) An instance fails to issue a heartbeat to the control file.
When the heartbeat is missing, LMON will issue a network ping to the instance
not issuing the heartbeat. As long as the instance responds to the ping,
LMON will consider the instance alive. If, however, the heartbeat is not
issued for the length of time of the control file enqueue timeout, the
instance is considered to be problematic and will be evicted.
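The heartbeat rule above can be modelled as a small decision function. This is a simplified sketch of the logic as the note describes it, not Oracle's LMON implementation: `heartbeat_interval` and `cf_enqueue_timeout` are hypothetical parameters with made-up values, and the note does not say what happens when the heartbeat is late (but still within the timeout) and the ping also goes unanswered, so the sketch reports that case as unspecified.

```python
from enum import Enum


class Verdict(Enum):
    ALIVE = "alive"
    EVICT = "evict"
    UNSPECIFIED = "not covered by the note"


def lmon_verdict(seconds_since_heartbeat: float,
                 responds_to_ping: bool,
                 heartbeat_interval: float,
                 cf_enqueue_timeout: float) -> Verdict:
    """Hypothetical model of the Reason 2 eviction decision."""
    # Heartbeat to the control file is arriving on schedule.
    if seconds_since_heartbeat <= heartbeat_interval:
        return Verdict.ALIVE
    # Heartbeat missing for longer than the control file enqueue timeout:
    # the instance is treated as problematic and evicted, ping or not.
    if seconds_since_heartbeat > cf_enqueue_timeout:
        return Verdict.EVICT
    # Heartbeat late but still within the timeout: a ping reply keeps it alive.
    if responds_to_ping:
        return Verdict.ALIVE
    return Verdict.UNSPECIFIED


# Illustrative values only (hypothetical, not Oracle defaults).
print(lmon_verdict(10, True, heartbeat_interval=3, cf_enqueue_timeout=900))    # Verdict.ALIVE
print(lmon_verdict(1000, True, heartbeat_interval=3, cf_enqueue_timeout=900))  # Verdict.EVICT
```

The timeout check comes before the ping check because, per the note, a ping reply only keeps the instance alive until the enqueue timeout is reached.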
Common causes for an ORA-29740 eviction (Reason 2):
a) NTP (Time changes on cluster) - usually on Linux, Tru64, or IBM AIX
b) Network Problems (SAN).
c) Resource Starvation (CPU, I/O, etc.)
d) An Oracle bug.

Common bugs for reason 2 evictions:

If you feel that this eviction was not correct, do a search in Metalink or the
bug database for:
ORA-29740 'reason 2'
Important files to review are:
a) Each instance's alert log
b) Each instance's LMON trace file
c) Statspack reports from all nodes leading up to the eviction
d) The CKPT process trace file of the evicted instance
e) Other bdump or udump files...
f) Each node's syslog or messages file
g) iostat output before, after, and during evictions
h) vmstat output before, after, and during evictions
i) netstat output before, after, and during evictions
-----------------------------------------------------------------------------
Reason 3: Communications Failure. This can happen if:
a) The LMON processes lose communication with one another.
b) One instance loses communications with the LMD process of another
instance.
c) An LMON process is blocked, spinning, or stuck and is not
responding to the other instance(s) LMON process.
d) An LMD process is blocked or spinning.
In this case the ORA-29740 error is recorded when there are communication
issues between the instances. It is an indication that an instance has been
evicted from the configuration as a result of an IPC send timeout. A
communications failure between a foreground, or background other than LMON,
and a remote LMD will also generate an ORA-29740 with reason 3. When this
occurs, the trace file of the process experiencing the error will print a
message:
Reporting Communication error with instance:
If communication is lost at the cluster layer (for example, network cables
are pulled), the cluster software may also perform node evictions in the
event of a cluster split-brain. Oracle will detect a possible split-brain
and wait for cluster software to resolve the split-brain. If cluster
software does not resolve the split-brain within a specified interval,
Oracle proceeds with evictions.
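The split-brain wait described above is visible in this thread's Node2 alert log: the wait starts at 12:11:03 and the ORA-29740 eviction is logged at 12:21:04. A minimal Python sketch that pulls those two timestamps out of the pasted excerpt and subtracts them. The helper `first_timestamp_of` is hypothetical, and it assumes timestamp lines use the `Wed May 17 12:11:03 2006` layout seen here. These logs alone do not establish whether the resulting gap equals the "specified interval" the note mentions.

```python
from datetime import datetime

# Excerpt copied from the Node2 alert log in this thread.
NODE2_ALERT = """\
Wed May 17 12:11:03 2006
Waiting for clusterware split-brain resolution
Wed May 17 12:21:04 2006
Errors in file /u01/product/admin/orcl/bdump/orcl2_lmon_2458.trc:
ORA-29740: evicted by member 1, group incarnation 3
"""

# Layout of the alert log timestamp header lines.
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def first_timestamp_of(alert_text: str, needle: str) -> datetime:
    """Return the timestamp header preceding the first line containing needle."""
    current = None
    for line in alert_text.splitlines():
        try:
            # Remember the most recent timestamp header we have passed.
            current = datetime.strptime(line.strip(), ALERT_TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        if needle in line:
            if current is None:
                raise ValueError(f"no timestamp before {needle!r}")
            return current
    raise ValueError(f"{needle!r} not found")


wait_start = first_timestamp_of(NODE2_ALERT, "split-brain resolution")
evicted_at = first_timestamp_of(NODE2_ALERT, "ORA-29740")
print(evicted_at - wait_start)  # 0:10:01
```

So Node2 waited about ten minutes for the clusterware before LMON terminated the instance.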
Oracle Support has seen cases where resource starvation (CPU, I/O, etc.) can
cause an instance to be evicted with this reason code. The LMON or LMD process
could be blocked waiting for resources and not respond to polling by the remote
instance(s). This could cause that instance to be evicted. If you have
a statspack report available from the time just prior to the eviction on the
evicted instance, check for poor I/O times and high CPU utilization. Poor I/O
means an average read time above 20 ms.
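The 20 ms rule of thumb is simple arithmetic on the statspack figures: total read time divided by the number of reads. A minimal sketch, assuming you have already pulled a read count and total read time (in milliseconds) for a file or tablespace out of the report; the function names and the sample numbers are hypothetical.

```python
def average_read_ms(total_read_time_ms: float, reads: int) -> float:
    """Average time per read in milliseconds (0.0 if there were no reads)."""
    if reads == 0:
        return 0.0  # nothing was read, so there is no average to judge
    return total_read_time_ms / reads


def is_poor_io(avg_read_ms: float, threshold_ms: float = 20.0) -> bool:
    """Apply the note's rule: an average read time above 20 ms is poor I/O."""
    return avg_read_ms > threshold_ms


# Hypothetical figures: 4200 reads that took 126000 ms in total.
avg = average_read_ms(126000, 4200)
print(avg, is_poor_io(avg))  # 30.0 True
```

The comparison is strict because the note says "> 20ms", so exactly 20 ms is not flagged.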
Common causes for an ORA-29740 eviction (Reason 3):
a) Network Problems.
b) Resource Starvation (CPU, I/O, etc.)
c) Severe Contention in Database.
d) An Oracle bug.

千问 | 2015-3-6 11:57:31
I don't dare use NTP anymore either, so I've disabled it for now.
Will see how it goes.

千问 | 2015-3-6 11:57:31
https://metalink.oracle.com/meta ... ts:3917158,TRUE,100

千问 | 2015-3-6 11:57:31
Not sure what the OP means.