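For the third time, the production database has had an incident where the CPU load suddenly jumped from around 1 to 200 and the server hung, then cleared up by itself after a while. I still don't know the exact cause or how to fix it.
The environment is 11.2.0.3 RAC. The problem occurs on node 2, and a large-query application connects to node 2 exclusively. The problem must be caused by that large-query application, but I can't get any deeper into the mechanism: large queries also run at normal times, and the CPU load then stays between 1 and 10. I can't see what kind of operation could push the load this high and hang the database.
Part of the alert log:
At the very beginning: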
Fri Jun 05 13:40:57 2015
Archived Log entry 23007 added for thread 2 sequence 3750 ID 0x916ee4e1 dest 1:
Fri Jun 05 13:56:40 2015
Time drift detected. Please check VKTM trace file for more details.
Fri Jun 05 14:00:27 2015
Process J001 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Fri Jun 05 14:00:38 2015
Errors in file /u01/oracle/diag/rdbms/newrac/rac2/trace/rac2_cjq0_28934.trc:
Trace file (trc) contents:
*** 2015-06-05 13:58:53.025
Process diagnostic dump for J001, OS id=19271
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 13:58:50 ]
NOTE: scheduling delay has not been sampled for 2.103287 secs
0.000000 secs from [ 13:58:48 - 13:58:53 ], 5 sec avg
0.046415 secs from [ 13:57:53 - 13:58:53 ], 1 min avg
0.011919 secs from [ 13:53:53 - 13:58:53 ], 5 min avg
*** 2015-06-05 13:59:13.707
loadavg : 149.49 62.91 24.77
*** 2015-06-05 13:59:23.811
Memory (Avail / Total) = 120.17M / 48262.39M
Swap (Avail / Total) = 5422.79M /10239.96M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 D oracle 19271 1 1178 0 - 5957410 sync_b 13:56 ? 00:00:15 ora_j001_rac2
Skipping stack dump because max dump time exceeded.
-------------------------------------------------------------------------------
Process diagnostic dump actual duration=30.780000 sec
(max dump time=30.000000 sec)
*** 2015-06-05 13:59:23.811
Waited for process J001 to initialize for 153 seconds
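A quick way to put the numbers in this trace side by side is sketched below. It only parses the three lines copied from the excerpt above; the helper names (`parse_pair`, `summarize`) are made up for illustration. Note that Linux load average counts processes in uninterruptible sleep (state D, as J001 is here) as well as runnable ones, so a load of 149 does not by itself mean 149 processes were burning CPU. Together with about 0.25% of RAM available and roughly 4.8 GB of swap in use, the figures look consistent with memory pressure and swapping, though that is a reading of the trace, not a confirmed diagnosis.

```python
import re

# Lines copied verbatim from the trace excerpt above.
TRACE = """\
loadavg : 149.49 62.91 24.77
Memory (Avail / Total) = 120.17M / 48262.39M
Swap (Avail / Total) = 5422.79M /10239.96M
"""

def parse_pair(label, text):
    """Return (avail_mb, total_mb) from a '<label> (Avail / Total) = aM / bM' line."""
    pattern = label + r" \(Avail / Total\) = ([\d.]+)M\s*/\s*([\d.]+)M"
    m = re.search(pattern, text)
    return float(m.group(1)), float(m.group(2))

def summarize(text):
    """Extract load averages and derive free-memory percentage and swap in use."""
    loads = re.search(r"loadavg : ([\d.]+) ([\d.]+) ([\d.]+)", text).groups()
    load1, load5, load15 = map(float, loads)
    mem_avail, mem_total = parse_pair("Memory", text)
    swap_avail, swap_total = parse_pair("Swap", text)
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "mem_free_pct": 100 * mem_avail / mem_total,
        "swap_used_mb": swap_total - swap_avail,
    }

if __name__ == "__main__":
    for key, value in summarize(TRACE).items():
        print(f"{key}: {value:.2f}")
```

The 1-minute load being far above the 5- and 15-minute values (149 vs 63 vs 25) also shows the spike built up within a few minutes, which matches the alert log timeline.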
HugePages is crucial for faster Oracle database performance on Linux if you have a large RAM and SGA. If your combined database SGAs are large (say more than 8 GB, though it can matter for smaller ones too), you will need HugePages configured.
Roy杰 posted on 2015-6-14 14:12
HugePages is crucial for faster Oracle database performance on Linux if you have a large RAM and SGA ...
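I get the feeling that even with 2 MB huge pages enabled, not much memory is left free either. Linux uses three-level page address translation: page directory with 1024 entries (4K) -> middle level (couldn't find details) -> page table (1024 entries) -> actual address.
1024*1024*4K=4GB
Do huge pages improve efficiency, or do they save memory? Suppose 1000 processes are connected to Oracle.
I haven't configured 2 MB huge pages on my side either. Looking at each process with MEMTOP.SH, it shows about 3 MB.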
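On the question of whether huge pages improve efficiency or save memory, a rough back-of-the-envelope sketch suggests it is both: fewer, larger pages mean fewer TLB misses and also far fewer page-table entries per process. The numbers below are assumptions for illustration only: 8-byte page-table entries (x86-64), a hypothetical 8 GB SGA (the thread does not state the SGA size), the 1000 connections mentioned above, and the worst case where every process maps the whole SGA with its own page tables. Only last-level entries are counted. The `1024*1024*4K=4GB` figure above matches the classic 32-bit two-level layout and is checked in passing.

```python
PTE_BYTES = 8  # assumption: x86-64, 8 bytes per page-table entry

KB, MB, GB = 1024, 1024**2, 1024**3

def pte_overhead_bytes(sga_bytes, page_bytes, processes):
    """Last-level page-table bytes if every process maps the whole SGA privately."""
    entries = -(-sga_bytes // page_bytes)  # ceiling division
    return entries * PTE_BYTES * processes

SGA = 8 * GB   # hypothetical SGA size, not taken from the thread
PROCS = 1000   # the 1000 connected processes assumed in the post

# The post's figure: 1024 directory entries x 1024 table entries x 4 KB pages.
assert 1024 * 1024 * 4 * KB == 4 * GB

small = pte_overhead_bytes(SGA, 4 * KB, PROCS)   # regular 4 KB pages
huge = pte_overhead_bytes(SGA, 2 * MB, PROCS)    # 2 MB huge pages

print(f"4 KB pages: {small / GB} GB of page tables")   # 15.625 GB
print(f"2 MB pages: {huge / MB} MB of page tables")    # 31.25 MB
```

Under these assumptions the page tables alone shrink from about 15.6 GB to about 31 MB. Huge pages on Linux are also not swapped out, so an SGA backed by them cannot be pushed to swap the way regular pages can. The roughly 3 MB per process seen with MEMTOP.SH would not include this page-table cost if the script only reports private process memory, but that depends on what the script measures.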