试了一把db2trc。。。

[复制链接]
查看11 | 回复7 | 2009-7-22 09:30:00 | 显示全部楼层 |阅读模式
自从看了wang很多的经典案例后,很是佩服,觉得db2trc是个好东东,
一直想尝试用db2trc解决问题,不过一来这玩意比较高深,许多内部函数,
返回值都不知道,另外也没有合适案例,所以一直未能得逞。
今天总算逮着一个机会搞了一把db2trc,虽然问题很小,不如wang的那么精彩纷呈,
但总算是自己的一点小收获,因此把过程贴出来与大家分享一下。
场景如下:
两个实例db2inst2, db2inst3
数据库mytestdb,
某一天,我用db2relocatedb把 mytestdb 从db2inst2迁到了db2inst3下。
(db2 catalog 目录的内容是tar过去的)。
然后平安度过好多天,
再某一天,我需要抓一些数据库的deadlock情况
我想起mytestdb中有个deadlock monitor DB2DETAILDEADLOCK,
但当我试图打开的时候报如下错:
$db2 "set event monitor DB2DETAILDEADLOCK state 1"

DB21034EThe command was processed as an SQL statement because it was not a
valid Command Line Processor command.During SQL processing it returned:
SQL1618NThe target path of the specified event monitor is in use by another
event monitor.SQLSTATE=51026
db2diag.log里的错误如下:
2009-09-17-17.53.53.411328+480 I710483C334LEVEL: Warning
PID : 344552
TID: 1 PROC : db2event (DB2DETAILDEADLOCK) 0
INSTANCE: db2inst3
NODE : 000
FUNCTION: DB2 UDB, database monitor, sqm___sqlmeiot, probe:70
MESSAGE : target type 0 failed to open DB2DETAILDEADLOCK with rc -2146631636
2009-09-17-17.53.53.411948+480 I710818C407LEVEL: Severe
PID : 451068
TID: 1 PROC : db2agent (MYTESTDB) 0
INSTANCE: db2inst3
NODE : 000 DB : MYTESTDB
APPHDL: 0-48
APPID: *LOCAL.db2inst3.090917093706
FUNCTION: DB2 UDB, database monitor, sqm___start_evmon, probe:40
RETCODE : ZRC=0x800D002C=-2146631636=SQLM_RC_EVPATH "path in use"
2009-09-17-17.53.53.412376+480 I711226C464LEVEL: Severe
PID : 451068
TID: 1 PROC : db2agent (MYTESTDB) 0
INSTANCE: db2inst3
NODE : 000 DB : MYTESTDB
APPHDL: 0-48
APPID: *LOCAL.db2inst3.090917093706
MESSAGE : event monitor name
DATA #1 : Hexdump, 17 bytes
0x2005AA08 : 4442 3244 4554 4149 4C44 4541 444C 4F43DB2DETAILDEADLOC
0x2005AA18 : 4B
K
2009-09-17-17.53.53.412570+480 I711691C688LEVEL: Severe
PID : 451068
TID: 1 PROC : db2agent (MYTESTDB) 0
INSTANCE: db2inst3
NODE : 000 DB : MYTESTDB
APPHDL: 0-48
APPID: *LOCAL.db2inst3.090917093706
MESSAGE : event monitor target
DATA #1 : Hexdump, 68 bytes
0x2005AA30 : 2F68 6F6D 652F 6462 3269 6E73 7433 2F64/home/db2inst3/d
0x2005AA40 : 6232 696E 7374 332F 4E4F 4445 3030 3030b2inst3/NODE0000
0x2005AA50 : 2F53 514C 3030 3030 312F 6462 3265 7665/SQL00001/db2eve
0x2005AA60 : 6E74 2F64 6232 6465 7461 696C 6465 6164nt/db2detaildead
0x2005AA70 : 6C6F 636B
lock

本来想直接drop 再create,不过解决问题不能不求甚解,因此决定用db2trc看一下啥问题。
于是:
db2trc on -t -f db2trc.dump;db2 "set event monitor DB2DETAILDEADLOCK state 1";db2trc off
db2trc flw db2trc.dump db2trc.flw;
db2trc fmt db2trc.dump db2trc.fmt;
然后打开db2diag.log
直接拿sqm___sqlmeiot去db2trc.flw去找,找到如下几行:
sqm___sqlmeiot entry
1859| sqloGetEnvInternal entry
1860| sqloGetEnvInternal exit [rc = 0x870F0104 = -2029059836 = RC_ENV_NOT_FOUND]
1861| sqm___iot::que_init entry
1862| | sqloCSemCreate entry
1863| | sqloCSemCreate exit
1864| sqm___iot::que_init exit
1865| sqlogmblkEx entry
1866| sqlogmblkEx mbt [Marker:PD_OSS_ALLOCATED_MEMORY ]
1867| sqlogmblkEx exit
1868| sqm___sqm_evmon_ftarget entry
1869| | sqlogmblkEx entry
1870| | sqlogmblkEx mbt [Marker:PD_OSS_ALLOCATED_MEMORY ]
1871| | sqlogmblkEx exit
1872| | sqm___sqm_evmon_ctrlf entry
1873| | sqm___sqm_evmon_ctrlf data [probe 1]
1874| | | sqloopenp entry
1875| | | sqloopenp data [probe 30]
1876| | | sqloopenp exit [rc = 0x070F00CB = 118423755 = SQLO_FREV]
1877| | | sqloread entry
1878| | | sqloread exit
1879| | sqm___sqm_evmon_ctrlf data [probe 25]
1880| | | sqloclose entry
1881| | | sqloclose exit
1882| | sqm___sqm_evmon_ctrlf data [probe 30]
1883| | sqm___sqm_evmon_ctrlf exit [rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH]
1884| sqm___sqm_evmon_ftarget exit [rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH]
1885| sqlt_logerr_str entry
1886| sqlt_logerr_str data [probe 10]
1887| | pdLog entry
1888| | | _pdlogInt entry
1889| | | | sqltGetDiagPath entry
1890| | | | sqltGetDiagPath data [probe 10]
1891| | | | sqltGetDiagPath exit
1892| | | _pdlogInt data [probe 130]
1893| | | | sqltGetDiagPath entry
1894| | | | sqltGetDiagPath data [probe 10]
1895| | | | sqltGetDiagPath exit
1896| | | | ossPathExists entry
在这里看到了rc = 0x800D002C = -2146631636,跟diag.log一致,估计应该就是这里有问题了,
1874| | | sqloopenp entry
1875| | | sqloopenp data [probe 30]
1876| | | sqloopenp exit [rc = 0x070F00CB = 118423755 = SQLO_FREV]
看上去像打开什么文件出错,看看fmt文件里到底是什么文件?
fmt内容如下:
1874entry DB2 UDB oper system services sqloopenp cei (1.3.15.811.2)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99027637
bytes 113
Data1 (PD_TYPE_FILE_NAME,81) File name:
/home/db2inst3/db2inst3/NODE0000/SQL00001/db2event/db2detaildeadlock/db2event.ctl
Data2 (PD_TYPE_UINT,4) unsigned integer:
95
Data3 (PD_TYPE_UINT,4) unsigned integer:
416
1875data DB2 UDB oper system services sqloopenp cei (3.3.15.811.2.30)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99076076 probe 30
bytes 24
Data1 (PD_TYPE_HEXINT,4) Hex integer:
0x00401102
Data2 (PD_TYPE_HEXINT,4) Hex integer:
0x00000000
1876exit DB2 UDB oper system services sqloopenp cei (2.3.15.811.2)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99097253
rc = 0x070F00CB = 118423755 = SQLO_FREV
bytes 16
Data1 (PD_TYPE_SQO_FILE_HDL,8) File handle:
0000 0003 0000 0000
........
哦,原来是这个文件/home/db2inst3/db2inst3/NODE0000/SQL00001/db2event/db2detaildeadlock/db2event.ctl,
难道这个文件不存在?
ls -l /home/db2inst3/db2inst3/NODE0000/SQL00001/db2event/db2detaildeadlock/db2event.ctl看一下
-rw-r----- 1 db2inst3 staff 300 Aug 31 16:02 /home/db2inst3/db2inst3/NODE0000/SQL00001/db2event/db2detaildeadlock/db2event.ctl
文件好好的在这里啊,
怎么回事,
再看看错误返回值是 SQLO_FREV,这个东西不知道是啥意思,google了一下,有个链接里说是file aready exsits,不过不敢确定。
想起wang的很多例子,这个时候应该继续往下看,
1879| | sqm___sqm_evmon_ctrlf data [probe 25]
1880| | | sqloclose entry
1881| | | sqloclose exit
1882| | sqm___sqm_evmon_ctrlf data [probe 30]
1883| | sqm___sqm_evmon_ctrlf exit [rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH]
再去对应的fmt里看
1879data DB2 UDB database monitor sqm___sqm_evmon_ctrlf fnc (3.3.13.215.0.25)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99112551 probe 25
bytes 308
Data1 (PD_TYPE_DEFAULT,300) Hexdump:

0000 0000 4442 3244 4554 4149 4C44 4541....DB2DETAILDEA
444C 4F43 4B20 2020 4D59 5445 5354 4442DLOCK MYTESTDB
2020 2020 2020 2020 2020 2020 2F68 6F6D
/hom
652F 6462 3269 6E73 7432 2F64 6232 696Ee/db2inst2/db2in
7374 322F 4E4F 4445 3030 3030 2F53 514Cst2/NODE0000/SQL
3030 3030 312F 2020 2020 2020 2020 202000001/
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020

1880entry DB2 UDB oper system services sqloclose cei (1.3.15.674.2)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99114186
bytes 16
Data1 (PD_TYPE_SQO_FILE_HDL,8) File handle:
0000 0003 0000 0000
........

1881exit DB2 UDB oper system services sqloclose cei (2.3.15.674.2)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99117110
rc = 0
1882data DB2 UDB database monitor sqm___sqm_evmon_ctrlf fnc (3.3.13.215.0.30)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99117388 probe 30
bytes 12
Data1 (PD_TYPE_DEFAULT,4) Hexdump:
800D 002C
...,

1883exit DB2 UDB database monitor sqm___sqm_evmon_ctrlf fnc (2.3.13.215.0)
1884exit DB2 UDB database monitor sqm___sqm_evmon_ftarget fnc (2.3.13.79.0)
pid 680346 tid 1 cpid -1 node 0 sec 0 nsec 99118476
rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH
终于发现了点有用的东西
....DB2DETAILDEADLOCK MYTESTDB
/home/db2inst2/db2inst2/NODE0000/SQL00001/
db2去db2inst2里干嘛?再看看错误信息
The target path of the specified event monitor is in use by another
event monitor,错误函数 sqm___sqm_evmon_ftarget,错误码 SQLO_FREV
难道是想从db2inst2把db2event.ctl拷到db2inst3来,而文件又已经存在了?
不管,试一下,
mv /home/db2inst3/db2inst3/NODE0000/SQL00001/db2event/db2detaildeadlock/db2event.ctlmv /home/db2inst3/db2inst3/NODE0000/SQL00001/db2event/db2detaildeadlock/db2event.ctl.bak
再执行一下
db2 set event monitor DB2DETAILDEADLOCK state 1
果然成功了。
带来问题,
1、SQLO_FREV可能就是file aready exsits的意思。
2、不知道为什么db2要去原来的instance里拷db2event.ctl文件。
这样的话,如果relocate到另一台机器上去,原先create的event岂不是都不能用了?
另外特别注明,本人非专业的ibm二线支持人员,上述过程中有些东东都是
自己猜的,正确性有待确认,所以大家也就看个意思而已,不必当真,哈哈,哈哈。。。。。。。。




db2trc.rar(71.57 KB, 下载次数: 57)2009-9-17 18:50 上传点击文件名下载附件

回复

使用道具 举报

千问 | 2009-7-22 09:30:00 | 显示全部楼层
怎么没人顶啊,期待高手给个 明确的解释啊,o(∩_∩)o...
回复

使用道具 举报

千问 | 2009-7-22 09:30:00 | 显示全部楼层
good job


one comment:
1858sqm___sqlmeiot entry
1859| sqloGetEnvInternal entry
1860| sqloGetEnvInternal exit [rc = 0x870F0104 = -2029059836 = RC_ENV_NOT_FOUND]
1861| sqm___iot::que_init entry
1862| | sqloCSemCreate entry
1863| | sqloCSemCreate exit
1864| sqm___iot::que_init exit
1865| sqlogmblkEx entry
1866| sqlogmblkEx mbt [Marker:PD_OSS_ALLOCATED_MEMORY ]
1867| sqlogmblkEx exit
1868| sqm___sqm_evmon_ftarget entry
1869| | sqlogmblkEx entry
1870| | sqlogmblkEx mbt [Marker:PD_OSS_ALLOCATED_MEMORY ]
1871| | sqlogmblkEx exit
1872| | sqm___sqm_evmon_ctrlf entry
1873| | sqm___sqm_evmon_ctrlf data [probe 1]---> used to open the current control file
1874| | | sqloopenp entry -----------------> try to open the file
1875| | | sqloopenp data [probe 30]
1876| | | sqloopenp exit [rc = 0x070F00CB = 118423755 = SQLO_FREV]---------> file already exist
1877| | | sqloread entry--------------> read the file
1878| | | sqloread exit
1879| | sqm___sqm_evmon_ctrlf data [probe 25]-----------> show the information being read
1880| | | sqloclose entry ------------> close the file
1881| | | sqloclose exit
1882| | sqm___sqm_evmon_ctrlf data [probe 30] -----------> if data path in the control file not match the current db path, something wrong
1883| | sqm___sqm_evmon_ctrlf exit [rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH]
1884| sqm___sqm_evmon_ftarget exit [rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH]
So the problem is that after doing db2relocatedb, the database path in the control file was not changed, and still point to inst2
that means, when db2 trying to open the deadlock control file it still found the path pointing to inst2, so it confused and didn't allow monitor to start-up...
anyway, still a good try

congratz
回复

使用道具 举报

千问 | 2009-7-22 09:30:00 | 显示全部楼层
one of the easy fix (if you don't want to delete the file) is to change db2inst2 to db2inst3 in the file, then it should work


回复

使用道具 举报

千问 | 2009-7-22 09:30:00 | 显示全部楼层
could be due to apar IY68460
http://www-01.ibm.com/support/do ... s=utf-8&lang=en
回复

使用道具 举报

千问 | 2009-7-22 09:30:00 | 显示全部楼层
一直不明白,这些东西 wang是这么看懂的:
1858sqm___sqlmeiot entry
1859| sqloGetEnvInternal entry
1860| sqloGetEnvInternal exit [rc = 0x870F0104 = -2029059836 = RC_ENV_NOT_FOUND]
1861| sqm___iot::que_init entry
1862| | sqloCSemCreate entry
1863| | sqloCSemCreate exit
1864| sqm___iot::que_init exit
1865| sqlogmblkEx entry
1866| sqlogmblkEx mbt [Marker:PD_OSS_ALLOCATED_MEMORY ]
1867| sqlogmblkEx exit
1868| sqm___sqm_evmon_ftarget entry
1869| | sqlogmblkEx entry
1870| | sqlogmblkEx mbt [Marker:PD_OSS_ALLOCATED_MEMORY ]
1871| | sqlogmblkEx exit
1872| | sqm___sqm_evmon_ctrlf entry
1873| | sqm___sqm_evmon_ctrlf data [probe 1]---> used to open the current control file
1874| | | sqloopenp entry -----------------> try to open the file
1875| | | sqloopenp data [probe 30]
1876| | | sqloopenp exit [rc = 0x070F00CB = 118423755 = SQLO_FREV]---------> file already exist
1877| | | sqloread entry--------------> read the file
1878| | | sqloread exit
1879| | sqm___sqm_evmon_ctrlf data [probe 25]-----------> show the information being read
1880| | | sqloclose entry ------------> close the file
1881| | | sqloclose exit
1882| | sqm___sqm_evmon_ctrlf data [probe 30] -----------> if data path in the control file not match the current db path, something wrong
1883| | sqm___sqm_evmon_ctrlf exit [rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH]
1884| sqm___sqm_evmon_ftarget exit [rc = 0x800D002C = -2146631636 = SQLM_RC_EVPATH]

哪里有文档进行说明吗?
回复

使用道具 举报

千问 | 2009-7-22 09:30:00 | 显示全部楼层
you'll also understand after doing thousands of PMRs, or worked with more than 100 cases that need to read trace

no one understand the crap at beginning


回复

使用道具 举报

千问 | 2009-7-22 09:30:00 | 显示全部楼层
赞,又见mdkii~~
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

主题

0

回帖

4882万

积分

论坛元老

Rank: 8Rank: 8

积分
48824836
热门排行