DB2 Database Problem Resolution and Analysis Report for a Rural Credit Cooperative

2007-9-26 18:42:10

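The wait timeout problem
Cause analysis:
1. Based on the errors reported in the database's db2diag.log, we suspect the problem is related to automatic RUNSTATS statistics collection. RUNSTATS appears very frequently in the log, and collecting statistics has to take locks on many tables. Because db2diag.log is the only trace information available, we could not determine exactly how the locks were taken.
2. The log also shows heavy page flushing in the database, so at the same time we enlarged the 16K-page buffer pool to 8 GB and added a new 4K-page buffer pool of 2 GB.
All automatic maintenance has now been turned off.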
Automatic maintenance                      (AUTO_MAINT) = OFF
  Automatic database backup            (AUTO_DB_BACKUP) = OFF
  Automatic table maintenance          (AUTO_TBL_MAINT) = OFF
    Automatic runstats                  (AUTO_RUNSTATS) = OFF
    Automatic statistics profiling    (AUTO_STATS_PROF) = OFF
      Automatic profile updates         (AUTO_PROF_UPD) = OFF
    Automatic reorganization               (AUTO_REORG) = OFF
Update commands:
db2 update db cfg using AUTO_MAINT off
db2 update db cfg using AUTO_RUNSTATS off
db2 update db cfg using AUTO_TBL_MAINT off

The system is now running essentially normally, but it needs long-term observation.
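DB2 buffer pool sizes are specified in pages, so the 8 GB and 2 GB figures from the second cause above translate into page counts. The following is only a rough sketch: the function name is hypothetical, and it assumes the sizes quoted in the post are binary gigabytes.

```python
def pages_for(size_gib: int, page_size_kib: int) -> int:
    """Return how many pages of page_size_kib KiB fit in size_gib GiB."""
    size_bytes = size_gib * 1024 ** 3
    page_bytes = page_size_kib * 1024
    return size_bytes // page_bytes


# Assumption: "8g" and "2g" in the post are binary gigabytes.
pages_16k = pages_for(8, 16)  # 16K-page buffer pool enlarged to 8 GB
pages_4k = pages_for(2, 4)    # new 4K-page buffer pool of 2 GB
print(pages_16k, pages_4k)
```

Under that assumption both buffer pools work out to the same number of pages, because the page size and the pool size both differ by a factor of four.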

谭付发

2008-10-22

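Archive logs generated too quickly
Operating system: AIX (version unknown). Database version: DB2 9.1
Symptom: the database produces one 100 MB log file every one to two minutes, and the archive log directory is 60 GB. 60 GB / 0.1 GB = 600 log files; at one file per minute, 600 / 60 = 10, so the archive log directory can be exhausted in as little as 10 hours.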
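The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch with a hypothetical function name; like the post, it treats 1 GB as 1000 MB and assumes the directory starts empty and nothing is pruned from it.

```python
def hours_until_full(dir_gb: float, log_mb: float, minutes_per_log: float) -> float:
    """Hours until an empty archive directory fills at a steady log rate."""
    logs_that_fit = dir_gb * 1000 / log_mb  # decimal units, as in the post
    return logs_that_fit * minutes_per_log / 60


fastest = hours_until_full(60, 100, 1)  # one 100 MB log per minute
slowest = hours_until_full(60, 100, 2)  # one 100 MB log every two minutes
print(fastest, slowest)
```

The post's 10-hour figure is the worst case of one log per minute; at one log every two minutes the same directory would last about twice as long.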
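Analysis and tracing:
We used db2pd -dynamic and db2pd -transactions to inspect the database's transactions and SQL statements, and found three INSERT statements behaving abnormally: they were running continuously.
Resolution:
We passed the information to the developers, who found that a batch job in the application had a bug that caused an infinite loop. After the program was fixed and the looping process was killed, the problem was resolved.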

谭付发

tanfufa 2008-10-24

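Analysis of error SQL0918
Overall analysis of the cause of SQL0918: part of a transaction has already been rolled back because of an abnormal event (a network disconnection, a restart, a killed process, or another improper operation). The local transaction that is still running then cannot commit normally when it finishes; it must be rolled back to keep the transaction consistent. This error occurs most easily in distributed-transaction environments.
1. In the Tuxedo application:
Considering the actual production environment, my personal view is this. After an abnormal event (such as a database restart, a database process killed abnormally, or a wait timeout), the database automatically rolls back the data changes of the database transaction (for example inserted or modified rows), but the corresponding Tuxedo application transaction is not rolled back. When the application then tries to commit the Tuxedo transaction, the commit can no longer succeed, because the integrity of the transaction as a whole must be preserved, so it has to be rolled back. The application should have exception handling for this situation: if the commit fails, roll back. That avoids this error.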
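The "if the commit fails, roll back" handling suggested above can be sketched as follows. This is only an illustration of the control flow, not the application's actual code: a real Tuxedo program would use the ATMI calls tpcommit and tpabort, whereas the helper and the FakeConnection class here are hypothetical and exist only to simulate a failed commit.

```python
def commit_or_rollback(conn) -> bool:
    """Try to commit; if the commit fails, roll back instead.

    conn is any object with commit() and rollback() methods,
    such as a Python DB-API connection. Returns True on commit.
    """
    try:
        conn.commit()
        return True
    except Exception:
        conn.rollback()
        return False


class FakeConnection:
    """Hypothetical stand-in that simulates a rollback-required failure."""

    def __init__(self, commit_fails: bool):
        self.commit_fails = commit_fails
        self.rolled_back = False

    def commit(self):
        if self.commit_fails:
            raise RuntimeError("SQL0918N: rollback required (simulated)")

    def rollback(self):
        self.rolled_back = True


broken = FakeConnection(commit_fails=True)
committed = commit_or_rollback(broken)
print(committed, broken.rolled_back)
```

Returning a flag rather than re-raising lets the caller decide whether to retry the whole unit of work or report the failure.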

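The following shows how the error appears in a distributed database.
2. The error in a distributed database. First, look at the following error analysis: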
“Using advanced functions: An Order Entry application”, particularly with the Insert Order Detail program documented in B.1.3, “Program flow for the Insert Order Detail program” on page 424. This program (INSDET) calls a stored procedure (STORID) on a remote
AS/400 system. The stored procedure updates a STOCK table on the remote
system. Then, the calling program inserts an order detail record in a ORDERDTL
table on the local system. After doing this, the Distributed Unit of Work (DUW) is
completed.
Note: ROCHESTER is the local system (AR). ZURICH is the remote system (AS).
In this test scenario, the stored procedure program on the remote system was
abruptly terminated, cancelling the job before the database changes on both
systems were committed.
In this case, the remote system (application server) rolled back the one database
change automatically and provided information in the job log. At the application
requester, information provided in the program ended, but not before rolling back
the local database change, which was a record insert. The rollback operation is
needed since the calling program received SQL error return code -918, which
corresponds to message SQL0918. The details are shown in Figure 84.
Figure 84. Message SQL0918
The job log of the remote system (ZURICH) reported the following information:
...............
CPI9152 Information Target DDM job started by source system.
CPI3E01 Information Local relational database accessed by ROCHESTER.
CPC1125 Completion Job ../ITSCID06/ROCHESTER was ended by user ITSCID03.
CPD83DD Diagnostic Conversation terminated; reason 02.
02 -- The conversation was issued a Deallocate Type
(Abend) to force the remote location to roll back.
CPF4059 Diagnostic System abnormally ended the transaction with device ROCHESTER.
CPI8369 Information 1 pending changes rolled back; reason 01.
01 -- The commitment definition is in a state of Reset.
CPF83E4 Diagnostic Commitment control ended with resources not committed.
...............
Display Formatted Message Text
System: ROCHESTER
Message ID . . . . . . . . . : SQL0918
Message file . . . . . . . . : QSQLMSG
Library . . . . . . . . . : QSYS
Message . . . . : ROLLBACK is required.
Cause . . . . . : The activation group requires a ROLLBACK to be performed
prior to running any other SQL statements.
Recovery . . . : Issue a ROLLBACK CL command or an SQL ROLLBACK
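A brief explanation of the excerpt:
The program calls a stored procedure that updates the STOCK table on the remote system, and then inserts a record into the ORDERDTL table on the local system. Neither change has been committed at that point. When the remote stored procedure is forcibly terminated, the remote database transaction is rolled back automatically. The local database can then no longer commit normally; an attempt to commit produces error SQL0918. Only a rollback keeps the transaction consistent.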

谭付发

2008-10-22

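Common methods and tools for DB2 problem diagnosis
Introduction to the commonly used diagnostic tools
Tool 1: db2diag
Description: the primary log file for database and system administrators is the administration notification log. The db2diag.log file is intended for DB2 support to use in problem diagnosis.
The db2diag tool filters and formats the large volume of information in db2diag.log.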
1. Filter db2diag.log by database name
Example: db2diag -g db=NGCBSDB
2. Filter db2diag.log by process ID
db2diag -g level=Severe,pid=1024
3. Format the db2diag output
Example: find all Severe and Error records logged since 2008-10-21.
db2diag -time 2008-10-21 -node "0,1,2" -level "Severe, Error" | db2diag -fmt "Time: %{ts} Partition: %node Message Level: %{level} \nPid: %{pid} Tid: %{tid} Instance: %{instance}\nMessage: @{msg}\n" > testfmt.txt
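4. For more information, run:
db2diag -h examples
db2diag -h all
db2diag -h tutorial
db2diag.log records messages, changes, and errors for the instance and the database, such as insufficient memory, insufficient table space, deadlocks, and abnormal or normal database shutdowns. Errors that belong to the application itself are not recorded.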
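Tool 2: db2pd
Many DB2 administrators like db2pd. It has many advantages: it is powerful, has very little impact on system resources, and is very useful for diagnosing problems in real time. db2pd is a utility for monitoring DB2 database activity and for troubleshooting. It is a standalone utility shipped with the DB2 engine starting with DB2 V8.2, and its look and function resemble the Informix onstat utility. db2pd is run from the command line, with an optional interactive mode. The utility runs very quickly because it does not need to acquire any locks and runs outside the engine's resources.
1. Check lock wait situations
db2pd -db NGCBSDB -locks wait showlocks
In the db2pd -locks output, a "G" in the status column (Sts) stands for "granted". In the case described here, the transaction with handle 2 owns a row lock, and the Mode column shows that it holds an X lock. The waiting transaction (the one with "W", for "wait", in the Sts column) has handle 6; it is requesting an X lock on the same row as transaction 2. You can see this from the Owner column (which shows transaction 2 as the lock owner) and by comparing Lockname, which is identical for the two entries in the db2pd -locks output.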
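The matching described above (pair each "W" entry with the "G" entry that has the same Lockname) can be sketched in code. This is an illustration under stated assumptions: the records are built by hand from the description rather than parsed from real db2pd output, and the class, the function, and the lock name "LOCK-A" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LockEntry:
    tran_hdl: int  # transaction handle
    lockname: str  # placeholder value, not a real DB2 lock name
    mode: str      # lock mode, for example "X"
    sts: str       # "G" = granted, "W" = waiting


def find_blockers(entries):
    """Map each waiting transaction handle to the handles holding the same lock."""
    holders = {}
    for e in entries:
        if e.sts == "G":
            holders.setdefault(e.lockname, []).append(e.tran_hdl)
    return {e.tran_hdl: holders.get(e.lockname, [])
            for e in entries if e.sts == "W"}


# Hand-built from the description: transaction 2 holds the X lock,
# transaction 6 waits for an X lock on the same row.
entries = [
    LockEntry(2, "LOCK-A", "X", "G"),
    LockEntry(6, "LOCK-A", "X", "W"),
]
blockers = find_blockers(entries)
print(blockers)
```

Once the blocking handle is known, it can be mapped to an application as described in the next step.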
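2. Map transaction handles to applications
db2pd -db NGCBSDB -applications show detail
3. Identify the number of rows inserted into a table
db2pd -db NGCBSDB -tcbstats
4. Monitor memory usage of the database instance
db2pd -db NGCBSDB -memblock
5. Monitor table space usage
db2pd -db NGCBSDB -tablespaces
6. Monitor SQL statements
db2pd -db NGCBSDB -dyn
7. Monitor transactions
db2pd -db NGCBSDB -transactions
8. Get information about applications and their corresponding agents
db2pd -db NGCBSDB -agents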
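Tool 3: db2top (not bundled with the DB2 database; it has to be downloaded).
The tool is already installed on the development database machine (36.0.15.226), in the directory /cboddv/db2top-2.0-bin/aix_64
The executable could be copied into the DB2 bin directory, but I could not do that because I lack the permissions. It is a standalone executable and does not need to be installed.
Usage:
db2top -d dbname

The U command displays lock information.

See the user manual for details. The manual is included in the installation package. (On the development database the directory is /cboddv/db2top-2.0-bin)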
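Tool 4: db2mtrk, for monitoring memory usage
db2mtrk
You have already seen tools for checking the state of the database at a particular point in time (the snapshot monitor), for collecting data when specific events or transactions occur (event monitors), and for viewing the data access plan generated for a query (explain). If you are trying to locate a problem in a database environment, there are two more tools worth noting. The first is the db2mtrk utility.
The db2mtrk utility is designed to provide a complete memory status report for instances, databases, and agents. When run, db2mtrk produces the following information about memory pool allocations: the current size; the maximum size (hard limit); the largest size reached (high water mark); the type (an identifier for the function that will use the memory pool); and the agent that allocated the pool (if the pool is private).
Parameters:
-i show instance-level memory
-d show database-level memory
-p show private memory
-m show the maximum value for each memory pool
-w show the high water mark value for each memory pool
-r repeat mode
-v verbose output
-h show help
To get instance-level, database-level, and private memory pool allocation information, run db2mtrk as follows:
Example:
db2mtrk -i -d -p
Running this command may produce output like the following.
Sample output from the db2mtrk utility (here run with -i -d -v)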
NGCSB_P570ZTA:db2inst1:/ngcbsdb2log/db2inst1/NGCBSDB/NODE0000/C0000001> db2mtrk -i -d -v
Tracking Memory on: 2008/10/22 at 13:22:03
Memory for instance
Database Monitor Heap is of size 851968 bytes
Other Memory is of size 6291456 bytes
Total: 7143424 bytes
Memory for database: TWS
Backup/Restore/Util Heap is of size 65536 bytes
Package Cache is of size 4063232 bytes
Catalog Cache Heap is of size 524288 bytes
Buffer Pool Heap (3) is of size 8454144 bytes
Buffer Pool Heap (2) is of size 8519680 bytes
Buffer Pool Heap (1) is of size 4718592 bytes
Buffer Pool Heap (System 32k buffer pool) is of size 720896 bytes
Buffer Pool Heap (System 16k buffer pool) is of size 458752 bytes
Buffer Pool Heap (System 8k buffer pool) is of size 327680 bytes
Buffer Pool Heap (System 4k buffer pool) is of size 262144 bytes
Shared Sort Heap is of size 1310720 bytes
Lock Manager Heap is of size 18481152 bytes
Database Heap is of size 5308416 bytes
Other Memory is of size 196608 bytes
Total: 53411840 bytes
Memory for database: NGCBSDB
Backup/Restore/Util Heap is of size 65536 bytes
Package Cache is of size 51445760 bytes
Catalog Cache Heap is of size 1638400 bytes
Buffer Pool Heap (3) is of size 853213184 bytes
Buffer Pool Heap (2) is of size 8276934656 bytes
Buffer Pool Heap (1) is of size 1307443200 bytes
Buffer Pool Heap (System 32k buffer pool) is of size 720896 bytes
Buffer Pool Heap (System 16k buffer pool) is of size 458752 bytes
Buffer Pool Heap (System 8k buffer pool) is of size 327680 bytes
Buffer Pool Heap (System 4k buffer pool) is of size 262144 bytes
Shared Sort Heap is of size 655360 bytes
Lock Manager Heap is of size 220725248 bytes
Database Heap is of size 10813440 bytes
Other Memory is of size 196608 bytes
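The NGCBSDB section of the output above ends without a Total line, so it can be handy to add up the pools yourself. The sketch below is a hypothetical helper, not part of db2mtrk; it assumes every pool line has the form "<pool> is of size <n> bytes", as in the verbose output shown, and it is checked here against the instance section, whose total is printed.

```python
import re

# Matches pool lines such as "Package Cache is of size 4063232 bytes".
POOL_LINE = re.compile(r"^\s*(?P<pool>.+?) is of size (?P<size>\d+) bytes\s*$")


def sum_pools(report: str) -> int:
    """Add up the sizes of all pool lines in a db2mtrk -v section."""
    total = 0
    for line in report.splitlines():
        match = POOL_LINE.match(line)
        if match:  # "Total:" and header lines do not match and are skipped
            total += int(match.group("size"))
    return total


instance_section = """\
Database Monitor Heap is of size 851968 bytes
Other Memory is of size 6291456 bytes
Total: 7143424 bytes
"""
computed = sum_pools(instance_section)
print(computed)
```

Feeding the NGCBSDB lines to the same function would give the missing total for that database.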
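Tool 5: snapshot. Its functionality is largely the same as db2pd.
1. Turn on the monitor switches

db2 get monitor switches
db2 update monitor switches using bufferpool on sort on lock on statement on uow on table on
Explanation: bufferpool: the buffer pool (cache). sort: sort work areas. lock: lock information. statement: SQL statements.
2. Take a snapshot, for example of lock information.
db2 get snapshot for locks on dbname > lock.txt
3. Open lock.txt and analyze it.

Other: tools such as db2trc and db2support are also commonly used and are worth learning about.

My expertise is limited, so I will stop here. Thank you.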

谭付发

2008-10-22

千问 · 2007-9-26 18:42:10

db2top is a great tool. It is very good for monitoring DB2.

千问 · 2007-9-26 18:42:10

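db2top is bundled with v8fp17.
Supporting original content. The article seems to contain several cases; could you describe one of them in detail? For example the symptoms, the runtime environment, how you first narrowed it down, what problems you ran into while narrowing it down, what support documents there were, and what the final conclusion was :) That would help others who run into similar problems later.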

千问 · 2007-9-26 18:42:10

Thanks for the hard work, OP.
Thanks for sharing!

千问 · 2007-9-26 18:42:10

I will definitely write it up in detail when I have time. Busy and lazy, haha.

千问 · 2007-9-26 18:42:10

Bump

千问 · 2007-9-26 18:42:10

Supporting original content :)

千问 · 2007-9-26 18:42:10

Up～

千问 · 2007-9-26 18:42:10

One more question for everyone:
is v8fp17 the last fix pack for V8?
Will there be no more V8 fix packs after it?

千问 · 2007-9-26 18:42:10

Nice, I learned something. Thanks to the OP for sharing.