关于dba_segments纪录数据来源以及大型分区表存储查询优化的讨论

显示全部楼层 · 2015-5-25 17:37:25

本帖最后由 331972515 于 2015-7-22 11:38 编辑
背景：
昨日，研发兄弟说他们遇到一张106g的分区表查询，查询返回结果十分缓慢，希望我帮忙看一下。
看到的情况为该表按每日日期进行分区，从dba_segments中查看改表大小约为106.4g。分别检查了
pga，磁盘io能力，sql执行计划，以及程序访问发起访问查询的方式，均不存在问题；当时一时束手无策。
当分组查询dba_segments试图该表时，发现分区并不连贯；经与研发兄弟了解得知，分区表中部分
分区被他们手动已删除（据说是存在大量垃圾数据），导致该表分区不连贯。瞬间发觉可能是分区不连贯
查询过程中，导致行迁移块迁移情况很严重导致。随后，进行expdp导出后打算drop purge后重新导入，完成
表中行与快的重新码放整理。在expdp导出过程中发现没有被删除的分区，平均每个分区仅有几百兆，最后导出
完成后，发现该表实际大小仅为4.5g，现在将原表删除后重新导入，重建索引，问题立刻解决。
以此情况有两处不明，一处请教；
不明
1. dba_segments视图本为oracle记录库中对象大小等详细记录信息为何会不准，而且该视图数据应来源于后台
进程定时收集统计更新，是smon 还是什么？
2.分区表的分区单独被删除导致行迁移，块的码放不紧凑造成碎片过多导致查询受影响，其中更加详细的说明，往前辈们指点一二。
请教
1.对于上百g的分区表，平均每个分区数据量大小超过7g的分区表，查询有哪些优化的办法呢

千问 · 2015-5-25 17:37:25

本帖最后由 oyhuan 于 2015-7-22 11:44 编辑
dba_segments 只是统计占用了多少个BLOCK，包括有数据和无数据

千问 · 2015-5-25 17:37:25

oyhuan 发表于 2015-7-22 11:44
dba_segments 只是统计占用了多少个BLOCK，包括有数据和无数据
我计算的是视图的bytes字段

千问 · 2015-5-25 17:37:25

高水位问题，delete 数据并不会释放空间，全表扫描依然会扫描所有 blocks

千问 · 2015-5-25 17:37:25

jiqing1004 发表于 2015-7-22 11:48
高水位问题，delete 数据并不会释放空间，全表扫描依然会扫描所有 blocks
我的确也是考虑到是高水位的问题，只不过dba_segments视图里面的信息不准确挺好奇的

千问 · 2015-5-25 17:37:25

Because objects in a database can change constantly, you must update statistics regularly so that they accurately describe these objects. Oracle Database automatically maintains optimizer statistics.
Oracle recommends that you enable automatic optimizer statistics collection. In this case, the database automatically collects optimizer statistics for tables with absent or stale statistics. If fresh statistics are required for a table, then the database collects them both for the table and associated indexes.
Automatic collection eliminates many manual tasks associated with managing the optimizer. It also significantly reduces the risks of generating poor execution plans because of missing or stale statistics.
Automatic optimizer statistics collection calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure. This internal procedure operates similarly to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The main difference is that GATHER_DATABASE_STATS_JOB_PROC prioritizes database objects that require statistics, so that objects that most need updated statistics are processed first, before the maintenance window closes.

千问 · 2015-5-25 17:37:25

1.分区drop,truncate才会降低高水位，delete分区数据就别用了。
2.本地分区索引合理，保证执行计划最优，基本就OK了。这么大的表不算大。

千问 · 2015-5-25 17:37:25

既然是job的一种，那有可能是Job Queue Processes (CJQ0 and Jnnn)这个后台进程

千问 · 2015-5-25 17:37:25

jieyancai 发表于 2015-7-22 11:56
1.分区drop,truncate才会降低高水位，delete分区数据就别用了。
2.本地分区索引合理，保证执行计划最优，基 ...
没想到高水位会影响这么大，研发兄弟们估计删除单个分区的时候也没想到。
索引肯定都是以local的方式创建的，计划也没问题，但是查询速度总是不尽人意，
如果从raid5换成asm速度会有几倍的提升么

千问 · 2015-5-25 17:37:25

另外，从优化器记录的信息看：
Optimizer statistics include the following:
Table statistics
Number of rows
Number of blocks
Average row length
Column statistics
Number of distinct values (NDV) in column
Number of nulls in column
Data distribution (histogram)
Extended statistics
Index statistics
Number of leaf blocks
Levels
Clustering factor
System statistics
I/O performance and utilization
CPU performance and utilization
你可以核对这些相关的信息，只要正确收集了，应该就是准的。
关键还在于语句以及对应的执行计划。