关于直方图

显示全部楼层 · 2013-3-27 11:17:11

请问各位大侠
oracle 在何时会去查看直方图信息，收集直方图也许也是代价比较高的操作，书上也说 ORACLE 不会太多的使用直方图，但是我想问的是ORACLE何时才会去
查看直方图信息，来获取合适的执行计划。。
谢谢各位！！！

千问 · 2013-3-27 11:17:11

不是一般数据倾斜比较严重的时候的么

千问 · 2013-3-27 11:17:11

ORACLE 如何判断数据是否倾斜？好像分析直方图是手动收集的吧。。

千问 · 2013-3-27 11:17:11

另外 CURSOR_SHARING 参数和直方图的关系是啥？还有所谓的绑定变量窥视是啥？

千问 · 2013-3-27 11:17:11

我只知道比如在计算card（基数）的情况下如果存在收集的 histogram oracle将用histogram收集信息决定选择率
比如select * from t where id=1
这个card =num_rows*1/NUM_DISTINCT(user_tab_col_statistics

如果列上有histogram那么将使用user_tab_col_statistics.DENSITY
不过也有使用histogram bucket的情况，下面是使用bucket情况早期的实验记录
create table t3(a int)
SQL> ed
已写入 file afiedt.buf
1declare
2begin
3for i in 1..1000 loop
4insert into t3 values(1);
5end loop;
6for i in 2..9001 loop
7insert into t3 values(i);
8end loop;
9commit;
10* end;
11/
PL/SQL 过程已成功完成。
SQL> execute dbms_stats.gather_table_stats('SYS','T3');
PL/SQL 过程已成功完成。
SQL> select count(*) from t3;
COUNT(*)
----------
10000
SQL> select num_rows from user_tables where table_name='T3';
NUM_ROWS
----------
10000
SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';
TABLE_NAME
DENSITYNUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3
.000111099
0 9001
SQL> select count(*) from t3 where a=1;
COUNT(*)
----------
1000
从上面信息可以看到 ~~列A 有10000ROWS其中A=1的 1000 ROWS， A=2 到9001 的有9000 ROW
这样NUM_DISTINCT=9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';
会话已更改。
SQL> select count(*) from t3 where a=1;

执行计划
----------------------------------------------------------
Plan hash value: 463314188
---------------------------------------------------------------------------
| Id| Operation
| Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT || 1 | 3 | 7 (0)| 00:00:01 |
| 1 |SORT AGGREGATE|| 1 | 3 |
|
|
|*2 | TABLE ACCESS FULL| T3 | 1 | 3 | 7 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"=1)
可以看到CARD 非常不准确还是按公式 num_rows*1/num_distinct算的
SQL> select 10000*1/9001 from dual;
10000*1/9001
------------
1.11098767
***********************
Table Stats::
Table: T3Alias: T3
#Rows: 10000#Blks:23AvgRowLen:3.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.1110e-004 Min: 1 Max: 9001
Table: T3Alias: T3
Card: Original: 10000Rounded: 1Computed: 1.11Non Adjusted: 1.11
Access Path: TableScan
Cost:7.14Resp: 7.14Degree: 0
Cost_io: 7.00Cost_cpu: 2163793
Resp_io: 7.00Resp_cpu: 2163793
Best:: AccessPath: TableScan
Cost: 7.14Degree: 1Resp: 7.14Card: 1.11Bytes: 0~~~~~~~~~~~~~~~~~~看到trace 中card =1.11 ，实际应该为1000
我们可以用HISTOGREAM来解决
1begin
2 dbms_stats.gather_table_stats(
3
'SYS',
4
't3',
5
cascade => true,
6
estimate_percent => null,
7
method_opt => 'for all columns size 120'
8 );
9* end;
SQL> /
PL/SQL 过程已成功完成。

使用HISTOGRAM收集统计信息后 ORACLE 将使用DENSITY 来计算

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';
TABLE_NAME
DENSITYNUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3
.0001
0 9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';
会话已更改。
SQL> select count(*) from t3 where a=1;
执行计划
----------------------------------------------------------
Plan hash value: 463314188
---------------------------------------------------------------------------
| Id| Operation
| Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT || 1 | 3 | 7 (0)| 00:00:01 |
| 1 |SORT AGGREGATE|| 1 | 3 |
|
|
|*2 | TABLE ACCESS FULL| T3 | 917 |2751 | 7 (0)| 00:00:01 |~~~~~~~~~~可以看到CARD接近了=917
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"=1)
SQL> select count(*) from t3 where a=2;
执行计划
----------------------------------------------------------
Plan hash value: 463314188
---------------------------------------------------------------------------
| Id| Operation
| Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT || 1 | 3 | 7 (0)| 00:00:01 |
| 1 |SORT AGGREGATE|| 1 | 3 |
|
|
|*2 | TABLE ACCESS FULL| T3 | 1 | 3 | 7 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"=2)
SQL>
上面可以看出 A=1时候CARD=917 ORACLE 并没有使用density来计算， A=2时候用的DENSITY计算的 card=10000*0.0001=1
针对a=1 oracle 使用的 histogram bucket来计算的
SQL> select endpoint_number,endpoint_value from dba_histograms where table_name=
'T3' and rownum=列distinct时候为频率，另一种高度均衡
HISTOGRAM
当列DISTINCT比 histogram bucket多时为高度均衡每个bucket存相同数量的值）
此例为高度均衡可以看出来 1-11 BUCKET 存值1，其实第12个BUCKET中也存值1，因为12-13中可以看出每个BUCKET 存 84个值(93-9)（此例中除值1外其他都唯一，所以很好看
出来），而第12个 BUCKET 最高存的值为9这样表示第12个BUCKET还存了值1（应该是76 =84-8 ，8是2到9的个数） ,oracle 发现值1 是一个高频率出现的值（跨越的多个
桶），oracle将采用BUCKET计算11/120（跨越bucket数/总bucket数）=.091666667 从这里看到了虽然BUCKET12中也有值1 但ORACLE没算进来（因为该bucket中还有其它值2-8）用
11/120 这也就造成了CARD只是接近而不是准确的1000
看下trace

SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.0000e-004 Min: 1 Max: 9001
Histogram: HtBal#Bkts: 120UncompBkts: 120EndPtVals: 110
Table: T3Alias: T3
Card: Original: 10000Rounded: 917Computed: 916.67Non Adjusted: 916.67
Access Path: TableScan
Cost:7.14Resp: 7.14Degree: 0
Cost_io: 7.00Cost_cpu: 2163793
Resp_io: 7.00Resp_cpu: 2163793
Best:: AccessPath: TableScan
Cost: 7.14Degree: 1Resp: 7.14Card: 916.67Bytes: 0 ************ Card: 916.67

千问 · 2013-3-27 11:17:11

CURSOR_SHARING 是 optimizer针对 bind的看法，其中有一个值是根据是否有histogram来算

千问 · 2013-3-27 11:17:11

原帖由 fufuh2o 于 2009-10-11 23:22 发表
我只知道比如在计算card（基数）的情况下如果存在收集的 histogram oracle将用histogram收集信息决定选择率
比如select * from t where id=1
这个card =num_rows*1/NUM_DISTINCT(user_tab_col_statistics

如果列上有histogram那么将使用user_tab_col_statistics.DENSITY
不过也有使用histogram bucket的情况，下面是使用bucket情况早期的实验记录
create table t3(a int)
SQL> ed
已写入 file afiedt.buf
1declare
2begin
3for i in 1..1000 loop
4insert into t3 values(1);
5end loop;
6for i in 2..9001 loop
7insert into t3 values(i);
8end loop;
9commit;
10* end;
11/
PL/SQL 过程已成功完成。
SQL> execute dbms_stats.gather_table_stats('SYS','T3');
PL/SQL 过程已成功完成。
SQL> select count(*) from t3;
COUNT(*)
----------
10000
SQL> select num_rows from user_tables where table_name='T3';
NUM_ROWS
----------
10000
SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';
TABLE_NAME
DENSITYNUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3
.000111099
0 9001
SQL> select count(*) from t3 where a=1;
COUNT(*)
----------
1000
从上面信息可以看到 ~~列A 有10000ROWS其中A=1的 1000 ROWS， A=2 到9001 的有9000 ROW
这样NUM_DISTINCT=9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';
会话已更改。
SQL> select count(*) from t3 where a=1;

执行计划
----------------------------------------------------------
Plan hash value: 463314188
---------------------------------------------------------------------------
| Id| Operation
| Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT || 1 | 3 | 7 (0)| 00:00:01 |
| 1 |SORT AGGREGATE|| 1 | 3 |
|
|
|*2 | TABLE ACCESS FULL| T3 | 1 | 3 | 7 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"=1)
可以看到CARD 非常不准确还是按公式 num_rows*1/num_distinct算的
SQL> select 10000*1/9001 from dual;
10000*1/9001
------------
1.11098767
***********************
Table Stats::
Table: T3Alias: T3
#Rows: 10000#Blks:23AvgRowLen:3.00
***************************************
SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.1110e-004 Min: 1 Max: 9001
Table: T3Alias: T3
Card: Original: 10000Rounded: 1Computed: 1.11Non Adjusted: 1.11
Access Path: TableScan
Cost:7.14Resp: 7.14Degree: 0
Cost_io: 7.00Cost_cpu: 2163793
Resp_io: 7.00Resp_cpu: 2163793
Best:: AccessPath: TableScan
Cost: 7.14Degree: 1Resp: 7.14Card: 1.11Bytes: 0~~~~~~~~~~~~~~~~~~看到trace 中card =1.11 ，实际应该为1000
我们可以用HISTOGREAM来解决
1begin
2 dbms_stats.gather_table_stats(
3
'SYS',
4
't3',
5
cascade => true,
6
estimate_percent => null,
7
method_opt => 'for all columns size 120'
8 );
9* end;
SQL> /
PL/SQL 过程已成功完成。

使用HISTOGRAM收集统计信息后 ORACLE 将使用DENSITY 来计算

SQL> select table_name,density,num_nulls,num_distinct from user_tab_col_statisti
cs where table_name='T3';
TABLE_NAME
DENSITYNUM_NULLS NUM_DISTINCT
------------------------------ ---------- ---------- ------------
T3
.0001
0 9001

SQL> set autotrace traceonly explain
SQL> alter session set events '10053 trace name context forever';
会话已更改。
SQL> select count(*) from t3 where a=1;
执行计划
----------------------------------------------------------
Plan hash value: 463314188
---------------------------------------------------------------------------
| Id| Operation
| Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT || 1 | 3 | 7 (0)| 00:00:01 |
| 1 |SORT AGGREGATE|| 1 | 3 |
|
|
|*2 | TABLE ACCESS FULL| T3 | 917 |2751 | 7 (0)| 00:00:01 |~~~~~~~~~~可以看到CARD接近了=917
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"=1)
SQL> select count(*) from t3 where a=2;
执行计划
----------------------------------------------------------
Plan hash value: 463314188
---------------------------------------------------------------------------
| Id| Operation
| Name | Rows| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT || 1 | 3 | 7 (0)| 00:00:01 |
| 1 |SORT AGGREGATE|| 1 | 3 |
|
|
|*2 | TABLE ACCESS FULL| T3 | 1 | 3 | 7 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("A"=2)
SQL>
上面可以看出 A=1时候CARD=917 ORACLE 并没有使用density来计算， A=2时候用的DENSITY计算的 card=10000*0.0001=1
针对a=1 oracle 使用的 histogram bucket来计算的
SQL> select endpoint_number,endpoint_value from dba_histograms where table_name=
'T3' and rownum=列distinct时候为频率，另一种高度均衡
HISTOGRAM
当列DISTINCT比 histogram bucket多时为高度均衡每个bucket存相同数量的值）
此例为高度均衡可以看出来 1-11 BUCKET 存值1，其实第12个BUCKET中也存值1，因为12-13中可以看出每个BUCKET 存 84个值(93-9)（此例中除值1外其他都唯一，所以很好看
出来），而第12个 BUCKET 最高存的值为9这样表示第12个BUCKET还存了值1（应该是76 =84-8 ，8是2到9的个数） ,oracle 发现值1 是一个高频率出现的值（跨越的多个
桶），oracle将采用BUCKET计算11/120（跨越bucket数/总bucket数）=.091666667 从这里看到了虽然BUCKET12中也有值1 但ORACLE没算进来（因为该bucket中还有其它值2-8）用
11/120 这也就造成了CARD只是接近而不是准确的1000
看下trace

SINGLE TABLE ACCESS PATH
Column (#1): A(NUMBER)
AvgLen: 4.00 NDV: 9001 Nulls: 0 Density: 1.0000e-004 Min: 1 Max: 9001
Histogram: HtBal#Bkts: 120UncompBkts: 120EndPtVals: 110
Table: T3Alias: T3
Card: Original: 10000Rounded: 917Computed: 916.67Non Adjusted: 916.67
Access Path: TableScan
Cost:7.14Resp: 7.14Degree: 0
Cost_io: 7.00Cost_cpu: 2163793
Resp_io: 7.00Resp_cpu: 2163793
Best:: AccessPath: TableScan
Cost: 7.14Degree: 1Resp: 7.14Card: 916.67Bytes: 0 ************ Card: 916.67
但是你说的其实是如何计算基数的不同，，但是我想问的是何时可以使用直方图？当满足啥条件才会去查看直方图信息，获得优秀的PLAN？而且好像直方图信息经常会被忽略
[ 本帖最后由 gaopengtttt 于 2009-10-12 09:06 编辑 ]

千问 · 2013-3-27 11:17:11

使用cbo的时候,只要硬解析发生都会去查看直方图

千问 · 2013-3-27 11:17:11

mark

千问 · 2013-3-27 11:17:11

我觉得是你收集的列上有HISTOGRAM 就使用吧，而且使用时候有可能产生hard parse吧~~~因为要重新计算评估一下执行计划(具体没实验过~~LZ可以实验下)，我那例子就是收集了就自动使用了
[ 本帖最后由 fufuh2o 于 2009-10-12 09:25 编辑 ]