hash join 驱动表的测试

[复制链接]
查看11 | 回复9 | 2005-10-30 17:05:33 | 显示全部楼层 |阅读模式
遇到比较奇怪的一个问题,hash join的时候 大表做驱动表比小表做驱动表 consistent gets 小,而且快
test.sql:
select value from v$mystat where statistic#=12;
select * from sh.sales s,sh.customers c where s.cust_id=c.cust_id;
select value from v$mystat where statistic#=12;
test2.sql
select value from v$mystat where statistic#=12;
select/*+ordered*/ * from sh.sales s,sh.customers c where s.cust_id=c.cust_id;
select value from v$mystat where statistic#=12;
对比
value 是从v$mystat 中获得的cpu 使用情况

LIZH@ORCL>@test
VALUE
----------
565
Elapsed: 00:00:00.00
918843 rows selected.
Elapsed: 00:00:27.53
Execution Plan
----------------------------------------------------------
Plan hash value: 3549450340
----------------------------------------------------------------------------------------------------------
| Id| Operation
| Name| Rows| Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 918K| 183M| |3205 (2)| 00:00:39 | | |
|*1 |HASH JOIN | | 918K| 183M|10M|3205 (2)| 00:00:39 | | |
| 2 | TABLE ACCESS FULL| CUSTOMERS | 55500 |9755K| | 407 (1)| 00:00:05 | | |
| 3 | PARTITION RANGE ALL| | 918K|25M| | 498 (4)| 00:00:06 | 1 |28 |
| 4 |TABLE ACCESS FULL | SALES | 918K|25M| | 498 (4)| 00:00:06 | 1 |28 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."CUST_ID"="C"."CUST_ID")

Statistics
----------------------------------------------------------

0recursive calls

0db block gets
64337consistent gets

0physical reads

0redo size
163271404bytes sent via SQL*Net to client
674236bytes received via SQL*Net from client
61258SQL*Net roundtrips to/from client

0sorts (memory)

0sorts (disk)
918843rows processed

VALUE
----------
1109
Elapsed: 00:00:00.00
LIZH@ORCL>@test2
VALUE
----------
1109
Elapsed: 00:00:00.00
918843 rows selected.
Elapsed: 00:00:21.90
Execution Plan
----------------------------------------------------------
Plan hash value: 1736528364
----------------------------------------------------------------------------------------------------------
| Id| Operation
| Name| Rows| Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 918K| 183M| |3208 (2)| 00:00:39 | | |
|*1 |HASH JOIN | | 918K| 183M|35M|3208 (2)| 00:00:39 | | |
| 2 | PARTITION RANGE ALL| | 918K|25M| | 498 (4)| 00:00:06 | 1 |28 |
| 3 |TABLE ACCESS FULL | SALES | 918K|25M| | 498 (4)| 00:00:06 | 1 |28 |
| 4 | TABLE ACCESS FULL| CUSTOMERS | 55500 |9755K| | 407 (1)| 00:00:05 | | |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."CUST_ID"="C"."CUST_ID")

Statistics
----------------------------------------------------------

0recursive calls

0db block gets
9228consistent gets

0physical reads

0redo size
48408961bytes sent via SQL*Net to client
674236bytes received via SQL*Net from client
61258SQL*Net roundtrips to/from client

0sorts (memory)

0sorts (disk)
918843rows processed

VALUE
----------
1487
Elapsed: 00:00:00.00
LIZH@ORCL>
数据库版本9,10,11都差不多,sales 不是分区表也差不多 (没有测试大表很大,以至于需要分区构建hash表的情况)
这个测试中还有问题
163271404bytes sent via SQL*Net to client
48408961bytes sent via SQL*Net to client
不知道为什么返回数据量区别这么大
按说小表作为驱动表应该好,可是从这个测试中没看出来好处啊
[ 本帖最后由 gclizh 于 2008-7-24 14:01 编辑 ]
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
我测试了一下。我得到结果没有这么大的区别。而且几乎是一致的。我这个测试表数据量不大。大表下面是我的测试结果。
环境:windows oracle 10.2.0.1
SQL> select count(*) from test1;
COUNT(*)
----------
49751
SQL> select count(*) from test2;
COUNT(*)
----------
2000

SQL> alter system flush shared_pool;
系统已更改。
SQL> select /*+ leading(a) use_hash(a,b) */ * from test1 a,test2 b where a.object_id=b.object_id;
已选择2000行。

执行计划
----------------------------------------------------------
Plan hash value: 497311279
------------------------------------------------------------------------------------
| Id| Operation
| Name| Rows| Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |2000 | 691K| | 527 (2)| 00:00:07 |
|*1 |HASH JOIN | |2000 | 691K|7016K| 527 (2)| 00:00:07 |
| 2 | TABLE ACCESS FULL| TEST1 | 37994 |6567K| | 158 (2)| 00:00:02 |
| 3 | TABLE ACCESS FULL| TEST2 |2000 | 345K| | 8 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement

统计信息
----------------------------------------------------------
1625recursive calls

0db block gets
1130consistent gets

0physical reads

0redo size
166009bytes sent via SQL*Net to client
1848bytes received via SQL*Net from client
135SQL*Net roundtrips to/from client
14sorts (memory)

0sorts (disk)
2000rows processed
SQL> alter system flush shared_pool;
系统已更改。
SQL> select /*+ leading(b) use_hash(a,b) */ * from test1 a,test2 b where a.object_id=b.object_id;
已选择2000行。

执行计划
----------------------------------------------------------
Plan hash value: 3916441650
----------------------------------------------------------------------------
| Id| Operation
| Name| Rows| Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |2000 | 691K| 168 (3)| 00:00:03 |
|*1 |HASH JOIN | |2000 | 691K| 168 (3)| 00:00:03 |
| 2 | TABLE ACCESS FULL| TEST2 |2000 | 345K| 8 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TEST1 | 37994 |6567K| 158 (2)| 00:00:02 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement

统计信息
----------------------------------------------------------
1625recursive calls

0db block gets
1130consistent gets

0physical reads

0redo size
166009bytes sent via SQL*Net to client
1848bytes received via SQL*Net from client
135SQL*Net roundtrips to/from client
14sorts (memory)

0sorts (disk)
2000rows processed
也许是我的表数据量不那么大原因。
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
楼主试试把SALES表重建下
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
表分析没有?
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
可能是因为大表中重复的记录比较多,用一个在条件上没有重复记录的大表测试一下试试
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
SQL>create table sales2 as select * from sales;
Table created.
Elapsed: 00:00:01.68
SQL>select/*+ordered*/ * from sales2 s, customers c where s.cust_id = c.cust_id;
918843 rows selected.
Elapsed: 00:00:37.99
Execution Plan
----------------------------------------------------------
Plan hash value: 4229485069
----------------------------------------------------------------------------------------
| Id| Operation
| Name| Rows| Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 842K| 214M| |5786 (2)| 00:01:10 |
|*1 |HASH JOIN | | 842K| 214M|78M|5786 (2)| 00:01:10 |
| 2 | TABLE ACCESS FULL| SALES2| 832K|69M| |1018 (3)| 00:00:13 |
| 3 | TABLE ACCESS FULL| CUSTOMERS | 55500 |9755K| | 337 (2)| 00:00:05 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."CUST_ID"="C"."CUST_ID")
Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------

0recursive calls

0db block gets
11953consistent gets

0physical reads

0redo size
34696319bytes sent via SQL*Net to client
674308bytes received via SQL*Net from client
61258SQL*Net roundtrips to/from client

0sorts (memory)

0sorts (disk)
918843rows processed
SQL>select* from sales2 s, customers c where s.cust_id = c.cust_id;
918843 rows selected.
Elapsed: 00:00:40.71
Execution Plan
----------------------------------------------------------
Plan hash value: 3198171663
----------------------------------------------------------------------------------------
| Id| Operation
| Name| Rows| Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 842K| 214M| |5782 (2)| 00:01:10 |
|*1 |HASH JOIN | | 842K| 214M|10M|5782 (2)| 00:01:10 |
| 2 | TABLE ACCESS FULL| CUSTOMERS | 55500 |9755K| | 337 (2)| 00:00:05 |
| 3 | TABLE ACCESS FULL| SALES2| 832K|69M| |1018 (3)| 00:00:13 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("S"."CUST_ID"="C"."CUST_ID")
Note
-----
- dynamic sampling used for this statement

Statistics
----------------------------------------------------------

0recursive calls

0db block gets
66891consistent gets

0physical reads

0redo size
148634596bytes sent via SQL*Net to client
674308bytes received via SQL*Net from client
61258SQL*Net roundtrips to/from client

0sorts (memory)

0sorts (disk)
918843rows processed
SQL>
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
to foxmile分析不分析只影响 执行计划,现在是我指定了执行 ,跟标分析不分析应该没有关系
to lhlzjxcustomers sales应该是很典型的主子表情况 一对多的关系
SQL> select *
2from (select count(*), cust_id from sales group by cust_id order by 1 desc)
3 where rownum <= 10
4;

COUNT(*)CUST_ID
---------- ----------
283 4090
28325939
28319010
28224560
281 6453
27819017
27826634
27732173
27726626
277 9830
[ 本帖最后由 gclizh 于 2008-7-24 15:15 编辑 ]
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
关注,我也碰到过
回复

使用道具 举报

千问 | 2005-10-30 17:05:33 | 显示全部楼层
我是这样想:hash join用小表建hash table,主要是因为它更有可能实现optimal/one-pass hash join.但是具体性能会受到诸如bitmap vector的不同值个数,以及bucket 里面的hash collision的影响,所以带有了一定的随机性!
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

主题

0

回帖

4882万

积分

论坛元老

Rank: 8Rank: 8

积分
48824836
热门排行