hash join时bitmap和buckets到底存的啥？

显示全部楼层 · 2007-10-20 08:38:44

在metalink 41954.1 上看到：
o At the same time a bitmap of the hashed values is built using both hash values
bitmap是由join keys的2个hash值组成的，对吧？
在下面的“Bitmap for unique join keys”有点看不懂了，结合《Hash Joins, Implementation and Tuning》中的描述，似乎bitmap是保存的join keys，那这样不就可以直接确定关联关系了吗，没必要再去执行之后的probe hash table了，搞不懂....而且bitmap的使用大小也很小的，不可能是unique join keys的.......
我一开始以为bitmap保存的是小表的unique hash value，这样当大表扫描bitmap的时候就起到了filter的作用，如果大表的hash value和bitmap符合，再去buckets里查看具体的join key value是否一致，这样最后出结果，现在有点懵了，麻烦XD们帮忙解释一下。

千问 · 2007-10-20 08:38:44

原文关于bitmap的说明：
Bitmap for unique join keys
===========================
This bitmap is effectively a flag that indicates if a particular hash bucket
contains values. The bitmap records the hash values that actually contain
records. The point of this is to strip out rows from S before they
are written to the partition on disk because they do not hash to any of the
partition(s) that are currently in memory. This bitmap is consulted before
the rows from S are put in to partitions on disk so that we do not have the
overhead of writing out and then reading back in rows that are never going
to join.
The filter really tells us what is NOT stored on disk as opposed to
what is. Since different values can hash to the same hash value, a match on a
hash value does not guarantee that the actual value is the one that caused the
hash partition to be populated. However it can filter a good proportion of
rows, hence its usage.
If the hash value is not present then the value is definately not
in the hash table.
It is possible to get false positives; keys that hash to a value in the hash
value set, but are not contained in the data set.
False positives are less likely as the hash value range increases.
---------------------不是特别懂，主要是结合《Hash Joins, Implementation and Tuning》看的

千问 · 2007-10-20 08:38:44

up.......

千问 · 2007-10-20 08:38:44

艰难的UP

千问 · 2007-10-20 08:38:44

这本书《Hash Joins, Implementation and Tuning》在哪儿，我也想看一下儿！楼主能共享吗？

千问 · 2007-10-20 08:38:44

hash_join.pdf(86.43 KB, 下载次数: 44)2008-7-31 14:04 上传点击文件名下载附件

hash join算法原理.doc(43.5 KB, 下载次数: 32)2008-7-31 14:04 上传点击文件名下载附件有一本是网友翻译的。

千问 · 2007-10-20 08:38:44

参考
http://www.itpub.net/315494.html

千问 · 2007-10-20 08:38:44

千问 · 2007-10-20 08:38:44

兄弟，《hash join算法原理.doc》我也看过了，文中描述：
“在将S表读入内存分区时，oracle即记录连接键的唯一值，构建成所谓的位图向量，它需要占hash area内存的5%左右。在这里即为{1,3,4,5,8,10}。
当对B表进行分区时，将每一个连接键上的值与位图向量相比较，如果不在其中，则将其记录丢弃。在我们这个例子中，B表中以下数据将被丢弃”
----那我感觉在bitmap这里过滤后就可以完成匹配了（因为这里说bitmap保存的是key value，不是hash value），没必要再去对比S分区中的HASH VALUE了。

千问 · 2007-10-20 08:38:44

morning up