I came across this in Metalink note 41954.1:
o At the same time a bitmap of the hashed values is built using both hash values
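So the bitmap is built from the two hash values of the join keys, right?
I got lost in the "Bitmap for unique join keys" section below. Read together with the description in "Hash Joins, Implementation and Tuning", it sounds as if the bitmap stores the join keys themselves. But if that were the case, the join could be resolved straight from the bitmap and there would be no need to probe the hash table afterwards. I don't get it... Besides, the bitmap uses very little memory, so it cannot possibly hold the unique join keys...
At first I assumed the bitmap stored the unique hash values of the small table, so that it acts as a filter while the large table is scanned: if a hash value from the large table matches the bitmap, we then go to the buckets and check whether the actual join key values are equal, and only then produce the result. Now I'm confused. Could someone please help explain?
The original text on the bitmap: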
Bitmap for unique join keys
===========================
This bitmap is effectively a flag that indicates if a particular hash bucket
contains values. The bitmap records the hash values that actually contain
records. The point of this is to strip out rows from S before they
are written to the partition on disk because they do not hash to any of the
partition(s) that are currently in memory. This bitmap is consulted before
the rows from S are put in to partitions on disk so that we do not have the
overhead of writing out and then reading back in rows that are never going
to join.
The filter really tells us what is NOT stored on disk as opposed to
what is. Since different values can hash to the same hash value, a match on a
hash value does not guarantee that the actual value is the one that caused the
hash partition to be populated. However it can filter a good proportion of
rows, hence its usage.
If the hash value is not present then the value is definitely not
in the hash table.
It is possible to get false positives; keys that hash to a value in the hash
value set, but are not contained in the data set.
False positives are less likely as the hash value range increases.
--------------------- I don't fully understand this; I was mainly reading it alongside "Hash Joins, Implementation and Tuning".
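To make the quoted description concrete, here is a toy Python sketch of how I read it: the bitmap holds one bit per possible hash value (not the join keys themselves), it is populated from the small table, and rows from S are checked against it before being kept. This is only an illustration under my own assumptions, not Oracle's implementation. The function names (`hash_value`, `build_bitmap`, `filter_probe_rows`, `false_positives`) and the SHA-256 based hash are hypothetical, and a single hash function is used for simplicity even though the note mentions two hash values.

```python
import hashlib


def hash_value(key, n_bits):
    """Map a join key to a bit position in [0, n_bits).

    Hypothetical hash chosen for illustration; not the one Oracle uses.
    """
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def build_bitmap(build_keys, n_bits):
    """Set one bit for every hash value that occurs in the small table."""
    bitmap = [False] * n_bits
    for key in build_keys:
        bitmap[hash_value(key, n_bits)] = True
    return bitmap


def filter_probe_rows(probe_keys, bitmap):
    """Keep only rows from S whose hash value has its bit set.

    A dropped row can never join. A kept row only *might* join, so the
    hash table still has to be probed to compare the real key values.
    """
    n_bits = len(bitmap)
    return [k for k in probe_keys if bitmap[hash_value(k, n_bits)]]


def false_positives(build_keys, probe_keys, n_bits):
    """Count rows that pass the bitmap but have no matching join key."""
    bitmap = build_bitmap(build_keys, n_bits)
    build_set = set(build_keys)
    survivors = filter_probe_rows(probe_keys, bitmap)
    return sum(1 for k in survivors if k not in build_set)


if __name__ == "__main__":
    small = list(range(100))        # keys of the small (build) table
    big = list(range(50, 1050))     # keys of S; only 50..99 really join
    for n_bits in (1, 256, 1 << 20):
        print(n_bits, false_positives(small, big, n_bits))
```

If this reading is right, it also explains why the bitmap can be so small and why the later probe of the hash table is still needed: the bitmap can only say "definitely not present" and never confirms a match on its own. Running the script with a larger `n_bits` should show fewer false positives, which matches the last sentence of the note.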