The current Tajo has three join physical operators, such as BNLJoinExec, HashJoinExec, and MergeJoinExec. HashJoinExec builds a in-memory hash table for the smaller relation, and for each tuple in the larger relation it probes the hash table. It works very well and fast in many cases.

However, it doesn't work if the smaller relation does not fit in memory. In such a case, now we use ExternalSortExec and then MergeJoinExec. It also works well, but external sort is very costly.

Hybrid hash join (http://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join) is a well-known join algorithm. It is a variation of Grace hash join algorithm. Grace hash join algorithm works as follows:

Given two relations R and S to be joined,

1. Grace hash join algorithm partitions R into N buckets by hashing join keys in order to fit each bucket in memory

2. it also partitions S into N buckets by hashing join keys

3. for each pair of [i] buckets, 1 <= i <= N, Grace hash joins builds a in-memory hash table for bucket [i] of S relation, and probes the hash table for each tuple in bucket [i] of R relation.

Hybrid hash join just keeps the first bucket of S in memory and probes the first bucket of R directly in order to avoid the write and read of the first bucket pair.

You can find many other materials (e.g., PPTs, book chapters, and papers) for hybrid hash join algorithm via google.

Thanks!

Hello,

I'm a master student in Gebze Institute of Technology. I'm working on about

database, distributed system, datamining, IR

I'm interested in working with this issue. Could you do more explanation

about it what you mean hybrid hash operator?

Thanks

Best Regards...

–

BURAK ISIKLI* *| *http://burakisikli.wordpress.com**

*