In map-joins, many rows of the big table are likely to have no matching row in the small table.
Currently, we perform a hash-map lookup for every row in the big table, which can be expensive when most of those lookups find no match.
It might be useful to try a Bloom filter containing all the join keys of the small table.
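Each big-table row would first be checked against the Bloom filter by its join key, and the small table's hash table would be probed only on a positive match. Because a Bloom filter can return false positives but never false negatives, this skips only lookups that are guaranteed to miss; a false positive merely costs one ordinary hash-table probe.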
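The proposed pre-check can be sketched in Java as below. This is a minimal illustration under stated assumptions, not Hive's implementation: the class names `SimpleBloomFilter`, `JoinResult` and `BloomFilteredMapJoin` are hypothetical, the small table is assumed to fit in memory as a `Map` with unique integer keys, and the k bit positions are derived by double hashing over a mixed `hashCode()`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical minimal Bloom filter over join keys (not Hive's own class).
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // MurmurHash3 32-bit finalizer, used to spread Object.hashCode() values.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // i-th bit position via double hashing: (h1 + i * h2) mod numBits.
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(Object key) {
        int h1 = mix(key.hashCode());
        int h2 = mix(h1);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    // false => key is definitely absent; true => key may be present.
    boolean mightContain(Object key) {
        int h1 = mix(key.hashCode());
        int h2 = mix(h1);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }
}

// Joined rows plus the number of hash-table probes actually issued.
final class JoinResult {
    final List<String> rows = new ArrayList<>();
    int hashProbes = 0;
}

final class BloomFilteredMapJoin {
    // Inner join of big-table keys against an in-memory small table
    // (assumed to have unique keys), with a Bloom-filter pre-check.
    static JoinResult join(Map<Integer, String> smallTable, int[] bigTableKeys,
                           int numBits, int numHashes) {
        SimpleBloomFilter filter = new SimpleBloomFilter(numBits, numHashes);
        for (Integer key : smallTable.keySet()) {
            filter.put(key);
        }
        JoinResult result = new JoinResult();
        for (int key : bigTableKeys) {
            if (!filter.mightContain(key)) {
                continue; // definite miss: skip the hash-map lookup entirely
            }
            result.hashProbes++; // true or false positive: probe the hash table
            String value = smallTable.get(key);
            if (value != null) {
                result.rows.add(key + ":" + value);
            }
        }
        return result;
    }
}
```

Calling `BloomFilteredMapJoin.join(smallTable, bigTableKeys, numBits, numHashes)` returns the joined rows together with `hashProbes`, the number of hash-map lookups that were still performed. The filter size is a tuning assumption: for n small-table keys, m bits and k hash functions, the false-positive rate is approximately (1 - e^(-kn/m))^k, so about 10 bits per key with k = 7 gives just under 1%.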
|Assignee||J. Andrew Key [ joeandrewkey ]|
|Labels||optimization||gsoc gsoc2012 optimization|
|Assignee||Siying Dong [ sdong ]||J. Andrew Key [ joeandrewkey ]|
|Assignee||Liyin Tang [ liyin ]||Siying Dong [ sdong ]|
|Field||Original Value||New Value|
|Summary||use bloom filters to improve the performance of map joins||use bloom filters to improve the performance of joins|