[IMPALA-12455] Create set of disjunct bloom filters for keys in partitioned builds - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend, Frontend
Labels:

Description

Currently, Impala aggregates the bloom filters produced by the different instances of the join builder by OR-ing them into a final filter. This could be avoided by keeping num_instances smaller bloom filters and, during lookup, choosing the correct one by applying the same hashing that is used for partitioning. Each builder would then only need to write a single small filter, since it only sees keys from a single partition. This would make runtime filter producers faster and much more scalable, while it should not have a major effect on consumers.
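A minimal C++ sketch of the proposed layout follows, assuming that a row's partition is `hash(key) % num_instances` and that the consumer can reproduce that hash. Every name here (`Mix`, `PartitionOf`, `SimpleBloomFilter`, `PartitionedBloomFilter`, `BuildDemo`) is hypothetical. The filter is a plain k-probe bloom filter and the hash is a splitmix64-style mixer, standing in for Impala's actual filter and hash functions. The sketch shows that builder `p` writes only filter `p`, so no OR-merge is needed, and that a lookup probes exactly one small filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical seeded 64-bit mixer (splitmix64 finalizer); not Impala's hash.
inline std::uint64_t Mix(std::uint64_t x, std::uint64_t seed) {
  x += (seed + 1) * 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Assumed partitioning rule (seed 0): the same hash used to route a row to a
// join builder instance must be reproducible on the consumer side.
inline std::size_t PartitionOf(std::uint64_t key, std::size_t num_partitions) {
  return Mix(key, 0) % num_partitions;
}

// Minimal k-probe bloom filter, a stand-in for the real runtime filter.
class SimpleBloomFilter {
 public:
  SimpleBloomFilter(std::size_t num_bits, int num_probes)
      : bits_(num_bits, false), num_probes_(num_probes) {}

  void Insert(std::uint64_t key) {
    for (int i = 0; i < num_probes_; ++i) bits_[Probe(key, i)] = true;
  }

  bool MayContain(std::uint64_t key) const {
    for (int i = 0; i < num_probes_; ++i) {
      if (!bits_[Probe(key, i)]) return false;
    }
    return true;
  }

 private:
  // Probe seeds start at 1, so they differ from the partitioning seed.
  std::size_t Probe(std::uint64_t key, int i) const {
    return Mix(key, static_cast<std::uint64_t>(i) + 1) % bits_.size();
  }

  std::vector<bool> bits_;
  int num_probes_;
};

// One small filter per builder instance instead of one large OR-ed filter.
class PartitionedBloomFilter {
 public:
  PartitionedBloomFilter(std::size_t num_partitions,
                         std::size_t bits_per_partition, int num_probes)
      : filters_(num_partitions,
                 SimpleBloomFilter(bits_per_partition, num_probes)) {}

  // Builder side: instance p only receives keys of partition p, so it writes
  // only filters_[p]; no cross-instance merge is required.
  SimpleBloomFilter& ForPartition(std::size_t p) { return filters_[p]; }

  // Consumer side: recompute the partition, then probe that one filter.
  bool MayContain(std::uint64_t key) const {
    return filters_[PartitionOf(key, filters_.size())].MayContain(key);
  }

  std::size_t num_partitions() const { return filters_.size(); }

 private:
  std::vector<SimpleBloomFilter> filters_;
};

// Simulated build: the exchange routes each key to its partition's builder.
inline PartitionedBloomFilter BuildDemo(const std::vector<std::uint64_t>& keys,
                                        std::size_t num_partitions) {
  PartitionedBloomFilter f(num_partitions, 1024, 3);
  for (std::uint64_t k : keys) {
    f.ForPartition(PartitionOf(k, num_partitions)).Insert(k);
  }
  return f;
}
```

The probe hashes use seeds that differ from the partitioning seed on purpose. If a small filter reused the bits that selected its partition, every key stored in it would share those bits, and part of the filter would be wasted.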

One caveat is that we currently push the bloom filter down to Kudu as is, so this optimization would not be applicable to filters consumed by Kudu scans.

People

Assignee: Unassigned

Reporter: Csaba Ringhofer

Votes: 0

Watchers: 4

Dates

Created: 21/Sep/23 09:24

Updated: 08/Feb/24 15:23