Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 3.4.0
-
None
-
Description
In the bloom-filter benchmark for Kudu, query TPCH-Q9 shows a performance regression. The profile shows that 5 bloom filters are generated by hash joins, and some of those filters are not useful for filtering rows. When all bloom filters are pushed to Kudu, evaluating them adds extra cost to the Kudu scan, which causes the performance regression.
The regression on Q9 looks a lot like https://issues.apache.org/jira/browse/IMPALA-9302, where Q9 initially regressed significantly with multithreading because ineffective filters weren't being disabled. This query is somewhat special in that many filters are pushed to scan 2, and most of them are not useful. Based on our experience there, we need to add a way to disable ineffective filters for Kudu scans.
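To make the idea of disabling ineffective filters concrete, here is a minimal sketch of one possible heuristic: track how many rows each filter has seen and rejected, and stop evaluating a filter once its rejection ratio stays below a threshold. This is not the fix that was committed for this issue or for KUDU-3140. The class names, the `min_rows` and `min_reject_ratio` parameters, and their values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FilterStats:
    """Running counters for one filter on one scan (hypothetical structure)."""
    considered: int = 0
    rejected: int = 0
    enabled: bool = True


class AdaptiveFilterSet:
    """Applies a set of filters and disables the ones that reject too few rows."""

    def __init__(self, filters, min_rows=1000, min_reject_ratio=0.1):
        # filters maps a name to a predicate; the predicate returns True
        # when the row may match and False when the row can be dropped.
        self.filters = filters
        self.min_rows = min_rows                  # assumed warm-up sample size
        self.min_reject_ratio = min_reject_ratio  # assumed usefulness threshold
        self.stats = {name: FilterStats() for name in filters}

    def keep(self, row):
        """Return True if no enabled filter rejects the row."""
        for name, predicate in self.filters.items():
            st = self.stats[name]
            if not st.enabled:
                continue  # an ineffective filter costs nothing once disabled
            st.considered += 1
            passed = predicate(row)
            if not passed:
                st.rejected += 1
            self._maybe_disable(st)
            if not passed:
                return False
        return True

    def _maybe_disable(self, st):
        # Only judge a filter after it has seen enough rows.
        if st.considered < self.min_rows:
            return
        if st.rejected / st.considered < self.min_reject_ratio:
            st.enabled = False


# Usage: one filter that rejects half the rows, one that never rejects.
filters = {
    "selective": lambda row: row % 2 == 0,
    "useless": lambda row: True,
}
fs = AdaptiveFilterSet(filters, min_rows=100, min_reject_ratio=0.1)
kept = [r for r in range(1000) if fs.keep(r)]
```

In this toy run the `useless` filter is switched off after its first 100 evaluations, while the `selective` filter keeps running for the whole scan. A real scanner would also have to decide whether a disabled filter can ever be re-enabled, which this sketch does not attempt.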
Issue Links
- Dependent
-
KUDU-3140 Add heuristics to disable predicate evaluation/filtering for Bloom filter predicate
- Resolved
- is related to
-
IMPALA-3741 Push bloom filters to Kudu scanners
- Resolved