Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Version: Impala 2.8.0
Description
It would be nice to implement an ExprRewriteRule that coalesces multiple compatible OR conditions into a single IN predicate, e.g.:
(c=1) OR (c=2) OR (c=3) OR (c=4) ... -> c IN (1, 2, 3, 4...)
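The transformation above can be sketched as follows. The expression classes (`Col`, `Lit`, `Eq`, `In`, `Or`) and the function `or_to_in` are hypothetical stand-ins, not Impala's actual `Expr` hierarchy or `ExprRewriteRule` interface. The sketch only assumes a binary OR tree whose leaves are equality or IN predicates against literals.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical, minimal expression tree (not Impala's Expr classes).
@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Lit:
    value: object

@dataclass(frozen=True)
class Eq:  # col = lit
    col: Col
    lit: Lit

@dataclass(frozen=True)
class In:  # col IN (values...)
    col: Col
    values: Tuple[object, ...]

@dataclass(frozen=True)
class Or:  # left OR right
    left: object
    right: object

def flatten_or(e):
    """Collect the leaves of a (possibly deep) OR tree, left to right."""
    leaves, stack = [], [e]
    while stack:  # iterative, so long OR chains do not hit the recursion limit
        node = stack.pop()
        if isinstance(node, Or):
            stack.append(node.right)  # pushed first so the left side is visited first
            stack.append(node.left)
        else:
            leaves.append(node)
    return leaves

def or_to_in(e):
    """Rewrite c=v1 OR c=v2 OR ... into c IN (v1, v2, ...).

    Fires only when every disjunct is an equality or IN predicate on the
    same column; otherwise the original expression is returned unchanged.
    """
    if not isinstance(e, Or):
        return e
    col, values = None, []
    for d in flatten_or(e):
        if isinstance(d, Eq):
            d_col, d_vals = d.col, [d.lit.value]
        elif isinstance(d, In):
            d_col, d_vals = d.col, list(d.values)
        else:
            return e  # incompatible disjunct: leave the OR alone
        if col is None:
            col = d_col
        elif d_col != col:
            return e  # different columns cannot share one IN list
        for v in d_vals:
            if v not in values:  # drop duplicates, keep first-seen order
                values.append(v)
    return In(col, tuple(values))

c = Col("c")
pred = Or(Or(Or(Eq(c, Lit(1)), Eq(c, Lit(2))), Eq(c, Lit(3))), Eq(c, Lit(4)))
print(or_to_in(pred))  # In(col=Col(name='c'), values=(1, 2, 3, 4))
```

The sketch is deliberately conservative: if any disjunct is not an equality or IN on the same column, the OR tree is returned untouched, so a mixed predicate such as `c=1 OR d=2` is left alone.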
Long chains of ORs are generally unwieldy, and transforming them into an IN predicate has the following benefits:
- IN predicates with long value lists are evaluated with an O(log n) lookup in the backend (BE)
- It is easier to extract min/max values from an IN predicate for Parquet min/max filtering
- The IN predicate may be faster to codegen than a deep binary tree of ORs
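To illustrate the first two benefits, here is a minimal sketch that keeps the IN list sorted and deduplicated, assuming all values share one comparable type. `InListSketch` is a hypothetical name, and binary search over a sorted list is just one way to obtain an O(log n) lookup; it is not a description of the BE's actual data structure.

```python
import bisect

class InListSketch:
    """Hypothetical evaluator for `c IN (v1, ..., vn)` backed by a sorted list."""

    def __init__(self, values):
        if not values:
            raise ValueError("an IN list needs at least one value")
        # Sort and deduplicate once, up front.
        self.values = sorted(set(values))

    def contains(self, x):
        # Binary search: O(log n) comparisons per row instead of n equality tests.
        i = bisect.bisect_left(self.values, x)
        return i < len(self.values) and self.values[i] == x

    def min_max(self):
        # In a sorted list the bounds are simply the first and last elements.
        return self.values[0], self.values[-1]

    def may_match_row_group(self, rg_min, rg_max):
        # Min/max pruning: if the row group's [rg_min, rg_max] range does not
        # overlap the IN list's [min, max], no row in it can match.
        lo, hi = self.min_max()
        return not (rg_max < lo or rg_min > hi)

pred = InListSketch([4, 1, 3, 2, 3])
print(pred.contains(3), pred.contains(5))  # True False
print(pred.min_max())                      # (1, 4)
print(pred.may_match_row_group(5, 10))     # False
```

The overlap test is conservative: a row group whose range overlaps [min, max] may still contain none of the listed values, so the check can only rule row groups out, never guarantee a match.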
Note that this new rule complements existing rules to yield interesting improvements, e.g.:
(c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a') -> c2='a' AND c1 IN (1, 2, 3)
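The combined example can be sketched as two steps: pull out the conjuncts shared by every disjunct, then coalesce the remaining single-column equalities into an IN list. The flat `(column, value)` model and the name `factor_and_coalesce` are illustrative assumptions; this is not how Impala's existing rewrite rules are implemented.

```python
from typing import List, Optional, Tuple

# Hypothetical model: a conjunct is a (column, value) pair meaning
# `column = value`; each disjunct is a list of conjuncts ANDed together.
Conjunct = Tuple[str, object]

def factor_and_coalesce(
    disjuncts: List[List[Conjunct]],
) -> Optional[Tuple[List[Conjunct], str, List[object]]]:
    """Rewrite OR_i(common AND col = v_i) as common AND col IN (v_1, ...).

    Returns (common_conjuncts, in_column, in_values), or None when the
    pattern does not apply.
    """
    if len(disjuncts) < 2:
        return None
    # Step 1: conjuncts present in every disjunct can be pulled out of the OR.
    common = [c for c in disjuncts[0] if all(c in d for d in disjuncts[1:])]
    remainders = [[c for c in d if c not in common] for d in disjuncts]
    # Step 2: what is left must be exactly one equality per disjunct,
    # all on the same column, so it can become a single IN list.
    if any(len(r) != 1 for r in remainders):
        return None
    columns = {r[0][0] for r in remainders}
    if len(columns) != 1:
        return None
    values: List[object] = []
    for r in remainders:
        if r[0][1] not in values:
            values.append(r[0][1])
    return common, columns.pop(), values

# (c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a')
ors = [[("c1", 1), ("c2", "a")], [("c1", 2), ("c2", "a")], [("c1", 3), ("c2", "a")]]
print(factor_and_coalesce(ors))  # ([('c2', 'a')], 'c1', [1, 2, 3])
```

When no conjunct is shared, the same function reduces to the plain OR-to-IN case and returns an empty common list.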
I've attached a relevant query profile from one of Mostafa's experiments.