It would be nice to implement an ExprRewriteRule that coalesces multiple compatible OR conditions into a single IN predicate, e.g. rewriting c = 1 OR c = 2 OR c = 3 to c IN (1, 2, 3).
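A rough sketch of the coalescing logic, in Python for brevity rather than as an actual ExprRewriteRule. The expression classes (Eq, In, Or, Other) and the function names are hypothetical stand-ins, not Impala types, and "compatible" is simplified here to "equality or IN on the same column"; a real rule would also need to check types and that the right-hand sides are constants.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Eq:  # hypothetical node: col = value
    col: str
    value: object


@dataclass(frozen=True)
class In:  # hypothetical node: col IN (values...)
    col: str
    values: Tuple[object, ...]


@dataclass(frozen=True)
class Or:  # hypothetical node: left OR right
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Other:  # any predicate this sketch leaves untouched
    text: str


Expr = Union[Eq, In, Or, Other]


def flatten_or(e):
    """Return the disjuncts of a (possibly nested) OR tree, left to right."""
    if isinstance(e, Or):
        return flatten_or(e.left) + flatten_or(e.right)
    return [e]


def coalesce_or(e):
    """Merge Eq/In disjuncts on the same column into a single In."""
    values_by_col = {}  # dicts keep insertion order
    rest = []
    for d in flatten_or(e):
        if isinstance(d, Eq):
            values_by_col.setdefault(d.col, []).append(d.value)
        elif isinstance(d, In):
            values_by_col.setdefault(d.col, []).extend(d.values)
        else:
            rest.append(d)

    merged = []
    for col, vals in values_by_col.items():
        uniq = tuple(dict.fromkeys(vals))  # drop duplicates, keep order
        merged.append(Eq(col, uniq[0]) if len(uniq) == 1 else In(col, uniq))

    # Rebuild a left-deep OR over whatever is left.
    out = merged + rest
    result = out[0]
    for d in out[1:]:
        result = Or(result, d)
    return result
```

Note that the sketch moves the merged predicates ahead of the untouched ones, which is fine for OR since disjunct order does not affect the result.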
Long chains of OR are generally unwieldy, and transforming them to IN has the following benefits:
- IN predicates with long value lists are evaluated with an O(log n) lookup in the BE
- It is easier to extract min/max values from an IN predicate for Parquet min/max filtering
- The IN predicate may be faster to codegen than a deep binary tree of ORs
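To illustrate the first two points, here is a small Python sketch. It assumes the IN list is kept as a sorted array and probed with binary search, which is one way to get an O(log n) lookup; the actual BE data structure is not described here. SortedInList and row_group_may_match are hypothetical names, and the min/max check is a simplified stand-in for Parquet row-group statistics filtering.

```python
from bisect import bisect_left


class SortedInList:
    """Hypothetical IN-list holder: sorted, de-duplicated, non-empty."""

    def __init__(self, values):
        self.values = sorted(set(values))
        if not self.values:
            raise ValueError("IN list must not be empty")

    def contains(self, v):
        # Binary search: O(log n) comparisons instead of one per OR branch.
        i = bisect_left(self.values, v)
        return i < len(self.values) and self.values[i] == v

    def min_max(self):
        # With a sorted list the bounds are simply the two ends.
        return self.values[0], self.values[-1]


def row_group_may_match(in_list, stats_min, stats_max):
    """Conservative check: can a row group with these column stats match?"""
    lo, hi = in_list.min_max()
    return not (hi < stats_min or lo > stats_max)
```

With a chain of ORs the same bounds would have to be collected by walking the whole expression tree, whereas the IN list exposes them directly.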
Note that this new rule complements existing rules to yield interesting improvements, e.g.:
I've attached a relevant query profile from one of Mostafa's experiments.