IMPALA-5280

Coalesce chains of OR conditions to an IN predicate.

      Description

      It would be nice to implement an ExprRewriteRule that coalesces multiple compatible OR conditions into a single IN predicate, e.g.:

      (c=1) OR (c=2) OR (c=3) OR (c=4) ...
      ->
      c IN (1, 2, 3, 4...)
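
A minimal sketch of the proposed rewrite, assuming a tiny stand-in expression model. The classes `Eq`, `Or`, `In` and the method `rewrite` are hypothetical names made up for illustration; they are not Impala's `Expr` classes or its `ExprRewriteRule` interface. The sketch only handles equality predicates against integer literals on a single column and leaves any other expression untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class OrToInSketch {
  /** Hypothetical minimal expression model; not Impala's Expr hierarchy. */
  interface Expr {}

  /** Equality predicate: column = literal. */
  static final class Eq implements Expr {
    final String column;
    final long value;
    Eq(String column, long value) { this.column = column; this.value = value; }
    @Override public String toString() { return column + "=" + value; }
  }

  /** Binary OR node; chains of ORs form a binary tree of these. */
  static final class Or implements Expr {
    final Expr left;
    final Expr right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " OR " + right + ")"; }
  }

  /** IN predicate: column IN (v1, v2, ...). */
  static final class In implements Expr {
    final String column;
    final List<Long> values;
    In(String column, List<Long> values) { this.column = column; this.values = values; }
    @Override public String toString() {
      String list = values.stream().map(Object::toString).collect(Collectors.joining(", "));
      return column + " IN (" + list + ")";
    }
  }

  /** Collects the leaves of a nested OR tree in left-to-right order. */
  static void flatten(Expr e, List<Expr> out) {
    if (e instanceof Or) {
      flatten(((Or) e).left, out);
      flatten(((Or) e).right, out);
    } else {
      out.add(e);
    }
  }

  /**
   * Rewrites an OR chain into an IN predicate when every leaf is an equality
   * on the same column. Any other input is returned unchanged.
   */
  static Expr rewrite(Expr e) {
    if (!(e instanceof Or)) return e;
    List<Expr> leaves = new ArrayList<>();
    flatten(e, leaves);
    String column = null;
    List<Long> values = new ArrayList<>();
    for (Expr leaf : leaves) {
      if (!(leaf instanceof Eq)) return e;            // incompatible leaf
      Eq eq = (Eq) leaf;
      if (column == null) column = eq.column;
      else if (!column.equals(eq.column)) return e;   // different column
      if (!values.contains(eq.value)) values.add(eq.value);  // drop duplicates
    }
    return new In(column, values);
  }

  public static void main(String[] args) {
    Expr chain = new Or(new Or(new Or(new Eq("c", 1), new Eq("c", 2)), new Eq("c", 3)),
        new Eq("c", 4));
    System.out.println(rewrite(chain));  // prints: c IN (1, 2, 3, 4)
  }
}
```

A real rule would also need to decide what counts as "compatible" (literal types, NULL handling, non-deterministic expressions), which this sketch deliberately ignores.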
      

      Long chains of OR are generally unwieldy, and transforming them to IN has the following benefits:

      • IN predicates with long value lists are evaluated with an O(log n) lookup in the backend (BE)
      • It is easier to extract min/max values from an IN predicate for Parquet min/max filtering
      • The IN predicate may be faster to codegen than a deep binary tree of ORs
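
The first two benefits can be illustrated with a small sketch. It assumes the IN list is kept in a sorted set; `java.util.TreeSet` is used here only as a stand-in with O(log n) membership tests and is not a claim about how the Impala backend stores the list. The class `InListSketch` and its methods are hypothetical.

```java
import java.util.TreeSet;

public class InListSketch {
  // Sorted set of IN-list values: contains() is O(log n), and the
  // smallest and largest values are directly available.
  private final TreeSet<Long> values = new TreeSet<>();

  static InListSketch of(long... vs) {
    InListSketch in = new InListSketch();
    for (long v : vs) in.values.add(v);
    return in;
  }

  /** Evaluates "c IN (...)" for one value of c with a single lookup. */
  boolean matches(long c) {
    return values.contains(c);
  }

  /**
   * Conservative min/max check: returns false only if the range
   * [rowGroupMin, rowGroupMax] cannot contain any IN-list value,
   * judged solely from the smallest and largest values in the list.
   */
  boolean mayOverlap(long rowGroupMin, long rowGroupMax) {
    if (values.isEmpty()) return false;
    return values.first() <= rowGroupMax && values.last() >= rowGroupMin;
  }

  public static void main(String[] args) {
    InListSketch in = InListSketch.of(1, 2, 3, 4);
    System.out.println(in.matches(3));          // prints: true
    System.out.println(in.matches(5));          // prints: false
    System.out.println(in.mayOverlap(10, 20));  // prints: false
  }
}
```

With a chain of n OR conditions, the same membership test would walk up to n comparisons, and recovering the minimum and maximum would require traversing the whole tree.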

      Note that this new rule complements existing rules to yield interesting improvements, e.g.:

      (c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a')
      ->
      c2='a' AND c1 IN (1, 2, 3)
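
The interaction between the two rewrites can be sketched as follows, under simplifying assumptions: each disjunct is modeled as a flat list of equality conjuncts, the conjuncts shared by every disjunct are factored out first, and the leftovers are coalesced into an IN predicate if each disjunct has exactly one remaining equality on the same column. The names `CombinedRewriteSketch`, `Eq`, and `rewrite` are hypothetical and do not correspond to Impala's existing rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class CombinedRewriteSketch {
  /** Hypothetical equality conjunct: column = literal (literal kept as SQL text). */
  static final class Eq {
    final String column;
    final String literal;
    Eq(String column, String literal) { this.column = column; this.literal = literal; }
    @Override public boolean equals(Object o) {
      return o instanceof Eq && column.equals(((Eq) o).column)
          && literal.equals(((Eq) o).literal);
    }
    @Override public int hashCode() { return Objects.hash(column, literal); }
    @Override public String toString() { return column + "=" + literal; }
  }

  /**
   * Input: a disjunction of conjunctions of equalities.
   * Returns the rewritten predicate as SQL text, or null if the
   * simplified pattern described above does not apply.
   */
  static String rewrite(List<List<Eq>> disjuncts) {
    if (disjuncts.isEmpty()) return null;

    // Step 1: conjuncts that appear in every disjunct.
    Set<Eq> common = new LinkedHashSet<>(disjuncts.get(0));
    for (List<Eq> d : disjuncts) common.retainAll(d);

    // Step 2: what is left of each disjunct must be one equality on one column.
    String column = null;
    List<String> literals = new ArrayList<>();
    for (List<Eq> d : disjuncts) {
      List<Eq> rest = new ArrayList<>(d);
      rest.removeAll(common);
      if (rest.size() != 1) return null;
      Eq eq = rest.get(0);
      if (column == null) column = eq.column;
      else if (!column.equals(eq.column)) return null;
      if (!literals.contains(eq.literal)) literals.add(eq.literal);
    }

    // Step 3: common conjuncts AND the coalesced IN predicate.
    List<String> parts = new ArrayList<>();
    for (Eq c : common) parts.add(c.toString());
    parts.add(column + " IN (" + String.join(", ", literals) + ")");
    return String.join(" AND ", parts);
  }

  public static void main(String[] args) {
    List<List<Eq>> disjuncts = Arrays.asList(
        Arrays.asList(new Eq("c1", "1"), new Eq("c2", "'a'")),
        Arrays.asList(new Eq("c1", "2"), new Eq("c2", "'a'")),
        Arrays.asList(new Eq("c1", "3"), new Eq("c2", "'a'")));
    System.out.println(rewrite(disjuncts));  // prints: c2='a' AND c1 IN (1, 2, 3)
  }
}
```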
      

      I've attached a relevant query profile from one of Mostafa's experiments.

            People

            • Assignee: sandeep akinapelli (sakinapelli)
            • Reporter: Alexander Behm (alex.behm)