IMPALA-5280

Coalesce chains of OR conditions to an IN predicate.

    Description

      It would be nice to implement an ExprRewriteRule that coalesces multiple compatible OR conditions into a single IN predicate, e.g.:

      (c=1) OR (c=2) OR (c=3) OR (c=4) ...
      ->
      c IN (1, 2, 3, 4...)
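To make the proposed transformation concrete, here is a minimal sketch in Python. It is not Impala code and does not use Impala's ExprRewriteRule interface; the types `Eq`, `In`, `Or`, `Pred` and the function `coalesce_or_to_in` are hypothetical names made up for illustration. It assumes that "compatible" means equality against a non-NULL literal on the same column, and that disjuncts are free of side effects so they can be reordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:            # col = value (hypothetical node type)
    col: str
    value: object


@dataclass(frozen=True)
class In:            # col IN (values...)
    col: str
    values: tuple


@dataclass(frozen=True)
class Pred:          # any other predicate, kept opaque
    text: str


@dataclass(frozen=True)
class Or:            # binary OR node
    left: object
    right: object


def flatten_or(expr):
    """Return the leaf disjuncts of a nested OR tree, left to right.

    Iterative on purpose: very long OR chains would otherwise hit
    Python's recursion limit.
    """
    out, stack = [], [expr]
    while stack:
        e = stack.pop()
        if isinstance(e, Or):
            stack.append(e.right)
            stack.append(e.left)   # popped first, so left comes out first
        else:
            out.append(e)
    return out


def coalesce_or_to_in(expr):
    """Rewrite (c=1) OR (c=2) OR ... into c IN (1, 2, ...) where possible."""
    if not isinstance(expr, Or):
        return expr
    values_by_col = {}   # column -> distinct values in first-seen order
    others = []          # disjuncts this sketch cannot coalesce
    for d in flatten_or(expr):
        if isinstance(d, Eq):
            new_vals = [d.value]
        elif isinstance(d, In):
            new_vals = list(d.values)
        else:
            others.append(d)
            continue
        vals = values_by_col.setdefault(d.col, [])
        vals.extend(v for v in new_vals if v not in vals)
    parts = [
        Eq(col, vals[0]) if len(vals) == 1 else In(col, tuple(vals))
        for col, vals in values_by_col.items()
    ]
    parts.extend(others)
    # Re-join whatever is left as a left-deep OR chain.
    result = parts[0]
    for p in parts[1:]:
        result = Or(result, p)
    return result
```

For the chain above, `coalesce_or_to_in(Or(Or(Or(Eq("c", 1), Eq("c", 2)), Eq("c", 3)), Eq("c", 4)))` yields `In("c", (1, 2, 3, 4))`. Disjuncts that are not simple equalities are left in place and OR'ed with the new IN predicate.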
      

      Long chains of OR are generally unwieldy, and transforming them to IN has the following benefits:

      • IN predicates with long value lists are evaluated with an O(log n) lookup in the backend (BE)
      • It is easier to extract min/max values from an IN predicate for Parquet min/max filtering
      • The IN predicate may be faster to codegen than a deep binary tree of ORs
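The first two benefits can be illustrated with a small Python sketch. This is not the Impala backend implementation: binary search over a sorted list merely stands in for whatever ordered structure the BE uses, and `make_in_lookup`, `in_list_bounds` and `row_group_may_match` are hypothetical helper names. The row-group min/max statistics are assumed to be available as plain values.

```python
from bisect import bisect_left


def make_in_lookup(values):
    """Build a membership test that costs O(log n) per probe."""
    sorted_vals = sorted(set(values))

    def contains(x):
        i = bisect_left(sorted_vals, x)
        return i < len(sorted_vals) and sorted_vals[i] == x

    return contains


def in_list_bounds(values):
    """Smallest and largest value of the IN list."""
    return min(values), max(values)


def row_group_may_match(values, rg_min, rg_max):
    """Return False only if no IN-list value can fall in [rg_min, rg_max].

    With an IN list, the bounds come from a single pass over the values,
    with no need to walk a tree of OR nodes.
    """
    lo, hi = in_list_bounds(values)
    return not (hi < rg_min or lo > rg_max)
```

For `c IN (1, 2, 3, 4)`, a row group whose statistics say min=10 and max=20 can be skipped, while one with min=4 and max=9 has to be read.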

      Note that this new rule complements existing rules to yield interesting improvements, e.g.:

      (c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a')
      ->
      c2='a' AND c1 IN (1, 2, 3)
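A rough Python sketch of how the two rewrites could compose follows. It models each disjunct as a list of `(column, value)` equality conjuncts, which is an assumption made for brevity and not how Impala represents expressions. The function name `extract_common_and_coalesce` is hypothetical, and the common-conjunct step only imitates the effect of the existing rule.

```python
def extract_common_and_coalesce(disjuncts):
    """Factor out conjuncts shared by all disjuncts, then coalesce the rest.

    disjuncts: list of lists of (column, value) equality conjuncts; the
    inner lists are AND'ed and the outer list is OR'ed.
    Returns (common, in_list), where common is a list of (column, value)
    pairs and in_list is (column, values), or None when the remaining
    disjunction cannot be turned into a single IN predicate and should
    be left as it was.
    """
    # Step 1: conjuncts that appear in every disjunct.
    common = set(disjuncts[0]).intersection(*map(set, disjuncts[1:]))
    common_ordered = [c for c in disjuncts[0] if c in common]
    remainders = [[c for c in d if c not in common] for d in disjuncts]

    # Step 2: coalesce only if each remainder is exactly one equality
    # and all of them are on the same column.
    if all(len(r) == 1 for r in remainders):
        cols = {r[0][0] for r in remainders}
        if len(cols) == 1:
            values = []
            for r in remainders:
                if r[0][1] not in values:
                    values.append(r[0][1])
            return common_ordered, (cols.pop(), tuple(values))
    return common_ordered, None
```

Applied to the three disjuncts above, the sketch returns `([("c2", "a")], ("c1", (1, 2, 3)))`, which corresponds to `c2='a' AND c1 IN (1, 2, 3)`.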
      

      I've attached a relevant query profile from one of Mostafa's experiments.

      Attachments

        1. same_query_profile_on_CDH5.12.txt
          265 kB
          Alexander Behm

          People

            sakinapelli sandeep akinapelli
            alex.behm Alexander Behm
            Votes: 0
            Watchers: 5
