IMPALA-5280

Coalesce chains of OR conditions to an IN predicate.



    Description

      It would be nice to implement an ExprRewriteRule that coalesces multiple compatible OR conditions into a single IN predicate, e.g.:

      (c=1) OR (c=2) OR (c=3) OR (c=4) ...
      ->
      c IN (1, 2, 3, 4...)
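
      The rewrite above can be sketched on a toy expression tree. This is a minimal illustration in Python, not Impala's frontend code: the `Eq`, `In`, `Or`, and `Raw` classes and the `coalesce_or_to_in` function are hypothetical names invented for this example, and "compatible" is assumed here to mean equality (or IN) against literals on the same column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:            # hypothetical node: col = value
    col: str
    value: object


@dataclass(frozen=True)
class In:            # hypothetical node: col IN (values...)
    col: str
    values: tuple


@dataclass(frozen=True)
class Raw:           # hypothetical opaque predicate that is never merged
    sql: str


@dataclass(frozen=True)
class Or:            # hypothetical binary OR node
    left: object
    right: object


def flatten_or(expr):
    """Return the leaf disjuncts of a nested OR tree, left to right."""
    if isinstance(expr, Or):
        return flatten_or(expr.left) + flatten_or(expr.right)
    return [expr]


def coalesce_or_to_in(expr):
    """Merge Eq/In disjuncts on the same column into one IN predicate."""
    if not isinstance(expr, Or):
        return expr
    values_by_col = {}  # column -> values in first-seen order, no duplicates
    others = []         # disjuncts this sketch cannot merge
    for d in flatten_or(expr):
        if isinstance(d, Eq):
            new_values = [d.value]
        elif isinstance(d, In):
            new_values = list(d.values)
        else:
            others.append(d)
            continue
        seen = values_by_col.setdefault(d.col, [])
        for v in new_values:
            if v not in seen:
                seen.append(v)
    merged = [
        Eq(col, vals[0]) if len(vals) == 1 else In(col, tuple(vals))
        for col, vals in values_by_col.items()
    ]
    # Rebuild a left-deep OR over whatever terms remain.
    terms = merged + others
    result = terms[0]
    for term in terms[1:]:
        result = Or(result, term)
    return result
```

      For example, `coalesce_or_to_in(Or(Or(Or(Eq("c", 1), Eq("c", 2)), Eq("c", 3)), Eq("c", 4)))` yields `In("c", (1, 2, 3, 4))`. Disjuncts that are not equalities are left in the OR untouched.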
      

      Long chains of OR are generally unwieldy, and transforming them to IN has the following benefits:

      • IN predicates with long value lists are evaluated with an O(log n) lookup in the BE
      • It is easier to extract min/max values from an IN predicate for Parquet min/max filtering
      • The IN predicate may be faster to codegen than a deep binary tree of ORs
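
      The first two benefits can be illustrated with a short sketch. It is written in Python purely for illustration and does not reflect how Impala's backend is implemented: the function names are hypothetical, the O(log n) lookup is modeled as a binary search over a sorted value list, and the row-group check assumes each Parquet row group exposes a min and max statistic for the column.

```python
from bisect import bisect_left


def make_in_lookup(values):
    """Build a membership test that does a binary search per probe."""
    sorted_vals = sorted(set(values))

    def contains(x):
        i = bisect_left(sorted_vals, x)
        return i < len(sorted_vals) and sorted_vals[i] == x

    return contains


def in_list_bounds(values):
    """Smallest and largest literal in the IN list."""
    return min(values), max(values)


def row_group_may_match(values, rg_min, rg_max):
    """False only when the IN list cannot overlap [rg_min, rg_max]."""
    lo, hi = in_list_bounds(values)
    return not (hi < rg_min or lo > rg_max)
```

      With the list `[1, 2, 3, 4]`, a row group whose statistics are min 10 and max 20 can be skipped, because the largest literal is below the row group's minimum. Recovering the same bounds from a chain of ORs would require walking the whole tree first.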

      Note that this new rule complements existing rules to yield interesting improvements, e.g.:

      (c1=1 AND c2='a') OR (c1=2 AND c2='a') OR (c1=3 AND c2='a')
      ->
      c2='a' AND c1 IN (1, 2, 3)
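
      The interaction shown above can be sketched in two steps: first pull out the conjuncts shared by every disjunct, then coalesce what is left. This is a simplified Python illustration, assuming each disjunct is a plain list of `(column, value)` equality conjuncts and assuming the existing rule behaves like a common-conjunct extraction; `factor_and_coalesce` is a hypothetical name, not an Impala API.

```python
def factor_and_coalesce(disjuncts):
    """Factor shared equalities out of an OR, then coalesce the rest.

    disjuncts: list of lists of (column, value) equality conjuncts.
    Returns (common_conjuncts, (column, values)) or None when this
    simplified sketch does not apply.
    """
    # Step 1: conjuncts present in every disjunct.
    common = set(disjuncts[0])
    for d in disjuncts[1:]:
        common &= set(d)
    # Keep the order in which they appear in the first disjunct.
    common_list = [t for t in disjuncts[0] if t in common]

    # Step 2: what remains of each disjunct once the shared part is removed.
    residuals = [[t for t in d if t not in common] for d in disjuncts]
    # Only handle the case of exactly one leftover equality per disjunct,
    # all on the same column.
    if any(len(r) != 1 for r in residuals):
        return None
    columns = {r[0][0] for r in residuals}
    if len(columns) != 1:
        return None

    values = []
    for r in residuals:
        if r[0][1] not in values:
            values.append(r[0][1])
    return common_list, (columns.pop(), tuple(values))
```

      Applied to the three disjuncts in the example, the function returns `([("c2", "a")], ("c1", (1, 2, 3)))`, which corresponds to `c2='a' AND c1 IN (1, 2, 3)`.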
      

      I've attached a relevant query profile from one of Mostafa's experiments.

          People

            sakinapelli sandeep akinapelli
            alex.behm Alexander Behm
