  1. Apache Arrow
  2. ARROW-9771

[Rust] [DataFusion] Predicate Pushdown Improvement: treat predicates separated by AND separately

Details

    Description

      As discussed by jorgecarleitao and houqp here: https://github.com/apache/arrow/pull/7880#pullrequestreview-468057624

      If a predicate is a conjunction (i.e. several clauses AND'd together), each clause can be treated separately. For example, a single filter expression A > 5 And B < 4 can be broken up, and each of A > 5 and B < 4 can potentially be pushed down to a different level of the plan.
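
To illustrate the splitting step described above, here is a minimal sketch in Rust. The `Expr` and `Op` types and the `col`, `lit`, `binary`, and `split_conjunction` helpers are hypothetical, simplified stand-ins introduced only for this example; they are not DataFusion's actual expression API.

```rust
// Hypothetical, simplified expression type (an assumption for illustration;
// DataFusion's real expression enum has many more variants).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    Binary { left: Box<Expr>, op: Op, right: Box<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    And,
    Gt,
    Lt,
    GtEq,
    LtEq,
}

// Small constructors to keep the example readable.
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

fn lit(value: i64) -> Expr {
    Expr::Int64(value)
}

fn binary(left: Expr, op: Op, right: Expr) -> Expr {
    Expr::Binary { left: Box::new(left), op, right: Box::new(right) }
}

/// Recursively walks nested `And` nodes and appends every non-`And`
/// sub-expression to `out`, in left-to-right order.
fn split_conjunction<'a>(expr: &'a Expr, out: &mut Vec<&'a Expr>) {
    match expr {
        Expr::Binary { left, op: Op::And, right } => {
            split_conjunction(left, out);
            split_conjunction(right, out);
        }
        other => out.push(other),
    }
}
```

Calling `split_conjunction` on the expression for `#a LtEq Int64(1) And #b GtEq Int64(1)` would yield the two clauses as separate entries, so an optimizer could then decide for each one independently how far down it can be pushed.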

      The filter pushdown logic works for the following case (when the predicates on a and b are separate Selections, the predicate on a is pushed below the Aggregate in the optimized plan):

      ********Original plan:
      Selection: #b GtEq Int64(1)
        Selection: #a LtEq Int64(1)
          Aggregate: groupBy=[[#a]], aggr=[[MIN(#b)]]
            TableScan: test projection=None
      
      ********Optimized plan:
      Selection: #b GtEq Int64(1)
        Aggregate: groupBy=[[#a]], aggr=[[MIN(#b)]]
          Selection: #a LtEq Int64(1)
            TableScan: test projection=None
      

      But it does not work for the following case, where the predicates on a and b are AND'd together in a single Selection (the plan is left unchanged):

      ********Original plan:
      Selection: #a LtEq Int64(1) And #b GtEq Int64(1)
        Aggregate: groupBy=[[#a]], aggr=[[MIN(#b)]]
          TableScan: test projection=None
      ********Optimized plan:
      Selection: #a LtEq Int64(1) And #b GtEq Int64(1)
        Aggregate: groupBy=[[#a]], aggr=[[MIN(#b)]]
          TableScan: test projection=None
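
Once a conjunction has been split, each clause can be checked on its own against the Aggregate. The sketch below assumes, in line with the first example, that a clause may move below the Aggregate only if every column it references is a group-by column. The `Clause` struct and `partition_for_aggregate` function are hypothetical names used only for this illustration, not part of DataFusion.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one clause of an already-split conjunction:
/// its printed form plus the names of the columns it references.
#[derive(Debug, Clone, PartialEq)]
struct Clause {
    text: String,
    columns: Vec<String>,
}

/// Splits clauses into (pushable below the Aggregate, must stay above it).
/// Assumption: a clause is pushable only if all of its columns are
/// group-by columns of the Aggregate.
fn partition_for_aggregate(
    clauses: Vec<Clause>,
    group_by: &[&str],
) -> (Vec<Clause>, Vec<Clause>) {
    let keys: HashSet<&str> = group_by.iter().copied().collect();
    clauses
        .into_iter()
        .partition(|c| c.columns.iter().all(|col| keys.contains(col.as_str())))
}
```

For the plan above, with groupBy=[[#a]], this rule would place #a LtEq Int64(1) in the pushable group and keep #b GtEq Int64(1) above the Aggregate, which matches the optimized plan of the first example.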
      

            People

              jorgecarleitao Jorge Leitão
              alamb Andrew Lamb

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1.5h