Apache Arrow / ARROW-10547

[Rust][DataFusion] Filter pushdown loses filters if below a user defined node

Details

    Description

      I have a LogicalPlan like this (where "Extension" is some extension node type):

          Extension [non_null_column:Utf8]
            Filter: #state Eq Utf8("MA") [city:Utf8;N, state:Utf8;N, borough:Utf8;N]
              InMemoryScan: projection=None [city:Utf8;N, state:Utf8;N, borough:Utf8;N]
      

      When I run this plan through ExecutionContext::optimize, the resulting plan is as follows (note that the filter has been lost):

          Extension [non_null_column:Utf8]
            InMemoryScan: projection=Some([0, 1, 2]) [city:Utf8;N, state:Utf8;N, borough:Utf8;N]
      

      I debugged the problem: the root cause is that the FilterPushdown logic does not recurse into the inputs of user defined nodes. I will put up a PR that demonstrates and fixes the problem.
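
      To illustrate the kind of recursion that is missing, here is a minimal sketch on a toy plan type. Everything in it (the Plan enum, wrap_filters, push_down) is hypothetical and is not DataFusion's actual LogicalPlan or FilterPushdown code. It also assumes, for simplicity, that any predicate may move below a projection and that nothing may move through an extension node.

```rust
// Toy model only: NOT DataFusion's real LogicalPlan / FilterPushdown API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf node standing in for a table scan.
    Scan,
    /// Assumption: every predicate can safely move below a projection.
    Projection { input: Box<Plan> },
    /// A single predicate, kept as an opaque string.
    Filter { predicate: String, input: Box<Plan> },
    /// Stand-in for a user defined node with any number of inputs.
    Extension { inputs: Vec<Plan> },
}

/// Re-applies the pending predicates on top of `node`, outermost first,
/// so that the original top-to-bottom filter order is preserved.
fn wrap_filters(node: Plan, pending: Vec<String>) -> Plan {
    pending.into_iter().rev().fold(node, |input, predicate| Plan::Filter {
        predicate,
        input: Box::new(input),
    })
}

/// Moves filters as close to the scan as possible. `pending` holds the
/// predicates that were removed from the plan above the current node.
fn push_down(plan: &Plan, mut pending: Vec<String>) -> Plan {
    match plan {
        // Remove the filter here and remember its predicate.
        Plan::Filter { predicate, input } => {
            pending.push(predicate.clone());
            push_down(input, pending)
        }
        // Filters pass through projections (toy assumption).
        Plan::Projection { input } => Plan::Projection {
            input: Box::new(push_down(input, pending)),
        },
        // Cannot go any lower: put the pending filters back.
        Plan::Scan => wrap_filters(Plan::Scan, pending),
        Plan::Extension { inputs } => {
            // Recurse into every input with a fresh state, so filters that
            // live below the extension node are optimized and kept.
            let inputs: Vec<Plan> = inputs
                .iter()
                .map(|child| push_down(child, Vec::new()))
                .collect();
            // The node's semantics are unknown, so filters collected above
            // it are conservatively re-applied above it.
            wrap_filters(Plan::Extension { inputs }, pending)
        }
    }
}
```

      In this sketch, skipping the recursion in the Extension arm (or dropping the pending predicates there) is what would make a filter disappear from the optimized plan.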

      More generally, I am not sure how (or whether) the filter pushdown logic would work with multiple child inputs, as no existing LogicalPlan variant has multiple inputs at this time. We will have to handle this when (or if) we want to support joins; typically, filters get pushed down to each join input where possible.
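
      As a rough sketch of what "pushed down to each join input where possible" could mean, the hypothetical helper below decides where a single predicate may go based on the columns it references. The names (Side, route_filter) are invented for illustration, and the rule assumes an inner join with distinct column names on each side; outer joins would need stricter rules.

```rust
/// Where a predicate can be placed relative to a two-input join.
/// Hypothetical type, not part of DataFusion.
#[derive(Debug, PartialEq)]
enum Side {
    Left,
    Right,
    AboveJoin,
}

/// Chooses a join input for a predicate, given the columns it references
/// and the column names produced by each input.
/// Assumption: inner join semantics and no shared column names.
fn route_filter(columns: &[&str], left_schema: &[&str], right_schema: &[&str]) -> Side {
    if columns.iter().all(|c| left_schema.contains(c)) {
        // Every referenced column comes from the left input.
        Side::Left
    } else if columns.iter().all(|c| right_schema.contains(c)) {
        // Every referenced column comes from the right input.
        Side::Right
    } else {
        // The predicate mixes columns from both inputs (or references an
        // unknown column), so it has to stay above the join.
        Side::AboveJoin
    }
}
```

      A predicate on state alone could go to whichever input produces state, while a predicate comparing columns from both inputs would remain above the join.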

            People

              alamb Andrew Lamb

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m