  1. Spark
  2. SPARK-36878

Optimization in PushDownPredicates to push all filters in a single iteration has broken some optimizations in PruneFilter rule


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      It appears that the optimization in the PushDownPredicates rule, which tries to push all filters in a single pass to reduce the number of iterations, has broken the PruneFilters rule. PruneFilters no longer substitutes an empty relation when the filter condition is composite and statically evaluates to false, either because one of the non-redundant predicates is Literal(false) or because all the non-redundant predicates are null.

      The new PushDownPredicates rule is created by chaining CombineFilters, PushPredicateThroughNonJoin and PushPredicateThroughJoin, so individual filters get combined into a single filter while being pushed.
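
The chaining described above can be sketched with a toy model. This is not Spark's Catalyst code: the `Plan` types, the string conditions, the `ChainedPushdown` object and the `transformDown` helper are all hypothetical stand-ins, and only a projection is modelled as the operator a filter is pushed through. The sketch assumes the chained rule is built with `PartialFunction.orElse` and applied once per node in a top-down traversal.

```scala
// Toy model only: illustrative types, not Spark's Catalyst classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Project(child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan

object ChainedPushdown {
  type Rule = PartialFunction[Plan, Plan]

  // Merge two adjacent filters into one conjunctive filter.
  val combineFilters: Rule = {
    case Filter(outer, Filter(inner, child)) =>
      Filter(s"($inner) AND ($outer)", child)
  }

  // Push a filter below a projection (assumes the predicate only
  // references columns that the projection's child also exposes).
  val pushThroughProject: Rule = {
    case Filter(cond, Project(child)) => Project(Filter(cond, child))
  }

  // Apply the rule once at the current node (if it matches),
  // then recurse into the children of the rewritten node.
  def transformDown(rule: Rule)(plan: Plan): Plan =
    rule.applyOrElse(plan, (p: Plan) => p) match {
      case Project(child)      => Project(transformDown(rule)(child))
      case Filter(cond, child) => Filter(cond, transformDown(rule)(child))
      case leaf: Relation      => leaf
    }

  val input: Plan =
    Filter("b > 1", Project(Filter("a = 1", Relation("t"))))

  // Pushdown alone: the two filters end up adjacent but stay separate.
  val pushOnly: Plan = transformDown(pushThroughProject)(input)

  // Chained rule: the pushed filter is merged with the one below it
  // within the same traversal.
  val chained: Plan =
    transformDown(combineFilters.orElse(pushThroughProject))(input)
}
```

In this model, `pushOnly` keeps `b > 1` and `a = 1` as two stacked filters under the projection, while `chained` yields a single filter with the condition `(a = 1) AND (b > 1)`.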

      However, the PruneFilters rule does not substitute an empty relation if the filter is composite; it is coded to handle single predicates only.
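
The behaviour claimed here can be illustrated with a small, self-contained sketch. It models the report's description, not Spark's actual PruneFilters implementation: `Expr`, `Plan`, `Empty`, `pruneSingle`, `conjuncts` and `pruneComposite` are hypothetical names, and the all-null case is not modelled.

```scala
// Toy model only: illustrative types, not Spark's Catalyst classes.
sealed trait Expr
case class Lit(value: Boolean) extends Expr
case class Pred(sql: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr

sealed trait Plan
case class Relation(name: String) extends Plan
case object Empty extends Plan
case class Filter(condition: Expr, child: Plan) extends Plan

object PruneSketch {
  // Handles only a filter whose whole condition is the literal false.
  def pruneSingle(plan: Plan): Plan = plan match {
    case Filter(Lit(false), _) => Empty
    case other                 => other
  }

  // Split a conjunction into its individual conjuncts.
  def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  // Hypothetical variant: prune when any conjunct is the literal false.
  def pruneComposite(plan: Plan): Plan = plan match {
    case Filter(cond, _) if conjuncts(cond).contains(Lit(false)) => Empty
    case other                                                   => other
  }
}
```

Given `Filter(And(Pred("a = 1"), Lit(false)), Relation("t"))`, `pruneSingle` returns the plan unchanged, whereas `pruneComposite` returns `Empty`. Inspecting each conjunct is only one possible way to cover the composite case.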

      The existing test passes falsely because it exercises PushPredicateThroughNonJoin, which does not combine filters, whereas the rule actually in use includes the effect of CombineFilters.

      In fact, I believe every other test that exercises PushPredicateThroughNonJoin or PushPredicateThroughJoin individually should be corrected (perhaps to use the PushDownPredicates rule) and re-run.

      I will add a test for the bug and open a PR.

       


          People

            Unassigned Unassigned
            ashahid7 Asif
            andrew spark andrew spark
            Votes: 0
            Watchers: 2


              Time Tracking

                Estimated: 48h
                Remaining: 48h
                Logged: Not Specified