Spark / SPARK-35273

CombineFilters support non-deterministic expressions


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

      Description

      For example:

      spark.sql("create table t1(id int) using parquet")
      spark.sql("select * from (select * from t1 where id not in (1, 3, 6)) t where id = 7 and rand() <= 0.01").explain("cost")
      

      Current:

      == Optimized Logical Plan ==
      Filter (isnotnull(id#0) AND ((id#0 = 7) AND (rand(-639771619343876662) <= 0.01))), Statistics(sizeInBytes=1.0 B)
      +- Filter NOT id#0 IN (1,3,6), Statistics(sizeInBytes=1.0 B)
         +- Relation default.t1[id#0] parquet, Statistics(sizeInBytes=0.0 B)
      

      Expected:

      == Optimized Logical Plan ==
      Filter (rand(-639771619343876662) <= 0.01), Statistics(sizeInBytes=1.0 B)
      +- Filter ((NOT id#0 IN (1,3,6) AND isnotnull(id#0)) AND (id#0 = 7)), Statistics(sizeInBytes=1.0 B)
         +- Relation default.t1[id#0] parquet, Statistics(sizeInBytes=0.0 B)
      
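      In the current plan the deterministic predicates isnotnull(id#0) and (id#0 = 7) are trapped in the upper Filter together with rand(), so CombineFilters refuses to merge the two Filters and the later pushdown rules cannot move them closer to the scan. The improvement is to let CombineFilters merge the deterministic part of the upper condition into the lower Filter while leaving the non-deterministic remainder on top. Below is a minimal sketch of such a rule against Catalyst's internal APIs (Filter, And, PredicateHelper.splitConjunctivePredicates); the object name and the exact split strategy are illustrative only, not the committed patch:

      import org.apache.spark.sql.catalyst.expressions.{And, PredicateHelper}
      import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
      import org.apache.spark.sql.catalyst.rules.Rule

      // Sketch: combine adjacent Filters even when the outer condition contains
      // non-deterministic conjuncts, by merging only the deterministic prefix.
      object CombineFiltersSketch extends Rule[LogicalPlan] with PredicateHelper {
        def apply(plan: LogicalPlan): LogicalPlan = plan transform {
          case Filter(outerCond, Filter(innerCond, grandChild)) if innerCond.deterministic =>
            // Only the deterministic conjuncts before the first non-deterministic
            // one are moved, so the evaluation order of rand() is preserved.
            val (pushDown, stayUp) = splitConjunctivePredicates(outerCond).span(_.deterministic)
            if (pushDown.nonEmpty) {
              val combined = Filter(And(innerCond, pushDown.reduce(And)), grandChild)
              if (stayUp.isEmpty) combined else Filter(stayUp.reduce(And), combined)
            } else {
              // Nothing deterministic to merge; keep the plan as it was.
              Filter(outerCond, Filter(innerCond, grandChild))
            }
        }
      }

      Using span rather than partition keeps any predicate written after the first non-deterministic conjunct in place, so a condition such as rand() <= 0.01 is still evaluated in its original position relative to the predicates that follow it.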

      Another example:

      spark.sql("create table t1(id int) using parquet")
      spark.sql("create view v1 as select * from t1 where id not in (1, 3, 6)")
      spark.sql("select * from v1 where id = 7 and rand() <= 0.01").explain("cost")
      
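      The view case hits the same limitation: after view resolution the query is again a Filter with a non-deterministic conjunct sitting directly on top of the view's NOT IN Filter. A quick way to check where each predicate lands after optimization, instead of reading the explain() output (spark-shell session, assuming the statements above were run):

      // Inspect the optimized plan programmatically.
      val optimized = spark.sql("select * from v1 where id = 7 and rand() <= 0.01")
        .queryExecution.optimizedPlan
      println(optimized.numberedTreeString)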


            People

            • Assignee: yumwang (Yuming Wang)
            • Reporter: yumwang (Yuming Wang)
            • Votes: 0
            • Watchers: 3
