  1. Spark
  2. SPARK-29162

Simplify NOT(isnull(x)) and NOT(isnotnull(x))


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      I propose the following expression rewrite optimizations:

      NOT isnull(x)     -> isnotnull(x)
      NOT isnotnull(x)  -> isnull(x)
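
The two rewrites are safe because isnull and isnotnull never evaluate to NULL themselves, so negating one always yields exactly the other. A minimal sketch of the pattern match, using a toy expression tree rather than Spark's Catalyst classes (the names `NullCheckRewriteSketch`, `Expr`, `Attr` and `simplify` are illustrative assumptions, not Spark APIs):

```scala
// Toy model of the proposed rewrite. These are NOT Catalyst classes;
// every name here is hypothetical and exists only for illustration.
object NullCheckRewriteSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class IsNull(child: Expr) extends Expr
  case class IsNotNull(child: Expr) extends Expr
  case class Not(child: Expr) extends Expr

  // Simplify bottom-up so that a NOT is inspected only after its child
  // has been simplified; this also handles nested negations.
  def simplify(e: Expr): Expr = e match {
    case Not(child) =>
      simplify(child) match {
        case IsNull(x)    => IsNotNull(x) // NOT isnull(x)    -> isnotnull(x)
        case IsNotNull(x) => IsNull(x)    // NOT isnotnull(x) -> isnull(x)
        case other        => Not(other)   // any other NOT is left alone
      }
    case IsNull(x)    => IsNull(simplify(x))
    case IsNotNull(x) => IsNotNull(simplify(x))
    case a: Attr      => a
  }
}
```

With this sketch, `simplify(Not(IsNull(Attr("x"))))` returns `IsNotNull(Attr("x"))`. In Spark itself the equivalent cases would presumably sit inside an optimizer rule over Catalyst expressions; where exactly they belong is not stated in this ticket.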
      

      This might seem contrived, but I have seen these negated forms appear in the optimized plan of a user-written query. For example:

      spark.createDataset(Seq[(String, java.lang.Boolean)](("true", true), ("false", false), ("null", null))).write.parquet("/tmp/bools")
      spark.read.parquet("/tmp/bools").where("not(isnull(_2) or _2 == false)").explain
      
      spark.read.parquet("/tmp/bools").where("not(isnull(_2) or _2 == false)").explain(true)
      == Parsed Logical Plan ==
      'Filter NOT ('isnull('_2) OR ('_2 = false))
      +- RelationV2[_1#4, _2#5] parquet file:/tmp/bools
      
      == Analyzed Logical Plan ==
      _1: string, _2: boolean
      Filter NOT (isnull(_2#5) OR (_2#5 = false))
      +- RelationV2[_1#4, _2#5] parquet file:/tmp/bools
      
      == Optimized Logical Plan ==
      Filter ((isnotnull(_2#5) AND NOT isnull(_2#5)) AND NOT (_2#5 = false))
      +- RelationV2[_1#4, _2#5] parquet file:/tmp/bools
      
      == Physical Plan ==
      *(1) Project [_1#4, _2#5]
      +- *(1) Filter ((isnotnull(_2#5) AND NOT isnull(_2#5)) AND NOT (_2#5 = false))
         +- *(1) ColumnarToRow
            +- BatchScan[_1#4, _2#5] ParquetScan Location: InMemoryFileIndex[file:/tmp/bools], ReadSchema: struct<_1:string,_2:boolean>
      

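      This rewrite is also useful for query canonicalization: semantically equivalent predicates such as NOT isnull(x) and isnotnull(x) are reduced to a single form.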
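
To illustrate the canonicalization benefit, the sketch below applies the rewrite to a toy version of the filter from the optimized plan above, then flattens the conjunction and drops duplicate conjuncts. It is a self-contained model under stated assumptions, not Spark code: `CanonicalizationSketch`, `Lit`, `conjuncts` and `canonicalConjuncts` are hypothetical names, and Spark's real optimizer may order or combine these steps differently.

```scala
// Toy model only; none of these types or functions are Spark APIs.
object CanonicalizationSketch {
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Boolean) extends Expr
  case class EqualTo(left: Expr, right: Expr) extends Expr
  case class IsNull(child: Expr) extends Expr
  case class IsNotNull(child: Expr) extends Expr
  case class Not(child: Expr) extends Expr
  case class And(left: Expr, right: Expr) extends Expr

  // Apply the proposed NOT(isnull)/NOT(isnotnull) rewrite bottom-up.
  def simplify(e: Expr): Expr = e match {
    case Not(child) =>
      simplify(child) match {
        case IsNull(x)    => IsNotNull(x)
        case IsNotNull(x) => IsNull(x)
        case other        => Not(other)
      }
    case And(l, r) => And(simplify(l), simplify(r))
    case other     => other
  }

  // Flatten nested ANDs into a left-to-right list of conjuncts.
  def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }

  // After the rewrite, equivalent null checks are structurally equal,
  // so plain de-duplication removes the redundant one.
  def canonicalConjuncts(e: Expr): Seq[Expr] =
    conjuncts(simplify(e)).distinct
}
```

For the toy filter `And(And(IsNotNull(x), Not(IsNull(x))), Not(EqualTo(x, Lit(false))))` with `x = Attr("_2")`, `canonicalConjuncts` returns two conjuncts, `IsNotNull(x)` and `Not(EqualTo(x, Lit(false)))`, whereas without the rewrite the two null checks would remain distinct.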

            People

              Assignee: angerszhu (angerszhuuu)
              Reporter: Josh Rosen (joshrosen)