SPARK-29162

Simplify NOT(isnull(x)) and NOT(isnotnull(x))

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      I propose the following expression rewrite optimizations:

      NOT isnull(x)     -> isnotnull(x)
      NOT isnotnull(x)  -> isnull(x)
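
The two rewrites above are sound because isnull and isnotnull always return true or false, never NULL, so negating one yields exactly the other. A minimal sketch of the rule over a toy expression tree follows. The case classes and the `NullCheckSimplifier` object are hypothetical stand-ins written for illustration; they are not Spark's Catalyst classes or the code from the actual fix.

```scala
// Toy expression tree (illustrative only, not Spark's Catalyst API).
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class IsNull(child: Expr) extends Expr
final case class IsNotNull(child: Expr) extends Expr
final case class Not(child: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Or(left: Expr, right: Expr) extends Expr

object NullCheckSimplifier {
  // Simplifies children first, then replaces a Not that wraps a null check
  // with the opposite null check. Any other negation is left in place.
  def simplify(e: Expr): Expr = e match {
    case Not(child) =>
      simplify(child) match {
        case IsNull(x)    => IsNotNull(x) // NOT isnull(x)    -> isnotnull(x)
        case IsNotNull(x) => IsNull(x)    // NOT isnotnull(x) -> isnull(x)
        case other        => Not(other)
      }
    case IsNull(c)    => IsNull(simplify(c))
    case IsNotNull(c) => IsNotNull(simplify(c))
    case And(l, r)    => And(simplify(l), simplify(r))
    case Or(l, r)     => Or(simplify(l), simplify(r))
    case a: Attr      => a
  }
}
```

For instance, `NullCheckSimplifier.simplify(Not(IsNull(Attr("x"))))` evaluates to `IsNotNull(Attr("x"))`. Because the sketch works bottom-up, a doubly negated null check such as `Not(Not(IsNull(Attr("x"))))` collapses to `IsNull(Attr("x"))` as well.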
      

      This might seem contrived, but I saw negated versions of these expressions appear in a user-written query after that query had undergone optimization. For example:

      spark.createDataset(Seq[(String, java.lang.Boolean)](("true", true), ("false", false), ("null", null))).write.parquet("/tmp/bools")
      spark.read.parquet("/tmp/bools").where("not(isnull(_2) or _2 == false)").explain
      
      spark.read.parquet("/tmp/bools").where("not(isnull(_2) or _2 == false)").explain(true)
      == Parsed Logical Plan ==
      'Filter NOT ('isnull('_2) OR ('_2 = false))
      +- RelationV2[_1#4, _2#5] parquet file:/tmp/bools
      
      == Analyzed Logical Plan ==
      _1: string, _2: boolean
      Filter NOT (isnull(_2#5) OR (_2#5 = false))
      +- RelationV2[_1#4, _2#5] parquet file:/tmp/bools
      
      == Optimized Logical Plan ==
      Filter ((isnotnull(_2#5) AND NOT isnull(_2#5)) AND NOT (_2#5 = false))
      +- RelationV2[_1#4, _2#5] parquet file:/tmp/bools
      
      == Physical Plan ==
      *(1) Project [_1#4, _2#5]
      +- *(1) Filter ((isnotnull(_2#5) AND NOT isnull(_2#5)) AND NOT (_2#5 = false))
         +- *(1) ColumnarToRow
            +- BatchScan[_1#4, _2#5] ParquetScan Location: InMemoryFileIndex[file:/tmp/bools], ReadSchema: struct<_1:string,_2:boolean>
      

      This rewrite is also useful for query canonicalization.
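
To illustrate the canonicalization point: once negated null checks are normalized, the two conjuncts `isnotnull(_2)` and `NOT isnull(_2)` from the optimized plan above become structurally identical, so a plain equality check can detect the duplicate. The sketch below assumes a toy predicate type; `CanonicalizationSketch`, `canonical`, and `dedupe` are hypothetical names introduced here and are not part of Spark.

```scala
object CanonicalizationSketch {
  // Toy predicate tree (illustrative only, not Spark's Catalyst API).
  sealed trait Pred
  final case class Col(name: String) extends Pred
  final case class IsNull(child: Pred) extends Pred
  final case class IsNotNull(child: Pred) extends Pred
  final case class Not(child: Pred) extends Pred

  // Normalizes a predicate so that negated null checks use a single form.
  def canonical(p: Pred): Pred = p match {
    case Not(c) =>
      canonical(c) match {
        case IsNull(x)    => IsNotNull(x)
        case IsNotNull(x) => IsNull(x)
        case other        => Not(other)
      }
    case IsNull(c)    => IsNull(canonical(c))
    case IsNotNull(c) => IsNotNull(canonical(c))
    case c: Col       => c
  }

  // Drops conjuncts that are equal after normalization, keeping first occurrences.
  def dedupe(conjuncts: Seq[Pred]): Seq[Pred] =
    conjuncts.map(canonical).distinct
}
```

With these definitions, `dedupe(Seq(IsNotNull(Col("_2")), Not(IsNull(Col("_2")))))` leaves a single `IsNotNull(Col("_2"))` conjunct, whereas without normalization the two predicates would compare as different.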

              People

              • Assignee: angerszhuuu angerszhu
              • Reporter: joshrosen Josh Rosen