PARQUET-1510

Dictionary filter skips null values when evaluating not-equals.

      Description

      This was discovered in Spark; see SPARK-26677. The reproduction below is from the Spark PR:

      // Repeat the values to get dictionary encoding.
      Seq(Some("A"), Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/foo")
      spark.read.parquet("/tmp/foo").where("NOT (value <=> 'A')").show()
      +-----+
      |value|
      +-----+
      +-----+
      
      // Use plain encoding.
      Seq(Some("A"), None).toDF.repartition(1).write.mode("overwrite").parquet("/tmp/bar")
      spark.read.parquet("/tmp/bar").where("NOT (value <=> 'A')").show()
      +-----+
      |value|
      +-----+
      | null|
      +-----+
      

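      This is a correctness issue: the dictionary-encoded file returns no rows, but the null row satisfies NOT (value <=> 'A') and should be returned, as it is with plain encoding.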
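
      The difference between the two encodings can be illustrated with a small model. This is only a sketch, assuming the filter decides whether to drop a chunk from the dictionary alone; ColumnChunk, canDropIgnoringNulls, canDropNullAware and matchingRows are hypothetical names, not the parquet-mr API.

```scala
// Hypothetical model of a dictionary-encoded column chunk; not the parquet-mr API.
final case class ColumnChunk(values: Seq[Option[String]]) {
  // A dictionary stores only the distinct non-null values.
  val dictionary: Set[String] = values.flatten.toSet
  val hasNulls: Boolean = values.contains(None)
}

object DictionaryFilterSketch {
  // Assumed behavior behind this issue: only the dictionary is consulted,
  // so null rows are never considered when deciding to drop the chunk.
  def canDropIgnoringNulls(chunk: ColumnChunk, v: String): Boolean =
    chunk.dictionary.forall(_ == v)

  // Null-aware variant: a null row satisfies the null-safe not-equals,
  // so a chunk that contains nulls must be kept.
  def canDropNullAware(chunk: ColumnChunk, v: String): Boolean =
    !chunk.hasNulls && chunk.dictionary.forall(_ == v)

  // Reference semantics of NOT (value <=> v), evaluated row by row.
  def matchingRows(chunk: ColumnChunk, v: String): Seq[Option[String]] =
    chunk.values.filterNot(_.contains(v))
}
```

      For the data written to /tmp/foo above (A, A, null), the row-by-row evaluation keeps the null row, while a dictionary-only check sees just "A" and would drop the whole chunk.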

              People

              • Assignee: Ryan Blue (rdblue)
              • Reporter: Ryan Blue (rdblue)