Spark / SPARK-19887

__HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.2.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: SQL

    Description

      The following Spark shell snippet under Spark 2.1 reproduces this issue:

      val data = Seq(
        ("p1", 1, 1),
        ("p2", 2, 2),
        (null, 3, 3)
      )
      
      // Correct case: Saving partitioned data to file system.
      
      val path = "/tmp/partitioned"
      
      data.
        toDF("a", "b", "c").
        write.
        mode("overwrite").
        partitionBy("a", "b").
        parquet(path)
      
      spark.read.parquet(path).filter($"a".isNotNull).show(truncate = false)
      // +---+---+---+
      // |c  |a  |b  |
      // +---+---+---+
      // |2  |p2 |2  |
      // |1  |p1 |1  |
      // +---+---+---+
      
      // Incorrect case: Saving partitioned data as persisted table.
      
      data.
        toDF("a", "b", "c").
        write.
        mode("overwrite").
        partitionBy("a", "b").
        saveAsTable("test_null")
      
      spark.table("test_null").filter($"a".isNotNull).show(truncate = false)
      // +---+--------------------------+---+
      // |c  |a                         |b  |
      // +---+--------------------------+---+
      // |3  |__HIVE_DEFAULT_PARTITION__|3  |     <-- This line should not be here
      // |1  |p1                        |1  |
      // |2  |p2                        |2  |
      // +---+--------------------------+---+
      

      Hive-style partitioned tables use the magic string __HIVE_DEFAULT_PARTITION__ in partition directory names to indicate NULL partition values. However, when the data is saved as a persisted partitioned table, this magic string is read back as a regular string value rather than as NULL, so filters such as isNotNull fail to exclude those rows.
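
To make the expected interpretation concrete, here is a minimal, self-contained sketch in plain Scala of how a Hive-style partition path could be parsed so that the magic string maps to NULL. The object `PartitionPathSketch` and the function `parsePartitionPath` are hypothetical names chosen for illustration; this is not Spark's actual implementation, and it ignores details such as escaped characters in directory names and type inference for partition values.

```scala
// Hypothetical helper for illustration only; not Spark's actual code.
object PartitionPathSketch {
  // Magic directory value that Hive-style layouts use for NULL partition values.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Parse a relative partition path such as "a=p1/b=1" into (column, value) pairs.
  // The magic string is mapped to None, standing in for a NULL partition value.
  def parsePartitionPath(path: String): Seq[(String, Option[String])] =
    path.split("/").toSeq
      .filter(_.contains("="))            // skip segments that are not column=value
      .map { segment =>
        val Array(column, rawValue) = segment.split("=", 2)
        val value = if (rawValue == DefaultPartitionName) None else Some(rawValue)
        column -> value
      }
}
```

Under this sketch, the directory `a=__HIVE_DEFAULT_PARTITION__/b=3` yields `a` as `None`, so a not-null filter on `a` would drop the row, which is the behavior the file-based read path in the snippet above already shows.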

          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Cheng Lian (lian cheng)