Spark / SPARK-19887

__HIVE_DEFAULT_PARTITION__ is not interpreted as NULL partition value in partitioned persisted tables

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.2.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: SQL

      Description

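      The following Spark shell snippet reproduces this issue under Spark 2.1. It relies on spark-shell pre-importing spark.implicits._, which provides toDF and the $"..." column syntax: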

      val data = Seq(
        ("p1", 1, 1),
        ("p2", 2, 2),
        (null, 3, 3)
      )
      
      // Correct case: Saving partitioned data to file system.
      
      val path = "/tmp/partitioned"
      
      data.
        toDF("a", "b", "c").
        write.
        mode("overwrite").
        partitionBy("a", "b").
        parquet(path)
      
      spark.read.parquet(path).filter($"a".isNotNull).show(truncate = false)
      // +---+---+---+
      // |c  |a  |b  |
      // +---+---+---+
      // |2  |p2 |2  |
      // |1  |p1 |1  |
      // +---+---+---+
      
      // Incorrect case: Saving partitioned data as persisted table.
      
      data.
        toDF("a", "b", "c").
        write.
        mode("overwrite").
        partitionBy("a", "b").
        saveAsTable("test_null")
      
      spark.table("test_null").filter($"a".isNotNull).show(truncate = false)
      // +---+--------------------------+---+
      // |c  |a                         |b  |
      // +---+--------------------------+---+
      // |3  |__HIVE_DEFAULT_PARTITION__|3  |     <-- This line should not be here
      // |1  |p1                        |1  |
      // |2  |p2                        |2  |
      // +---+--------------------------+---+
      

      Hive-style partitioned tables use the magic string __HIVE_DEFAULT_PARTITION__ to represent NULL partition values in partition directory names. However, for a persisted partitioned table, this magic string is not interpreted as NULL but as a regular string value, so rows from the NULL partition survive the isNotNull filter above.
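To make the expected behavior concrete, here is a minimal, self-contained Scala sketch of the round trip between a partition value and its Hive-style directory name. It is an illustration, not Spark's actual code: the object and method names (HivePartitionNaming, toPartitionDir, fromPartitionDir) are hypothetical, NULL is modeled as None, and the path-character escaping that real implementations perform is omitted. The key point is that decoding must turn the magic string back into None; the bug reported here is equivalent to returning Some("__HIVE_DEFAULT_PARTITION__") instead.

```scala
// Hypothetical helpers illustrating Hive-style partition directory naming.
// Names are made up for illustration; path-character escaping is omitted.
object HivePartitionNaming {
  // Magic string used in directory names for NULL partition values.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // Encode a column/value pair as a directory name, e.g. "a=p1".
  // A NULL value (modeled as None) becomes the magic string.
  def toPartitionDir(column: String, value: Option[String]): String =
    s"$column=${value.getOrElse(DefaultPartitionName)}"

  // Decode a directory name back into (column, value).
  // The magic string must map back to None (NULL), not to a literal string.
  def fromPartitionDir(dir: String): (String, Option[String]) = {
    val idx = dir.indexOf('=')
    require(idx > 0, s"not a partition directory: $dir")
    val column = dir.substring(0, idx)
    val raw = dir.substring(idx + 1)
    (column, if (raw == DefaultPartitionName) None else Some(raw))
  }
}
```

For the data in the repro, the row (null, 3, 3) is written under a=__HIVE_DEFAULT_PARTITION__/b=3; decoding that directory with fromPartitionDir yields ("a", None), so a filter on a IS NOT NULL would correctly exclude it.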


            People

            • Assignee: Wenchen Fan
              Reporter: Cheng Lian
            • Votes: 0
              Watchers: 3
