Spark / SPARK-37575

null values should be saved as nothing rather than quoted empty Strings "" with default settings

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      As mentioned in the SQL migration guide (https://spark.apache.org/docs/latest/sql-migration-guide.html#upgrading-from-spark-sql-23-to-24):

      Since Spark 2.4, empty strings are saved as quoted empty strings "". In version 2.3 and earlier, empty strings are equal to null values and do not reflect to any characters in saved CSV files. For example, the row of "a", null, "", 1 was written as a,,,1. Since Spark 2.4, the same row is saved as a,,"",1. To restore the previous behavior, set the CSV option emptyValue to empty (not quoted) string.
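As a minimal sketch of the option the guide describes, the snippet below writes the guide's sample row with `emptyValue` set to an unquoted empty string. The object name `EmptyValueExample`, the helper `writeSample`, and the column names are hypothetical and chosen only for illustration; `emptyValue` is the CSV write option named in the guide.

```scala
import org.apache.spark.sql.SparkSession

object EmptyValueExample {
  // Hypothetical helper: writes the migration guide's sample row ("a", null, "", 1)
  // with emptyValue overridden to an unquoted empty string.
  def writeSample(spark: SparkSession, path: String): Unit = {
    import spark.implicits._ // provides toDF on local collections
    val df = Seq(("a", null: String, "", 1)).toDF("c1", "c2", "c3", "c4")
    df.coalesce(1)
      .write
      .mode("overwrite")
      .option("emptyValue", "") // empty strings are written as nothing instead of ""
      .csv(path)
  }
}
```

With this setting the row should be written as a,,,1, so null values and empty strings can no longer be told apart in the output file.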

      In practice, however, both empty strings and null values are saved as quoted empty strings "". The expected output is "" for empty strings and nothing for null values.

      code:

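      // spark is the active SparkSession; spark-shell performs this import automatically
      import spark.implicits._
      val data = List("spark", null, "").toDF("name")
      data.coalesce(1).write.csv("spark_csv_test")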
      

      actual result:

      line1: spark
      line2: ""
      line3: ""

      expected result:

      line1: spark
      line2: 
      line3: ""
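The expected mapping can be sketched in plain Scala, without Spark. This is only a model of the behavior the report asks for, not Spark's implementation: `CsvFieldModel` and `render` are hypothetical names, the defaults mirror the default `nullValue` (nothing) and `emptyValue` (quoted empty string) described above, and quoting of values that contain delimiters is deliberately left out.

```scala
object CsvFieldModel {
  // Hypothetical helper modelling the expected rendering of one string field:
  //   null -> nullValue  (nothing by default)
  //   ""   -> emptyValue (quoted "" by default)
  //   else -> the value itself (delimiter/quote escaping is not modelled)
  def render(value: String,
             nullValue: String = "",
             emptyValue: String = "\"\""): String =
    if (value == null) nullValue
    else if (value.isEmpty) emptyValue
    else value

  def main(args: Array[String]): Unit =
    // Same data as in the report; prints one rendered field per line.
    List("spark", null, "").map(render(_)).foreach(println)
}
```

The actual result in the report corresponds to the null branch also returning the quoted empty string, which is why lines 2 and 3 of the output are identical.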
      

          People

            Wayne Guo Wei Guo