SPARK-21442

Spark CSV writer trims trailing spaces

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0, 2.1.1
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None
    • Environment: version 2.1.0-mapr-1703
      Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

      and

      version 2.1.1
      Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

    Description

      It looks like Spark trims trailing spaces when saving data in CSV format. See the following example for details (note the extra space at the end of the "Johny " field):

      scala> case class SampleRow(field1: String, field2: String)
      defined class SampleRow
      
      scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
      fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]
      
      scala> fooDS.collect.foreach(println)
      SampleRow(Johny ,Doe)
      SampleRow(Ivan,Susanin)
      
      scala> fooDS.show()
      +------+-------+
      |field1| field2|
      +------+-------+
      |Johny |    Doe|
      |  Ivan|Susanin|
      +------+-------+
      
      scala> import org.apache.spark.sql.SaveMode
      import org.apache.spark.sql.SaveMode
      
      scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")
      
      cat /tmp/spaces.txt/*
      Johny|Doe
      Ivan|Susanin
      

      I expect a space before the pipe on the first line of the output file.
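
      A possible workaround, sketched under assumptions: later Spark releases document `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` options for `DataFrameWriter.csv`. Whether the writer honors them depends on the Spark version in use, and the 2.1.x versions listed above may ignore them. The object name `KeepTrailingSpaces` and the local master setting are illustrative only and not part of the original report.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Same sample type as in the report; kept at top level so an encoder can be derived.
case class SampleRow(field1: String, field2: String)

// Hypothetical standalone wrapper around the steps from the report.
object KeepTrailingSpaces {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, for illustration only
      .appName("csv-trailing-spaces")
      .getOrCreate()
    import spark.implicits._

    val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()

    fooDS.write
      .option("delimiter", "|")
      // Assumption: this Spark version's CSV writer honors these two options.
      // On versions that do not, they are silently ignored and trimming still occurs.
      .option("ignoreLeadingWhiteSpace", "false")
      .option("ignoreTrailingWhiteSpace", "false")
      .mode(SaveMode.Overwrite)
      .csv("file:///tmp/spaces.txt")

    spark.stop()
  }
}
```

      If the options take effect, `cat /tmp/spaces.txt/*` should show the space before the pipe on the "Johny" line. If the output is unchanged, the installed version most likely does not support these write options.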

            People

              Assignee: Unassigned
              Reporter: eugen yushin (eyushin)