SPARK-21442

Spark CSV writer trims trailing spaces

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0, 2.1.1
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None
    • Environment: version 2.1.0-mapr-1703
      Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

      and

      version 2.1.1
      Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

    Description

      It looks like Spark trims trailing spaces when saving data with the CSV writer. See the following example (note the extra space at the end of the "Johny " field):

      scala> case class SampleRow(field1: String, field2: String)
      defined class SampleRow
      
      scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
      fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]
      
      scala> fooDS.collect.foreach(println)
      SampleRow(Johny ,Doe)
      SampleRow(Ivan,Susanin)
      
      scala> fooDS.show()
      +------+-------+
      |field1| field2|
      +------+-------+
      |Johny |    Doe|
      |  Ivan|Susanin|
      +------+-------+
      
      scala> import org.apache.spark.sql.SaveMode
      import org.apache.spark.sql.SaveMode
      
      scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")
      
      $ cat /tmp/spaces.txt/*
      Johny|Doe
      Ivan|Susanin
      

      I expect a space before the pipe on the first line of the output file, i.e. "Johny |Doe" rather than "Johny|Doe".
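
      A possible workaround, sketched under assumptions rather than verified on the versions above: as far as I know, Spark 2.2 and later accept `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` as CSV write options (both defaulting to true for writes), so this would not help on 2.1.x. The names `TrailingSpaceDemo` and `writeAndReadLines` are made up for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

case class SampleRow(field1: String, field2: String)

// Hypothetical demo object, not part of Spark.
object TrailingSpaceDemo {

  // Writes the sample rows with whitespace trimming disabled on the writer,
  // then reads the output back as raw text lines for inspection.
  def writeAndReadLines(spark: SparkSession, path: String): Seq[String] = {
    import spark.implicits._

    val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()

    fooDS.coalesce(1) // single output file, only to keep the demo output small
      .write
      .option("delimiter", "|")
      // Assumption: these write options exist in the Spark version in use (2.2+).
      .option("ignoreLeadingWhiteSpace", "false")
      .option("ignoreTrailingWhiteSpace", "false")
      .mode(SaveMode.Overwrite)
      .csv(path)

    spark.read.textFile(path).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("trailing-space-demo")
      .getOrCreate()
    try {
      val out = Files.createTempDirectory("spaces").toUri.toString
      writeAndReadLines(spark, out).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

      If the options behave as assumed, the line for the first row should keep the space after "Johny".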

          People

            Assignee: Unassigned
            Reporter: eugen yushin (eyushin)
            Votes: 0
            Watchers: 2
