Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.1.0, 2.1.1
- Fix Version/s: None
- Component/s: None
- Environment:
  version 2.1.0-mapr-1703
  Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131) and
  version 2.1.1
  Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Description
It looks like Spark truncates trailing spaces when saving data with the CSV codec. See the following example for details (note the extra space at the end of the "Johny " field):
scala> case class SampleRow(field1: String, field2: String)
defined class SampleRow

scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]

scala> fooDS.collect.foreach(println)
SampleRow(Johny ,Doe)
SampleRow(Ivan,Susanin)

scala> fooDS.show()
+------+-------+
|field1| field2|
+------+-------+
|Johny |    Doe|
|  Ivan|Susanin|
+------+-------+

scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")

$ cat /tmp/spaces.txt/*
Johny|Doe
Ivan|Susanin
I would expect a space before the pipe in the first line of the output file.
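Per the linked duplicate (SPARK-18579), the behavior comes from the CSV writer trimming whitespace from fields by default. A minimal sketch of the effect in plain Scala, not Spark's actual implementation (the `writeField` helper is hypothetical, for illustration only):

```scala
object TrailingSpaceDemo {
  // Hypothetical helper mimicking a CSV writer that optionally trims
  // trailing whitespace from each field before writing it out.
  def writeField(value: String, trimTrailing: Boolean): String =
    if (trimTrailing) value.replaceAll("\\s+$", "") else value

  def main(args: Array[String]): Unit = {
    val field = "Johny "
    // Default-style trimming drops the trailing space, as in this report:
    assert(writeField(field, trimTrailing = true) == "Johny")
    // Disabling the trimming preserves the field exactly as stored:
    assert(writeField(field, trimTrailing = false) == "Johny ")
  }
}
```

If I understand the resolution of SPARK-18579 correctly, later Spark releases expose this behavior through the CSV writer options `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` (e.g. `fooDS.write.option("ignoreTrailingWhiteSpace", false)`); on 2.1.x, quoting the affected fields is a possible workaround.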
Attachments
Issue Links
- duplicates
  - SPARK-18579 spark-csv strips whitespace (pyspark) - Resolved