[SPARK-21442] Spark CSV writer trims trailing spaces - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.1.0, 2.1.1
Fix Version/s: None
Component/s: Input/Output
Labels: None
Environment:

version 2.1.0-mapr-1703
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

and

version 2.1.1
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)

Description

It looks like Spark trims trailing spaces when saving data with the CSV writer. The following example shows the problem (note the trailing space at the end of the "Johny " field):

scala> case class SampleRow(field1: String, field2: String)
defined class SampleRow

scala> val fooDS = Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
fooDS: org.apache.spark.sql.Dataset[SampleRow] = [field1: string, field2: string]

scala> fooDS.collect.foreach(println)
SampleRow(Johny ,Doe)
SampleRow(Ivan,Susanin)

scala> fooDS.show()
+------+-------+
|field1| field2|
+------+-------+
|Johny |    Doe|
|  Ivan|Susanin|
+------+-------+

scala> import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.SaveMode

scala> fooDS.write.option("delimiter", "|").mode(SaveMode.Overwrite).csv("file:///tmp/spaces.txt")

$ cat /tmp/spaces.txt/*
Johny|Doe
Ivan|Susanin

I expect a space before the pipe on the first line of the output file, i.e. "Johny |Doe".
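For reference, a minimal sketch of how the write could be adjusted on newer releases. It assumes Spark 2.2 or later, where, to my knowledge, the fix for the linked SPARK-18579 added the `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` options to the CSV writer; on the 2.1.x versions listed under Environment the option is not recognised and would simply be ignored. The object name `TrailingSpaceCheck`, the helper `writeAndReadBack`, and the temporary output directory are illustrative and not part of the original report.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Same shape as the case class in the report.
case class SampleRow(field1: String, field2: String)

object TrailingSpaceCheck {

  // Hypothetical helper: writes the sample rows as pipe-delimited CSV and
  // returns the raw lines of the resulting part files.
  def writeAndReadBack(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // Scratch directory for illustration; the report used file:///tmp/spaces.txt.
    val outDir = Files.createTempDirectory("spaces").resolve("out").toString

    Seq(SampleRow("Johny ", "Doe"), SampleRow("Ivan", "Susanin")).toDS()
      .write
      .option("delimiter", "|")
      // Assumption: writer option available in Spark 2.2+, not in 2.1.x.
      .option("ignoreTrailingWhiteSpace", "false")
      .mode(SaveMode.Overwrite)
      .csv(outDir)

    // Read back as plain text so the CSV reader cannot alter the values.
    spark.read.textFile(outDir).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("csv-trailing-space-check")
      .getOrCreate()
    try {
      // Brackets make any trailing whitespace visible in the console.
      writeAndReadBack(spark).foreach(line => println(s"[$line]"))
    } finally {
      spark.stop()
    }
  }
}
```

Reading the output back with `spark.read.textFile` rather than `spark.read.csv` keeps the check independent of the CSV reader, which has its own whitespace-handling options.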

Issue Links

duplicates

SPARK-18579 spark-csv strips whitespace (pyspark)

Resolved

People

Assignee: Unassigned

Reporter: eugen yushin

Votes: 0

Watchers: 1

Dates

Created: 17/Jul/17 14:37

Updated: 12/Dec/22 18:10

Resolved: 17/Jul/17 14:49