Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.3.0, 2.3.1, 2.4.0, 3.0.0
Fix Version/s: None
Component/s: None
Description
As described in RFC 4180, page 2:
7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
That is what Excel, for example, does by default.
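For illustration, Python's standard csv module follows the same doubling convention by default. A minimal sketch (the sample field value is hypothetical):

```python
import csv
import io

# csv.writer doubles embedded quotes by default (doublequote=True),
# matching the RFC 4180 convention.
buf = io.StringIO()
csv.writer(buf).writerow(['She said "hi", then left'])  # hypothetical value

# The field is quoted because it contains a comma, and the embedded
# double quotes are escaped by doubling them:
print(buf.getvalue())  # "She said ""hi"", then left"
```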
In Spark, however (as of Spark 2.1), escaping is done by default in a non-RFC way, using the backslash (\). To fix this you have to explicitly tell Spark to use the double quote as the escape character as well:

.option('quote', '"')
.option('escape', '"')
This may explain cases where a comma character is not interpreted correctly even though it is inside a quoted column.
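Here is a minimal PySpark sketch of the workaround (the input path, sample data, and app name are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rfc4180-csv").getOrCreate()

# Hypothetical RFC 4180 input at /tmp/quoted.csv:
#   id,comment
#   1,"She said ""hi"", then left"

# Default reader: escape is backslash, so the doubled quote is not
# recognized and the quoted field can be parsed incorrectly.
df_default = spark.read.option("header", "true").csv("/tmp/quoted.csv")

# RFC 4180-style reading: set both quote and escape to the double quote.
df_rfc = (spark.read
          .option("header", "true")
          .option("quote", '"')
          .option("escape", '"')
          .csv("/tmp/quoted.csv"))

df_rfc.show(truncate=False)
# Expected comment value: She said "hi", then left
```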
So this is a request to make the Spark CSV reader RFC 4180-compatible with regard to the default option values for `quote` and `escape` (make both equal to `"`).
Since this is a backward-incompatible change, Spark 3.0 might be a good release for it.
Some more background: https://stackoverflow.com/a/45138591/470583
Issue Links
- duplicates SPARK-22236: CSV I/O: does not respect RFC 4180 (Open)