Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.9.0
None
None
Description
Reading data that contains a double quote (") does not work if the escape character is explicitly set to '"', as specified in RFC 4180.
It works for other escape characters or if no escape character is explicitly defined in the format.
This line in Lexer.java is responsible for the originally quite erroneous ticket:
this.escape = mapNullToDisabled(format.getEscapeCharacter());
From this line I (wrongly) deduced that an unspecified escape character would actually disable escaping. Because of that I wanted to enable it by setting it to '"', which causes exceptions in the Lexer for perfectly valid input. That in turn convinced me that this is a much bigger issue than it is. Sorry about that.
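For illustration, here is a minimal sketch of what a null-to-sentinel mapping of this kind could look like. It is not copied from Lexer.java: the sentinel value and the helper isEscape are assumptions made for this example. The point is only that a null escape character can be turned into a value that never matches real input, which says nothing about how doubled quotes inside quoted fields are handled.

```java
class EscapeMappingSketch {
    // Assumed sentinel: a code point that is not expected to occur in real input.
    static final char DISABLED = '\ufffe';

    // Maps an unset (null) escape character to the sentinel, otherwise unboxes it.
    static char mapNullToDisabled(Character c) {
        return c == null ? DISABLED : c.charValue();
    }

    // Hypothetical helper: true if ch is the configured escape character.
    static boolean isEscape(char ch, Character configured) {
        return ch == mapNullToDisabled(configured);
    }
}
```

With this mapping, isEscape('\\', null) is false for every ordinary input character, so a dedicated escape character is off, while quote handling remains a separate concern.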
I don't think that the current situation is ideal, though.
I would not have been this confused if CSVFormat were more explicit about the escape character that will be used, i.e. if toString() showed the implicitly used quote character or, in case of null, printed that this means it is using the quote character. The escape character is currently omitted from the output if it is not set explicitly.
There is also no documentation about what null as escape character actually means - it may be documented somewhere but isn't documented for CSVFormat.getEscapeCharacter() or CSVFormat.Builder.set/getEscape() methods.
And setting the escape character explicitly to the value specified in the RFC should certainly not fail, even if setting it to that value is superfluous since null behaves exactly the same.
Relevant part of the RFC:
7. If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
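The rule quoted above can be sketched without any CSV library. The class below is a hypothetical, simplified single-record parser, not the Commons CSV Lexer: it assumes a comma delimiter, ignores line breaks inside fields, and does no error handling. It only shows that a doubled quote inside a quoted field is read as one literal quote without configuring any separate escape character.

```java
import java.util.ArrayList;
import java.util.List;

class Rfc4180Sketch {
    // Splits one record into fields; "" inside a quoted field becomes a literal ".
    static List<String> parseRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote: keep one literal quote
                        i++;               // and skip the second one
                    } else {
                        inQuotes = false;  // single quote: end of the quoted field
                    }
                } else {
                    field.append(ch);
                }
            } else if (ch == '"') {
                inQuotes = true;           // start of a quoted field
            } else if (ch == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(ch);
            }
        }
        fields.add(field.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // The RFC example: "aaa","b""bb","ccc"
        System.out.println(parseRecord("\"aaa\",\"b\"\"bb\",\"ccc\""));
        // prints [aaa, b"bb, ccc]
    }
}
```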