Details
- Improvement
- Status: Resolved
- Major
- Resolution: Fixed
- 0.14.0
- None
- None
- Hive HiveServer2 wiki
Description
SUMMARY
- The wiki page needs better documentation for the Beeline outputformat option.
- It should explicitly say that double quote characters are used to enclose fields that need enclosing.
- It should describe the treatment of embedded double quote characters as "doubled".
DETAIL
The page at:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Separated-ValueOutputFormats
describes the separated-value output formats (csv, tsv, csv2, tsv2, etc.).
I found the documentation inadequate and the terminology confusing.
> These conform better to standard CSV convention, which adds quotes around a cell value
What kind of quotes? The only reference to quotes in this section refers to single quotes for the deprecated csv/tsv format.
The JIRA at
https://issues.apache.org/jira/browse/HIVE-8615
clarifies a bit:
- The old format quoted every field. The new format quotes only fields that contain a delimiter or the quoting character.
- The old format quoted using single quotes. The new format quotes using double quotes.
- The old format didn't escape quotes in a field (a bug). The new format does escape the quotes.
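The three differences above can be illustrated with a small Python sketch. It is not Beeline's code: `old_style_row` and `new_style_row` are hypothetical helpers that mimic only the behaviour listed in the bullets, and the new-style escaping is assumed to double embedded double quotes, which is the reading the rest of this report arrives at.

```python
def old_style_row(fields, delimiter=","):
    # Deprecated behaviour as described above: every field is wrapped in
    # single quotes and embedded quotes are left untouched (the bug).
    return delimiter.join("'" + f + "'" for f in fields)


def new_style_row(fields, delimiter=","):
    # Newer behaviour as described above: quote only fields that contain
    # the delimiter or the quote character, using double quotes.
    # Assumption: embedded double quotes are escaped by doubling them.
    out = []
    for f in fields:
        if delimiter in f or '"' in f:
            out.append('"' + f.replace('"', '""') + '"')
        else:
            out.append(f)
    return delimiter.join(out)


row = ["plain", "a,b", 'say "hi"', "it's"]
print(old_style_row(row))
print(new_style_row(row))
```

In the old-style output the field `it's` cannot be told apart from its enclosing quotes, which is the ambiguity the third bullet calls a bug.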
However, neither that JIRA issue nor the wiki page defines what is meant by "escaping the quotes".
Q: In this context, does escaping mean backslash escaping, doubling embedded double quotes, or something else?
Investigation of the source code reveals that these output formats are produced using SuperCSV.
SuperCSV does not support backslash escaping of embedded quotes. See the last line of:
https://super-csv.github.io/super-csv/csv_specification.html
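To make the distinction concrete, here is a hedged sketch that uses Python's standard `csv` module rather than SuperCSV. With `doublequote=True` the writer doubles embedded quotes, and a reader configured the same way does not treat a backslash as an escape character. The helper names `write_row` and `read_row` are made up for this illustration, and nothing here shows how Beeline itself calls SuperCSV.

```python
import csv
import io


def write_row(fields):
    # Quote only when needed and escape embedded quotes by doubling them.
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        quoting=csv.QUOTE_MINIMAL,
        doublequote=True,
        lineterminator="\n",
    )
    writer.writerow(fields)
    return buf.getvalue()


def read_row(line):
    # No escapechar is configured, so a backslash is an ordinary character.
    return next(csv.reader(io.StringIO(line), doublequote=True))


line = write_row(["id1", 'say "hi"', "a,b"])
print(line, end="")    # the doubled-quote form
print(read_row(line))  # round-trips to the original fields

# A backslash-escaped field is not read back as: say "hi"
print(read_row('"say \\"hi\\""\n'))
```

A consumer that expects backslash escaping would misread the doubled-quote output in the same way, which is why the wiki page should name the convention explicitly.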
THE END
Issue Links
- relates to: HIVE-8615 beeline csv,tsv outputformat needs backward compatibility mode (Closed)