Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Currently (arrow version 6.0.1 and readr version 2.1.0) we only support a few of the readr::write_csv() arguments. Once ARROW-13623 is fixed, write_csv_arrow() will error if the user passes unsupported readr arguments.
The following arguments need CsvWriteOptions (see linked issues) in order to be exposed to R users:
- na: string used for missing values. Defaults to NA. Missing values are never quoted; strings with the same value as na will always be quoted.
- append: boolean. If FALSE, will overwrite the existing file. If TRUE, will append to the existing file. In both cases, if the file doesn't exist, a new file is created.
- quote: how to handle fields which contain characters that need to be quoted:
  - needed: only quote fields which need them
  - all: quote all fields (I think this might be the implicit default behaviour for `write_csv_arrow()`)
  - none: never quote fields
- escape: the type of escape to use when quotes are in the data:
  - double: quotes are escaped by doubling them
  - backslash: quotes are escaped by a preceding backslash
  - none: quotes are not escaped
- eol: the end of line character to use. Most commonly either "\n" for Unix style newlines, or "\r\n" for Windows style newlines.
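To make the intended semantics of quote, escape and eol concrete, here is a minimal sketch in Python using only the standard-library csv module. It is not the Arrow or readr API: the function name `write_rows` and the mapping from readr-style option names to csv constants are hypothetical, chosen for illustration. The na and append arguments and escape = none are left out of this sketch.

```python
import csv
import io

# Hypothetical mapping from the readr-style `quote` values to csv constants.
QUOTE_MODES = {
    "needed": csv.QUOTE_MINIMAL,  # quote only fields that need it
    "all": csv.QUOTE_ALL,         # quote every field
    "none": csv.QUOTE_NONE,       # never quote
}


def write_rows(rows, quote="needed", escape="double", eol="\n"):
    """Return the CSV text for `rows` (a list of lists of strings)."""
    if escape not in ("double", "backslash"):
        raise ValueError("this sketch only handles 'double' and 'backslash'")
    options = {"quoting": QUOTE_MODES[quote], "lineterminator": eol}
    if escape == "double":
        # Embedded quotes are written twice: " becomes ""
        options["doublequote"] = True
    else:
        # Embedded quotes are preceded by a backslash: " becomes \"
        options["doublequote"] = False
        options["escapechar"] = "\\"
    buffer = io.StringIO(newline="")
    csv.writer(buffer, **options).writerows(rows)
    return buffer.getvalue()


rows = [["id", "text"], ["1", 'say "hi"'], ["2", "a,b"]]
print(write_rows(rows))                                   # quote = needed, escape = double
print(write_rows(rows, quote="all", escape="backslash"))  # every field quoted
print(repr(write_rows([["a", "b"]], eol="\r\n")))         # Windows style line ending
```

With quote = needed, only the fields containing a quote or a comma are wrapped in quotes; with quote = all, the header and the numeric-looking fields are wrapped as well.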
Once these are enabled, update the signature of `write_csv_arrow()` and compare written files.
From ARROW-13623: "I noticed we had a difference in quoting: readr doesn't quote strings by default but we do." Once we have more control over quoting, we could write some tests to make sure the default behaviours of `write_csv_arrow()` and `readr::write_csv()` match.
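One way such a comparison could look is sketched below in Python. The helper name `differing_lines` is hypothetical, and the two input strings are hand-written illustrations of unquoted versus quoted output, not captured output from either function. Line endings are kept so that an eol mismatch is also reported.

```python
from itertools import zip_longest


def differing_lines(expected, actual):
    """Return (line_number, expected_line, actual_line) for each mismatch."""
    pairs = zip_longest(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
    )
    return [(n, e, a) for n, (e, a) in enumerate(pairs, start=1) if e != a]


# Hand-written sample texts (assumptions for illustration only).
unquoted = "x,y\n1,abc\n"
quoted = '"x","y"\n1,"abc"\n'

for line_number, want, got in differing_lines(unquoted, quoted):
    print(line_number, repr(want), repr(got))
```

An empty result means the two files are identical; a test on default behaviours could assert exactly that.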
Issue Links
- split from ARROW-13623 [R] write_csv_arrow should follow the signature of readr::write_csv (Resolved)
1. [C++] Enable CSV Writer to append / overwrite existing file | Open | Unassigned
2. [C++] Enable CSV Writer to control the type of escape used for quoting | Open | Unassigned