Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
None
Description
Currently, this code fails:
dataset <- open_dataset("some/folder/with/parquet/files")
write_csv_arrow(dataset, sink = "dataset.csv")
with this error message:
Error: x must be an object of class 'data.frame', 'RecordBatch', or 'Table', not 'FileSystemDataset'.
ARROW-14741 added support for the CSV writer to take a RecordBatchReader as input, so we should now be able to extend write_csv_arrow() to allow this behaviour.
Note: we would need to make sure that whichever function ends up implementing write_csv(record_batch_reader) can take a filesystem= argument.
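For illustration, here is a minimal sketch of how a Dataset could be streamed to a single CSV file with the existing R API, one RecordBatch at a time. The helper name write_csv_from_dataset() is hypothetical and not part of arrow, and this is not necessarily how the fix was implemented. It assumes that write_csv_arrow() accepts an already-open OutputStream as sink and leaves it open between calls, and it only handles a local file path, so it does not address the filesystem= point above.

```r
library(arrow)

# Hypothetical helper (not part of the arrow package): stream a Dataset to a
# single CSV file without materialising the whole thing in memory.
write_csv_from_dataset <- function(dataset, path) {
  # Turn the Dataset into a RecordBatchReader via a Scanner.
  reader <- Scanner$create(dataset)$ToRecordBatchReader()

  # Open one output stream and reuse it for every batch.
  sink <- FileOutputStream$create(path)
  on.exit(sink$close())

  first <- TRUE
  # read_next_batch() returns NULL once the reader is exhausted.
  while (!is.null(batch <- reader$read_next_batch())) {
    # Only the first batch writes the header row.
    write_csv_arrow(batch, sink, include_header = first)
    first <- FALSE
  }
  invisible(dataset)
}

# Usage: build a temporary folder with two Parquet files, then write one CSV.
src <- tempfile("parquet_dir_")
dir.create(src)
write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), file.path(src, "part-1.parquet"))
write_parquet(data.frame(x = 4:6, y = c("d", "e", "f")), file.path(src, "part-2.parquet"))

dataset <- open_dataset(src)
out <- tempfile(fileext = ".csv")
write_csv_from_dataset(dataset, out)
result <- read.csv(out)
```

One limitation of this sketch: a dataset that yields no batches at all produces an empty file with no header row.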
Issue Links
- depends upon: ARROW-14741 [C++] Allow CSV Writer to take a RecordBatchReader as input (Resolved)
- is blocked by: ARROW-15128 [C++] segfault when writing CSV from RecordBatchReader (Closed)
- is duplicated by: ARROW-15104 write_parquet() / write_csv_arrow() cannot stream a dataset object back to S3 (Closed)
- relates to: ARROW-15271 [R] Refactor do_exec_plan to return a RecordBatchReader (Resolved)