Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
When I call the following code:

tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf))
write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))
ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))
I get the following error:
Error: IOError: Could not open parquet input source '/tmp/RtmpSug6P8/file714931976ac54/file1.csv': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
However, the documentation for open_dataset() says nothing about the input sources having to be Parquet files, or about CSV being unsupported.
I think this is due to calling DatasetFactory$Finish() when schema is NULL and the input files have no inherent schema (i.e., are CSVs).
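For reference, a sketch of a possible workaround, assuming the format argument of open_dataset() (which, per the arrow R docs, defaults to "parquet") accepts "csv" here:

```r
library(arrow)

tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf))
write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))

# Declaring the file format explicitly should avoid the Parquet
# magic-bytes check that produces the IOError above.
ds <- open_dataset(c(file.path(tf, "file1.csv"),
                     file.path(tf, "file2.csv")),
                   format = "csv")
```

This does not address the underlying issue, but it suggests the failure is a default-format assumption rather than a hard restriction to Parquet.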