[ARROW-12791] [R] Better error handling for DatasetFactory$Finish() when no format specified - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.0.0
Component/s: R
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/28529

Description

When I call the following code:

library(arrow)
tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf, recursive = TRUE))
write_csv_arrow(mtcars[1:5,], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11,], file.path(tf, "file2.csv"))
ds <- open_dataset(c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")))

I get the following error:

 Error: IOError: Could not open parquet input source '/tmp/RtmpSug6P8/file714931976ac54/file1.csv': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

However, the documentation for open_dataset() says nothing about the input source having to be a Parquet file, or about CSV files not being supported.

I think this is due to calling DatasetFactory$Finish() when schema is NULL, no format is specified, and the input files have no inherent schema (here, CSV files).
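As a minimal sketch of a workaround, assuming the arrow R package's open_dataset() `format` argument (whose default is "parquet"), the reproducer above can be run with the format stated explicitly. This sidesteps the error rather than illustrating the improved message from the linked pull request.

```r
library(arrow)

# Recreate the two CSV files from the report in a temporary directory
# (left in place; R removes its session temp directory on exit)
tf <- tempfile()
dir.create(tf)
write_csv_arrow(mtcars[1:5, ], file.path(tf, "file1.csv"))
write_csv_arrow(mtcars[6:11, ], file.path(tf, "file2.csv"))

# open_dataset() assumes Parquet unless told otherwise; format = "csv"
# makes the dataset factory read the files as CSV instead
ds <- open_dataset(
  c(file.path(tf, "file1.csv"), file.path(tf, "file2.csv")),
  format = "csv"
)

# The schema is now inferred from the CSV header and values
print(ds$schema)
```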

Issue Links

links to

GitHub Pull Request #10326

People

Assignee: Nicola Crane

Reporter: Nicola Crane

Votes: 0

Watchers: 3

Dates

Created: 14/May/21 13:25

Updated: 11/Jan/23 08:28

Resolved: 04/Jun/21 20:11
