Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
Description
Currently, the default value of `FileSystemFactoryOptions::exclude_invalid_files` causes unsupported files (files that hit an I/O error, are not in the expected format, are corrupted, require a missing compression codec, etc.) to be silently ignored when creating a `FileSystemSource`.
We should change this so that, by default, an error is propagated from the Inspect/Finish calls, while still allowing the user to toggle `exclude_invalid_files`. The error should include at least the file path and, where possible, a readable description of the underlying failure.
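The proposed default can be sketched as follows. This is a minimal, hypothetical C++ illustration of the requested behavior, not Arrow's implementation: `InspectFiles`, `InspectResult`, `FactoryOptions`, and `FactoryOutcome` are invented names, and the per-file check is passed in as a callback. A real version would more likely report failures through `arrow::Status`/`arrow::Result` than through a hand-rolled struct.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-file inspection result (NOT the Arrow API).
struct InspectResult {
  bool ok;
  std::string message;  // error description when ok == false
};

// Hypothetical options mirroring the proposal: invalid files are an error
// unless the user explicitly opts in to excluding them.
struct FactoryOptions {
  bool exclude_invalid_files = false;
};

// Hypothetical outcome of inspecting every file in a source.
struct FactoryOutcome {
  bool ok = true;
  std::string error;               // set only when ok == false
  std::vector<std::string> files;  // paths that inspected successfully
};

// Inspects each path in order. By default, the first invalid file aborts
// inspection with an error naming the path; with exclude_invalid_files set,
// invalid files are skipped instead.
FactoryOutcome InspectFiles(
    const std::vector<std::string>& paths,
    const std::function<InspectResult(const std::string&)>& inspect_one,
    const FactoryOptions& options) {
  FactoryOutcome outcome;
  for (const auto& path : paths) {
    InspectResult result = inspect_one(path);
    if (result.ok) {
      outcome.files.push_back(path);
    } else if (options.exclude_invalid_files) {
      continue;  // opt-in behavior: drop the invalid file
    } else {
      // Default behavior: surface the path and the underlying reason.
      outcome.ok = false;
      outcome.error = "Error inspecting '" + path + "': " + result.message;
      outcome.files.clear();
      return outcome;
    }
  }
  return outcome;
}
```

With `exclude_invalid_files` left at `false`, the first unreadable file stops inspection and its path appears in the error message; setting it to `true` restores the current skip-silently behavior.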
Issue Links
- is related to
  - ARROW-8283 [Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset (Resolved)
  - ARROW-8058 [C++][Python][Dataset] Provide an option to toggle validation and schema inference in FileSystemDatasetFactoryOptions (Resolved)