Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
1.0.0
None
Description
Currently FileSystemDatasetFactory can be constructed either from an explicit list of files or from an fs::FileSelector. Because the selector does not support sophisticated selection criteria, FileSystemFactoryOptions provides selector_ignore_prefixes so that users can exclude undesired files such as _metadata or .DS_Store.
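A minimal sketch of the prefix-based exclusion described above, written in plain Python as a stand-in for the C++ behavior. `is_ignored`, `filter_listing`, and the default prefixes `(".", "_")` are assumptions for illustration, not Arrow's API or implementation; the sketch also assumes only the basename of each path is tested.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence

# Hypothetical sketch of ignore-prefix filtering; not Arrow's implementation.
# Assumed default prefixes, for illustration only.
DEFAULT_IGNORE_PREFIXES = (".", "_")


def is_ignored(path: str, ignore_prefixes: Sequence[str] = DEFAULT_IGNORE_PREFIXES) -> bool:
    """Return True if the basename of `path` starts with any ignored prefix.

    Only the final path component is tested in this sketch; parent
    directories are not inspected.
    """
    name = PurePosixPath(path).name
    return any(name.startswith(prefix) for prefix in ignore_prefixes)


def filter_listing(paths: Iterable[str],
                   ignore_prefixes: Sequence[str] = DEFAULT_IGNORE_PREFIXES) -> List[str]:
    """Drop every path whose basename starts with an ignored prefix."""
    return [p for p in paths if not is_ignored(p, ignore_prefixes)]


listing = [
    "data/part-0.parquet",
    "data/_metadata",
    "data/.DS_Store",
    "data/part-1.parquet",
]
print(filter_listing(listing))  # ['data/part-0.parquet', 'data/part-1.parquet']
```

Because this rule looks only at name prefixes, it cannot tell `_metadata` apart from a data file that merely happens to begin with an underscore (`data/_part-2.parquet` would also be dropped), and it cannot express suffix-based rules such as "only *.parquet files".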
The selector-plus-ignored-prefixes mechanism is inflexible and has numerous edge cases (ARROW-9644, ARROW-9573). Furthermore, implementing more advanced file selection logic inside dataset discovery would prevent that logic from being reused by other consumers of the filesystem API.
Remove FileSystemDatasetFactory's constructor-from-selector, optionally adding the equivalent filtering functionality directly to fs::FileSelector. An explicit list of files for constructing a FileSystemDatasetFactory can then be assembled with an fs::FileSelector and/or other globbing libraries, using arbitrary inclusion logic.
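A minimal sketch of the proposed workflow, again in plain Python standing in for the C++ APIs: a recursive directory walk (playing the role of an fs::FileSelector) combined with an arbitrary, caller-supplied inclusion predicate yields an explicit file list. `list_files` and `is_parquet` are hypothetical helpers, not part of Arrow.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def list_files(base_dir: str, include: Callable[[Path], bool]) -> List[str]:
    """Recursively collect regular files under `base_dir` accepted by `include`.

    Hypothetical helper (not Arrow API): the recursive walk stands in for an
    fs::FileSelector, and `include` is an arbitrary inclusion rule.
    Results are sorted so the listing is deterministic.
    """
    base = Path(base_dir)
    return sorted(
        p.as_posix()
        for p in base.rglob("*")
        if p.is_file() and include(p)
    )


def is_parquet(path: Path) -> bool:
    """Example inclusion rule: keep only files with a .parquet suffix."""
    return path.suffix == ".parquet"


# Usage: build a small directory tree and select only the Parquet data files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "year=2020").mkdir()
    for name in ["year=2020/part-0.parquet", "year=2020/part-1.parquet",
                 "_metadata", ".DS_Store"]:
        (root / name).touch()
    paths = list_files(tmp, is_parquet)
    # _metadata and .DS_Store are excluded because they lack the .parquet
    # suffix, not because of any name prefix.
    print([Path(p).relative_to(root).as_posix() for p in paths])
    # ['year=2020/part-0.parquet', 'year=2020/part-1.parquet']
```

Under this proposal, a list like `paths` would then be handed to FileSystemDatasetFactory through its explicit-file-list construction path (not shown here), keeping the selection logic outside dataset discovery.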
Issue Links
- relates to: ARROW-9657 [R][Dataset] Expose more FileSystemDatasetFactory options (Open)