Apache Arrow / ARROW-9748

[C++][Dataset] Remove Selector, ignore_prefixes from FileSystemDatasetFactory

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: C++

    Description

      Currently, FileSystemDatasetFactory can be constructed either with an explicit listing of files or with an fs::FileSelector. Since the selector does not support sophisticated selection criteria, FileSystemFactoryOptions::selector_ignore_prefixes was added to allow users to exclude undesired files such as _metadata or .DS_STORE.

      The selector + ignored prefixes mechanism is inflexible and has numerous edge cases (ARROW-9644, ARROW-9573). Furthermore, implementing more advanced file selection logic inside dataset discovery prevents that logic from being reused by other consumers of the filesystem API.

      Remove FileSystemDatasetFactory's constructor-from-selector, optionally adding that functionality directly to fs::FileSelector. An explicit listing of files for use in construction of a FileSystemDatasetFactory can then be assembled using an fs::FileSelector and/or other globbing libraries, with arbitrary inclusion logic.
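
As a rough illustration of "arbitrary inclusion logic", the sketch below filters an already-obtained listing of paths with a caller-supplied predicate. It uses only the C++ standard library; SelectPaths, HasIgnoredSegment and EndsWith are hypothetical helper names invented for this example and are not part of Arrow. It assumes '/'-separated paths, as a filesystem listing would typically produce.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): true if any '/'-separated segment
// of `path` starts with one of the non-empty `prefixes`.
bool HasIgnoredSegment(const std::string& path,
                       const std::vector<std::string>& prefixes) {
  std::size_t start = 0;
  while (start <= path.size()) {
    std::size_t end = path.find('/', start);
    if (end == std::string::npos) end = path.size();
    const std::string segment = path.substr(start, end - start);
    for (const auto& prefix : prefixes) {
      if (!prefix.empty() && segment.compare(0, prefix.size(), prefix) == 0) {
        return true;
      }
    }
    start = end + 1;
  }
  return false;
}

// Hypothetical helper (not an Arrow API): true if `path` ends with `suffix`.
bool EndsWith(const std::string& path, const std::string& suffix) {
  return path.size() >= suffix.size() &&
         path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper (not an Arrow API): keep only the candidate paths that
// the caller's predicate accepts, preserving their original order.
std::vector<std::string> SelectPaths(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& include) {
  std::vector<std::string> selected;
  std::copy_if(candidates.begin(), candidates.end(),
               std::back_inserter(selected), include);
  return selected;
}
```

In real code the candidate paths would presumably come from listing a filesystem with an fs::FileSelector, and the filtered vector would be handed to the explicit-listing form of FileSystemDatasetFactory; the exact calls are left out here because they depend on the Arrow version in use. A predicate such as "no segment starts with _ or ., and the name ends in .parquet" reproduces the ignore-prefixes behavior while leaving room for any other rule.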

            People

              Assignee: Unassigned
              Reporter: Ben Kietzman (bkietz)
              Votes: 0
              Watchers: 4