Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Currently, a fragment is a product of a scan: a lazy collection of scan tasks corresponding to a logically singular data source (such as a single file or a single row group). It would be more useful if a fragment were instead the direct object of a scan, so that one scans a fragment (or a collection of fragments):
- Remove ScanOptions from Fragment's properties and move it into Fragment::Scan parameters.
- Remove ScanOptions from Dataset::GetFragments. To support predicate pushdown in FileSystemDataset and UnionDataset, we can provide an overload: Dataset::GetFragments(std::shared_ptr<Expression> predicate).
- Expose a lazy accessor for the physical schema, Fragment::physical_schema().
- Consolidate ScanOptions and ScanContext.
This will lessen the cognitive dissonance between fragments and files, since fragments will no longer hold references to scan properties.
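The shape of the proposal can be sketched roughly as follows. This is a minimal, self-contained illustration and not Arrow's actual code: Schema, Predicate, ScanTask, the partition string, and the schema_reads() counter are hypothetical stand-ins, and the Predicate callback stands in for the std::shared_ptr<Expression> mentioned above. A single ScanOptions struct is assumed to play the role of the consolidated ScanOptions and ScanContext.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Arrow's real types.
struct Schema {
  std::vector<std::string> field_names;
};

// Stand-in for an Expression: a filter over a fragment's partition label.
using Predicate = std::function<bool(const std::string& partition)>;

// Scan-time settings, passed per scan instead of being stored on a fragment.
struct ScanOptions {
  std::vector<std::string> projected_columns;
  int batch_size = 1024;
};

struct ScanTask {
  std::string source;
  std::vector<std::string> columns;
  int batch_size;
};

class Fragment {
 public:
  Fragment(std::string path, std::string partition)
      : path_(std::move(path)), partition_(std::move(partition)) {}

  // The fragment is the object of the scan: options arrive as a parameter.
  std::vector<ScanTask> Scan(const std::shared_ptr<ScanOptions>& options) const {
    assert(options != nullptr);
    return {ScanTask{path_, options->projected_columns, options->batch_size}};
  }

  // Lazy accessor: the schema is materialized on first use, then cached.
  const Schema& physical_schema() const {
    if (!schema_) {
      ++schema_reads_;  // a real implementation would inspect the file here
      auto schema = std::make_shared<Schema>();
      schema->field_names = {"id", "value"};
      schema_ = std::move(schema);
    }
    return *schema_;
  }

  const std::string& partition() const { return partition_; }
  int schema_reads() const { return schema_reads_; }

 private:
  std::string path_;
  std::string partition_;
  mutable std::shared_ptr<Schema> schema_;
  mutable int schema_reads_ = 0;
};

class Dataset {
 public:
  explicit Dataset(std::vector<std::shared_ptr<Fragment>> fragments)
      : fragments_(std::move(fragments)) {}

  // No ScanOptions needed just to enumerate fragments.
  std::vector<std::shared_ptr<Fragment>> GetFragments() const { return fragments_; }

  // Overload for predicate pushdown: skip fragments the predicate rules out.
  std::vector<std::shared_ptr<Fragment>> GetFragments(const Predicate& predicate) const {
    std::vector<std::shared_ptr<Fragment>> selected;
    for (const auto& fragment : fragments_) {
      if (predicate(fragment->partition())) selected.push_back(fragment);
    }
    return selected;
  }

 private:
  std::vector<std::shared_ptr<Fragment>> fragments_;
};
```

In this sketch a caller first asks the dataset for fragments (optionally filtered by a predicate) and only then supplies ScanOptions to Fragment::Scan, so the same fragment can be scanned repeatedly with different options.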
Issue Links
- blocks
  - ARROW-8282 [C++/Python][Dataset] Support schema evolution for integer columns (Resolved)
  - ARROW-8318 [C++][Dataset] Dataset should instantiate Fragment (Resolved)
- links to