Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Currently R completely hides the `ScanOptions` class.
In Python the class is exposed, but the documentation steers users toward `dataset.scan` (which hides both the scanner and the scan options).
However, `ScanOptions` holds some useful information. Specifically, it contains the projected schema, which is derived from the dataset schema and the projection expression and is not easily recreated, and the materialized fields (the fields referenced by either the filter or the projection), which might be useful for reporting purposes.
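To make the two derived values concrete, here is a minimal Python sketch under an assumed, hypothetical expression model. `FieldRef`, `Literal`, `Call`, `infer_type`, and the string type names are illustrative only, not the Arrow C++, pyarrow, or R API. It shows that the materialized fields are a simple union of field references, while the projected schema needs a type for every projected expression, which is the part that is hard to recreate outside the scanner.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


# Hypothetical, simplified expression model (not the Arrow API).
@dataclass(frozen=True)
class FieldRef:
    name: str


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Call:
    function: str               # e.g. "greater" or "add"
    args: Tuple["Expr", ...]    # nested FieldRef / Literal / Call nodes


Expr = Union[FieldRef, Literal, Call]


def referenced_fields(expr: Expr) -> List[str]:
    """Field names referenced anywhere in expr, in first-seen order."""
    if isinstance(expr, FieldRef):
        return [expr.name]
    out: List[str] = []
    if isinstance(expr, Call):
        for arg in expr.args:
            for name in referenced_fields(arg):
                if name not in out:
                    out.append(name)
    return out


def materialized_fields(filter_expr: Expr, projection: Dict[str, Expr]) -> List[str]:
    """Union of the fields referenced by the filter or by any projected column."""
    out: List[str] = []
    for expr in (filter_expr, *projection.values()):
        for name in referenced_fields(expr):
            if name not in out:
                out.append(name)
    return out


def infer_type(expr: Expr, dataset_schema: Dict[str, str]) -> str:
    """Toy type rules; a real engine would resolve each function's output type."""
    if isinstance(expr, FieldRef):
        return dataset_schema[expr.name]
    if isinstance(expr, Literal):
        return {bool: "bool", int: "int64", float: "double", str: "string"}[type(expr.value)]
    if expr.function in ("greater", "less", "equal"):
        return "bool"
    if expr.function == "add":
        return infer_type(expr.args[0], dataset_schema)
    raise ValueError(f"no type rule for {expr.function!r}")


def projected_schema(dataset_schema: Dict[str, str], projection: Dict[str, Expr]) -> Dict[str, str]:
    """One output column per projection entry, typed against the dataset schema."""
    return {name: infer_type(expr, dataset_schema) for name, expr in projection.items()}


# Usage: filter b > 0.5, project a and a + 1; column c is never materialized.
schema = {"a": "int64", "b": "double", "c": "string"}
flt = Call("greater", (FieldRef("b"), Literal(0.5)))
proj = {"a": FieldRef("a"), "a_plus_1": Call("add", (FieldRef("a"), Literal(1)))}
print(materialized_fields(flt, proj))   # ['b', 'a']
print(projected_schema(schema, proj))   # {'a': 'int64', 'a_plus_1': 'int64'}
```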
Currently R uses the projected schema to convert a list of column names into a partition schema. Python does not rely on either field.
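The lookup described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the R package's code: schemas are modeled as plain `dict`s mapping column names to type strings, and `partition_schema_from_names` is a hypothetical helper name. It shows why some schema is needed at all: the column names alone carry no types, so the projected schema supplies them.

```python
from typing import Dict, List


def partition_schema_from_names(projected_schema: Dict[str, str], names: List[str]) -> Dict[str, str]:
    """Build a partition schema by looking up each named column's type."""
    missing = [n for n in names if n not in projected_schema]
    if missing:
        # A name absent from the projected schema cannot be given a type.
        raise KeyError(f"not in projected schema: {missing}")
    return {n: projected_schema[n] for n in names}


projected = {"a": "int64", "year": "int32", "month": "int8"}
print(partition_schema_from_names(projected, ["year", "month"]))
# {'year': 'int32', 'month': 'int8'}
```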
Options:
- Keep the status quo
- Expose the `ScanOptions` object (which is itself exposed via the `Scanner`)
- Expose the interesting fields via the `Scanner`
The current C++ design sits halfway between the last two options: both the projected schema and the options are exposed. My preference would be the third option. This raises a further question: how should the scanner itself be exposed in Python? Should users use `ScannerBuilder`? Should they use `NewScan`? Should they use the scanner directly at all, or should it be hidden?
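A rough Python sketch of the third option, assuming the scanner keeps its options object private and surfaces only the two derived values as read-only properties. `_ScanOptions`, `Scanner`, and the `batch_size` field here are hypothetical stand-ins, not pyarrow's classes, and schemas are modeled as name/type pairs for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class _ScanOptions:
    """Hypothetical internal options; under option 3 users never see this type."""
    projected_schema: Tuple[Tuple[str, str], ...]
    materialized_fields: Tuple[str, ...]
    batch_size: int = 1 << 20  # example of a knob that stays internal


class Scanner:
    """Hypothetical scanner exposing only the fields useful for reporting."""

    def __init__(self, options: _ScanOptions) -> None:
        self._options = options  # private: not part of the public surface

    @property
    def projected_schema(self) -> Dict[str, str]:
        # Return a fresh dict so callers cannot mutate scanner state.
        return dict(self._options.projected_schema)

    @property
    def materialized_fields(self) -> List[str]:
        return list(self._options.materialized_fields)


# Option 2 would instead look like `scanner.options.projected_schema`.
scanner = Scanner(_ScanOptions(
    projected_schema=(("a", "int64"), ("a_plus_1", "int64")),
    materialized_fields=("b", "a"),
))
print(scanner.projected_schema)     # {'a': 'int64', 'a_plus_1': 'int64'}
print(scanner.materialized_fields)  # ['b', 'a']
```

One apparent advantage of this shape is that the options type can change on the C++ side without breaking the Python or R surface; the trade-off is that each newly useful field needs its own accessor.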
Issue Links
- is depended upon by
  - ARROW-16410 [C++] Scanner -> ScanNode (Open)