Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
6.0.1
None
None
Description
Third-party packages may define dataset factories for table formats such as Delta Lake and Apache Iceberg. These formats store metadata (schema, file lists, and file-level statistics) separately from the data files, so a dataset can be constructed from that metadata without running a discovery process. The Python API already exposes enough to do this, as demonstrated by a Delta Lake dataset reader here.
I propose adding the following to the R API:
- Expose Fragment as an R6 object
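- Add the MakeFragment method to the various file format objects. It is essential that partition_expression is included as an argument, so that partition values recorded in the table metadata can be attached to each fragment. (See Python equivalent here)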
- Add a dataset constructor that takes a list of Fragments