Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
None
Description
Right now we create Tables, RecordBatches, ChunkedArrays, and Arrays using the corresponding $create() functions (or a few shortcut functions). This works well for converting other Arrow or base R types to Arrow objects but doesn’t work well for objects from other packages (e.g., sf). This is related to ARROW-14378 in that it provides a mechanism for other packages to support writing objects to Arrow in a more Arrow-native form instead of serializing attributes that are unlikely to be readable in other packages. Many of these issues came up when experimenting with carrow, while trying to provide seamless arrow package compatibility for S3 objects that wrap external pointers to C API data structures. S3 is a good way to do this because the other package can provide methods without putting arrow, which is a heavy dependency, in Imports.
For argument’s sake I’ll propose adding the following methods:
- as_arrow_array(x, type = NULL) -> Array
- as_arrow_chunked_array(x, type = NULL) -> ChunkedArray
- as_arrow_record_batch(x, schema = NULL) -> RecordBatch
- as_arrow_table(x, schema = NULL) -> Table
- as_arrow_data_type(x) -> DataType
- as_arrow_record_batch_reader(x, schema = NULL) -> RecordBatchReader
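As a rough illustration of the shape one of these generics could take, here is a sketch in base R. It does not use the arrow package: the toy_array class is a hypothetical stand-in for Array, and the `...` in the signature is an assumption that is not part of the proposal above.

```r
# Hypothetical sketch: stand-ins only, not the arrow package's implementation.

# S3 generic with the proposed signature (the `...` is an assumption).
as_arrow_array <- function(x, ..., type = NULL) {
  UseMethod("as_arrow_array")
}

# Default method: a stand-in for what Array$create(x, type = type) does today.
as_arrow_array.default <- function(x, ..., type = NULL) {
  if (is.null(type)) type <- typeof(x)
  structure(list(values = x, type = type), class = "toy_array")
}

# An object that is already an array is returned unchanged.
as_arrow_array.toy_array <- function(x, ..., type = NULL) {
  x
}

arr <- as_arrow_array(c(1.5, 2.5))
arr$type                             # "double"
identical(as_arrow_array(arr), arr)  # TRUE
```

Because dispatch happens on the class of x, another package only needs to supply a method for its own class; it never has to call $create() itself.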
I’ll note that we use as_adq() internally for similar reasons (to convert a few different object types into an arrow dplyr query when that’s the data structure we need).
As part of this ticket, if we choose to move forward, we should implement the default methods with some internal consistency (i.e., somebody wanting to provide Arrow support in a package probably only has to implement as_arrow_array() to get most of the support).
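One way to picture that internal consistency is the base R sketch below, where the default methods for the larger containers delegate down to as_arrow_array(). Everything in it is hypothetical: plain lists stand in for Arrow objects, toy_geom is a made-up downstream class, the `...` arguments are an assumption, and the schema argument is accepted but ignored.

```r
# Hypothetical sketch in base R; plain lists stand in for Arrow objects.

as_arrow_array <- function(x, ..., type = NULL) UseMethod("as_arrow_array")
as_arrow_array.default <- function(x, ..., type = NULL) {
  structure(list(values = unclass(x), type = type), class = "toy_array")
}

# The default ChunkedArray conversion delegates to as_arrow_array().
as_arrow_chunked_array <- function(x, ..., type = NULL) {
  UseMethod("as_arrow_chunked_array")
}
as_arrow_chunked_array.default <- function(x, ..., type = NULL) {
  chunk <- as_arrow_array(x, type = type)
  structure(list(chunks = list(chunk)), class = "toy_chunked_array")
}

# The default Table conversion converts each column via
# as_arrow_chunked_array(). `schema` is ignored in this sketch.
as_arrow_table <- function(x, ..., schema = NULL) UseMethod("as_arrow_table")
as_arrow_table.default <- function(x, ..., schema = NULL) {
  columns <- lapply(as.list(x), as_arrow_chunked_array)
  structure(list(columns = columns), class = "toy_table")
}

# The only method the downstream package (made-up class "toy_geom") writes.
as_arrow_array.toy_geom <- function(x, ..., type = NULL) {
  structure(list(values = unclass(x), type = "toy.geom"), class = "toy_array")
}

geom <- structure(c("POINT (0 1)", "POINT (2 3)"), class = "toy_geom")
tbl <- as_arrow_table(list(id = 1:2, geometry = geom))
tbl$columns$geometry$chunks[[1]]$type  # "toy.geom"
```

With this layering, the toy_geom column keeps its custom type inside both the chunked array and the table even though the package defined no chunked-array or table method of its own.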