Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Triggered by https://stackoverflow.com/questions/71035754/pyarrow-drop-a-column-in-a-nested-structure. I thought there was already an issue about this, but I can't find one directly.
Assume you have a struct array with some fields:
>>> import pyarrow as pa
>>> arr = pa.StructArray.from_arrays([[1, 2, 3]]*3, names=['a', 'b', 'c'])
>>> arr.type
StructType(struct<a: int64, b: int64, c: int64>)
We have a kernel to select a single child field:
>>> import pyarrow.compute as pc
>>> pc.struct_field(arr, [0])
<pyarrow.lib.Int64Array object at 0x7ffa9e229940>
[
  1,
  2,
  3
]
But if you want to subset the StructArray to some of its fields, resulting in a new StructArray, that's not possible with struct_field, and doing this manually is a bit cumbersome:
>>> fields = ['a', 'c']
>>> arrays = [arr.field(n) for n in fields]
>>> arr_subset = pa.StructArray.from_arrays(arrays, names=fields)
>>> arr_subset.type
StructType(struct<a: int64, c: int64>)
(this is still OK, but if you had a ChunkedArray, it certainly gets annoying)
One option could be to expand the existing struct_field to allow selecting multiple fields, although that would probably be ambiguous or confusing given how you currently select a recursively nested field: [0, 1] currently means "first child, second subchild" and not "first and second child".
Or a new kernel like "struct_subset" or some other name.
This might also overlap with general projection functionality? (cc westonpace)
Issue Links
- is related to
  - ARROW-16112 [C++] Allow reordering fields of a StructArray via casting (Open)
- relates to
  - ARROW-1888 [C++] Implement casts from one struct type to another (with same field names and number of fields) (Resolved)
  - ARROW-7051 [C++] Improve MakeArrayOfNull to support creation of multiple arrays (In Progress)
  - ARROW-14658 [C++] Add basic support for nested field refs in scanning (Resolved)