Details
Type: Wish
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
Hi,
We developed several utilities that compute or access certain properties of Arrays. Would it make sense to contribute them upstream (to both the C++ API and pyarrow), and if so, where is the best place to put them?
I may have overlooked existing APIs that already do the same thing; if so, please point them out.
1/ ListLengthFromListArray(ListArray&)
Returns the lengths of the lists in a ListArray as an Int32Array (or an Int64Array for large lists). For example:
[[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned array can be converted to numpy)
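The intended semantics can be sketched in plain Python over the offsets and validity of a list array. This is only an illustration, not the Arrow API: the function name `list_lengths`, the `null_as_zero` flag, and the use of Python lists in place of Arrow buffers are all assumptions made for the example.

```python
from typing import List, Optional


def list_lengths(
    offsets: List[int],
    validity: Optional[List[bool]] = None,
    null_as_zero: bool = True,
) -> List[Optional[int]]:
    """Return the length of each list slot described by `offsets`.

    `offsets` has one more entry than there are lists; slot i spans
    offsets[i]..offsets[i + 1]. `validity[i]` is False for a null slot.
    (Hypothetical helper, not part of Arrow.)
    """
    lengths: List[Optional[int]] = []
    for i in range(len(offsets) - 1):
        if validity is not None and not validity[i]:
            # A null list has no length; report 0 or None depending on the flag.
            lengths.append(0 if null_as_zero else None)
        else:
            lengths.append(offsets[i + 1] - offsets[i])
    return lengths


# [[1, 2, 3], [], None] has offsets [0, 3, 3, 3] and validity [True, True, False].
print(list_lengths([0, 3, 3, 3], [True, True, False]))
print(list_lengths([0, 3, 3, 3], [True, True, False], null_as_zero=False))
```

Reporting 0 for null slots keeps the result free of nulls, which is what makes a direct conversion to a numpy integer array possible.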
2/ GetBinaryArrayTotalByteSize(BinaryArray&)
Returns the total byte size of a BinaryArray (basically offsets[length] - offsets[0], since the offsets buffer has length + 1 entries).
Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.
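As a rough illustration of the offset arithmetic, the following plain-Python sketch computes the total byte size from an offsets list. The names `binary_offsets` and `total_byte_size` are hypothetical, and Python lists stand in for the Arrow offsets buffer.

```python
from typing import List, Sequence


def binary_offsets(values: Sequence[bytes]) -> List[int]:
    """Build an offsets list (len(values) + 1 entries) for the given byte strings."""
    offsets = [0]
    for value in values:
        offsets.append(offsets[-1] + len(value))
    return offsets


def total_byte_size(offsets: Sequence[int]) -> int:
    """Total bytes referenced by the offsets: last offset minus first offset.

    Subtracting the first offset matters for sliced arrays, whose
    offsets need not start at 0.
    """
    if not offsets:
        return 0
    return offsets[-1] - offsets[0]


offsets = binary_offsets([b"ab", b"", b"cde"])
print(offsets)                       # offsets of the full array
print(total_byte_size(offsets))      # bytes in the full array
print(total_byte_size(offsets[1:]))  # bytes in a slice that skips the first value
```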
3/ GetArrayNullBitmapAsByteArray(Array&)
Returns the array's null bitmap as a UInt8Array (which can be efficiently converted to a bool numpy array).
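A minimal sketch of the unpacking step, assuming the Arrow columnar layout in which the validity bitmap is bit-packed least-significant-bit first and a set bit means the slot is valid (non-null). The function name `unpack_validity_bitmap` and its parameters are made up for this example.

```python
def unpack_validity_bitmap(bitmap: bytes, length: int, offset: int = 0) -> bytearray:
    """Expand a bit-packed validity bitmap into one byte (0 or 1) per slot.

    `offset` is the logical index of the first slot within the bitmap,
    which is non-zero for sliced arrays. (Hypothetical helper.)
    """
    out = bytearray(length)
    for i in range(length):
        bit = offset + i
        # Byte index is bit // 8; position within the byte counts from the LSB.
        out[i] = (bitmap[bit >> 3] >> (bit & 7)) & 1
    return out


# Slots [valid, valid, null, valid] pack into the single byte 0b00001011.
packed = bytes([0b00001011])
print(list(unpack_validity_bitmap(packed, 4)))
print(list(unpack_validity_bitmap(packed, 2, offset=2)))
```

One byte per slot is a layout that numpy can view as a bool array without further conversion, which is the motivation given above.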
4/ GetFlattenedArrayParentIndices(ListArray&)
Makes an Int32Array of the same length as the flattened ListArray. returned_array[i] == j means the i-th element of the flattened ListArray came from the j-th list in the ListArray.
For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]
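The mapping can be sketched in plain Python from the list offsets alone. The function name `parent_indices` is hypothetical, and the sketch assumes that null and empty lists both span zero elements in the offsets, as in the example above.

```python
from typing import List


def parent_indices(offsets: List[int]) -> List[int]:
    """For each flattened element, return the index of the list it came from.

    List j owns the flattened elements offsets[j]..offsets[j + 1], so j is
    repeated once per element; empty and null lists contribute nothing.
    (Hypothetical helper, not part of Arrow.)
    """
    indices: List[int] = []
    for j in range(len(offsets) - 1):
        indices.extend([j] * (offsets[j + 1] - offsets[j]))
    return indices


# [[1, 2, 3], [], None, [4, 5]] has offsets [0, 3, 3, 3, 5].
print(parent_indices([0, 3, 3, 3, 5]))
```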
Issue Links
- relates to
  - ARROW-8894 [C++] C++ array kernels framework and execution buildout (umbrella issue) (Open)
  - ARROW-9248 [C++] Add "list_size" function that returns Int32Array/Int64Array giving list cell sizes (Resolved)
  - ARROW-9249 [C++] Implement "list_parent_indices" vector function (Resolved)
  - ARROW-9247 [Python] Expose BinaryArray::total_values_length in bindings (Resolved)
  - ARROW-9116 [C++] Add BinaryArray::total_values_length() (Resolved)