Apache Arrow / ARROW-6775

[C++] [Python] Proposal for several Array utility functions

      Description

      Hi,

      We developed several utilities that compute or access certain properties of Arrays, and we wonder whether it makes sense to contribute them upstream (to both the C++ API and pyarrow). If so, where is the best place to put them?

      I may have overlooked existing APIs that already do the same thing; if so, please point them out.

       

      1/ ListLengthFromListArray(ListArray&)

      Returns the lengths of the lists in a ListArray as an Int32Array (or an Int64Array for large lists). For example:

      [[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned array can be converted to numpy)
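
To make the two candidate behaviours concrete, here is a minimal pure-Python sketch. It is not the proposed C++ or pyarrow API: the function name `list_lengths`, the `null_as_zero` flag, and the representation of a ListArray as a plain offsets list plus an optional validity list are all assumptions made for illustration.

```python
from typing import List, Optional


def list_lengths(
    offsets: List[int],
    validity: Optional[List[bool]] = None,
    null_as_zero: bool = True,
) -> List[Optional[int]]:
    """Return the length of each list slot described by `offsets`.

    `offsets` has one more entry than there are lists; list i spans
    offsets[i]..offsets[i + 1]. `validity[i]` is False for a null list.
    (Hypothetical helper; names are not taken from Arrow.)
    """
    lengths: List[Optional[int]] = []
    for i in range(len(offsets) - 1):
        is_valid = validity is None or validity[i]
        if is_valid:
            lengths.append(offsets[i + 1] - offsets[i])
        else:
            # Check validity explicitly rather than trusting the offsets:
            # a null slot is not guaranteed to span zero child elements.
            lengths.append(0 if null_as_zero else None)
    return lengths


# [[1, 2, 3], [], None] modeled as offsets + validity.
example_offsets = [0, 3, 3, 3]
example_validity = [True, True, False]
dense = list_lengths(example_offsets, example_validity)
nullable = list_lengths(example_offsets, example_validity, null_as_zero=False)
```

With `null_as_zero=True` the result has no nulls, which is the variant that converts to numpy without a mask; the other variant preserves the distinction between an empty list and a null one.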

       

      2/ GetBinaryArrayTotalByteSize(BinaryArray&)

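      Returns the total byte size of the values in a BinaryArray (basically the last offset minus the first offset, i.e. offsets[length] - offsets[0], since the offsets buffer has length + 1 entries).

      Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.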
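
The offset arithmetic can be sketched in pure Python as follows. The helper names `build_offsets` and `binary_total_byte_size` are hypothetical, and the BinaryArray is modeled as a plain list of offsets rather than a real Arrow buffer.

```python
from typing import List, Sequence


def build_offsets(values: Sequence[bytes]) -> List[int]:
    """Build an Arrow-style offsets list (len(values) + 1 entries)."""
    offsets = [0]
    for value in values:
        offsets.append(offsets[-1] + len(value))
    return offsets


def binary_total_byte_size(offsets: Sequence[int]) -> int:
    """Total number of value bytes covered by `offsets`.

    Subtracting the first offset matters for sliced arrays, whose
    first offset is usually not zero.
    """
    if len(offsets) < 2:
        return 0  # no elements, so no bytes
    return offsets[-1] - offsets[0]


full_offsets = build_offsets([b"ab", b"", b"cde"])   # [0, 2, 2, 5]
full_size = binary_total_byte_size(full_offsets)
# A slice covering the last two elements keeps offsets [2, 2, 5].
sliced_size = binary_total_byte_size(full_offsets[1:])
```

The sliced case is why the expression is a difference of two offsets instead of just the last offset.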

       

      3/ GetArrayNullBitmapAsByteArray(Array&)

      Returns the array's null bitmap as a UInt8Array (which can be efficiently converted to a bool numpy array).
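
A rough sketch of the unpacking step, in pure Python. It assumes the Arrow convention that the bitmap is packed least-significant-bit first and that a set bit means the slot is valid; the function name `validity_bitmap_to_bytes`, its parameters, and the choice to emit 1 for valid (rather than 1 for null) are assumptions for illustration only.

```python
from typing import Optional


def validity_bitmap_to_bytes(
    bitmap: Optional[bytes], length: int, offset: int = 0
) -> bytes:
    """Expand a packed validity bitmap into one byte per element.

    Output byte i is 1 if element i is valid and 0 if it is null.
    `offset` is the bit offset of the first element, as needed for a
    sliced array. A missing bitmap is treated as "no nulls".
    """
    if bitmap is None:
        return bytes([1]) * length
    out = bytearray(length)
    for i in range(length):
        bit = offset + i
        # LSB-first packing: bit k lives in byte k // 8 at position k % 8.
        out[i] = (bitmap[bit >> 3] >> (bit & 7)) & 1
    return bytes(out)


# Validity of [[1, 2, 3], [], None, [4, 5]]: slots 0, 1 and 3 are valid.
packed = bytes([0b00001011])
unpacked = validity_bitmap_to_bytes(packed, 4)
tail = validity_bitmap_to_bytes(packed, 2, offset=2)
```

One byte per element is what lets the result be reinterpreted as a numpy bool array without a further copy; a caller who wants "1 means null" can invert the bytes.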

       

      4/ GetFlattenedArrayParentIndices(ListArray&)

      Makes an Int32Array of the same length as the flattened ListArray. returned_array[i] == j means that the i-th element of the flattened ListArray came from the j-th list in the ListArray.

      For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]
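
The mapping in this example can be sketched in pure Python from the offsets alone. The name `flattened_parent_indices` is hypothetical, and the sketch assumes that null and empty lists both span zero child elements, as they do in the example.

```python
from typing import List, Sequence


def flattened_parent_indices(offsets: Sequence[int]) -> List[int]:
    """For each flattened element, return the index of its parent list.

    List j contributes offsets[j + 1] - offsets[j] copies of j, so
    empty (and, under the stated assumption, null) lists contribute
    nothing.
    """
    parents: List[int] = []
    for j in range(len(offsets) - 1):
        parents.extend([j] * (offsets[j + 1] - offsets[j]))
    return parents


# [[1, 2, 3], [], None, [4, 5]] has offsets [0, 3, 3, 3, 5].
parent_indices = flattened_parent_indices([0, 3, 3, 3, 5])
```

The result has exactly offsets[-1] - offsets[0] entries, matching the length of the flattened values.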

       

              People

              • Assignee: Wes McKinney (wesm)
              • Reporter: Zhuo Peng (brillsp)
              • Votes: 0
              • Watchers: 4

              Time Tracking

              • Original Estimate: Not Specified
              • Remaining Estimate: 0h
              • Time Spent: 1h 10m