  1. Apache Arrow
  2. ARROW-7066

[Python] Support returning ChunkedArray from __arrow_array__?

    Details

      Description

      The __arrow_array__ protocol was added so that custom objects can define how they should be converted to a pyarrow Array (similar to numpy's __array__). It is also used to convert pandas DataFrames whose columns use pandas ExtensionArrays to a pyarrow Table (provided the pandas ExtensionArray, such as the nullable integer type, implements the __arrow_array__ method).

      This last use case could also be useful for fletcher (https://github.com/xhochy/fletcher/, a package that implements pandas ExtensionArrays that wrap pyarrow arrays, so they can be stored as is in a pandas DataFrame).
      However, fletcher stores ChunkedArrays in its ExtensionArrays / the columns of a pandas DataFrame (to map more closely to a Table, whose columns also consist of chunked arrays), whereas we currently require that the return value of __arrow_array__ is a pyarrow.Array.

      So I was wondering: could we relax this constraint and also allow ChunkedArray as return value?
      However, this protocol is currently called in the pa.array(..) function, which probably should keep returning an Array (and not ChunkedArray in certain cases).
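One possible shape for this, sketched in pure Python so the dispatch is easy to see. Everything here is an assumption for illustration: `Array` and `ChunkedArray` are stand-ins for the pyarrow classes, and `array`, `column_from_object` and `ChunkedSource` are hypothetical names, not pyarrow functions and not a proposed implementation.

```python
class Array:
    """Stand-in for pyarrow.Array (illustration only)."""

    def __init__(self, values):
        self.values = list(values)


class ChunkedArray:
    """Stand-in for pyarrow.ChunkedArray: a list of Array chunks."""

    def __init__(self, chunks):
        self.chunks = list(chunks)


def array(obj):
    """Strict entry point: keeps returning a single Array."""
    result = obj.__arrow_array__()
    if not isinstance(result, Array):
        raise TypeError("__arrow_array__ must return an Array here")
    return result


def column_from_object(obj):
    """Relaxed entry point for table columns: accepts both kinds."""
    result = obj.__arrow_array__()
    if isinstance(result, ChunkedArray):
        return result
    if isinstance(result, Array):
        # Wrap a single Array so callers always get a ChunkedArray.
        return ChunkedArray([result])
    raise TypeError("__arrow_array__ must return an Array or ChunkedArray")


class ChunkedSource:
    """Hypothetical fletcher-like column backed by two chunks."""

    def __arrow_array__(self):
        return ChunkedArray([Array([1, 2]), Array([3])])


col = column_from_object(ChunkedSource())
```

With this split, only the DataFrame-to-Table conversion path would accept a ChunkedArray, while the strict `array` stand-in still rejects it with a TypeError.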

      cc Uwe Korn

              People

              • Assignee: Joris Van den Bossche (jorisvandenbossche)
              • Reporter: Joris Van den Bossche (jorisvandenbossche)
              • Votes: 0
              • Watchers: 4

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m