ARROW-7066: [Python] Support returning ChunkedArray from __arrow_array__?

Details

    Description

      The __arrow_array__ protocol was added so that custom objects can define how they should be converted to a pyarrow Array (similar to numpy's __array__). It is also used when converting a pandas DataFrame whose columns are backed by pandas ExtensionArrays to a pyarrow Table, provided the ExtensionArray (such as the nullable integer type) implements __arrow_array__.

      This last use case could also be useful for fletcher (https://github.com/xhochy/fletcher/), a package that implements pandas ExtensionArrays wrapping pyarrow arrays, so they can be stored as-is in a pandas DataFrame.
      However, fletcher stores ChunkedArrays in its ExtensionArrays, i.e. in the columns of a pandas DataFrame (for a better mapping to a Table, whose columns also consist of chunked arrays), while we currently require the return value of __arrow_array__ to be a pyarrow.Array.

      So I was wondering: could we relax this constraint and also allow a ChunkedArray as the return value?
      However, this protocol is currently called in the pa.array(..) function, which should probably keep returning an Array (and not a ChunkedArray in certain cases).

      cc uwe

    People
      Joris Van den Bossche (jorisvandenbossche)
      Votes: 0
      Watchers: 5

    Time Tracking
      Estimated: Not Specified
      Remaining: 0h
      Logged: 1h 10m