Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The __arrow_array__ protocol was added so that custom objects can define how they should be converted to a pyarrow Array (similar to numpy's __array__). This is then also used to support converting pandas DataFrames with columns using pandas' ExtensionArrays to a pyarrow Table (if the pandas ExtensionArray, such as nullable integer type, implements this __arrow_array__ method).
This last use case could also be useful for fletcher (https://github.com/xhochy/fletcher/, a package that implements pandas ExtensionArrays that wrap pyarrow arrays, so they can be stored as is in a pandas DataFrame).
However, fletcher stores ChunkedArrays in its ExtensionArrays / the columns of a pandas DataFrame (to have a better mapping to a Table, whose columns also consist of chunked arrays), while we currently require that the return value of __arrow_array__ is a pyarrow.Array.
So I was wondering: could we relax this constraint and also allow ChunkedArray as return value?
However, this protocol is currently called in the pa.array(..) function, which probably should keep returning an Array (and not ChunkedArray in certain cases).
cc uwe