[ARROW-7066] [Python] support returning ChunkedArray from __arrow_array__ ? - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.16.0
Component/s: Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/23375

Description

The __arrow_array__ protocol was added so that custom objects can define how they should be converted to a pyarrow Array (similar to numpy's __array__). It is also used to support converting pandas DataFrames whose columns use pandas' ExtensionArrays to a pyarrow Table (provided the pandas ExtensionArray, such as the nullable integer type, implements the __arrow_array__ method).

This last use case could also be useful for fletcher (https://github.com/xhochy/fletcher/, a package that implements pandas ExtensionArrays that wrap pyarrow arrays, so they can be stored as is in a pandas DataFrame).
However, fletcher stores ChunkedArrays in its ExtensionArrays / the columns of a pandas DataFrame (to have a better mapping with a Table, where the columns also consist of chunked arrays), whereas we currently require that the return value of __arrow_array__ is a pyarrow.Array.

So I was wondering: could we relax this constraint and also allow a ChunkedArray as the return value?
However, this protocol is currently called in the pa.array(..) function, which probably should keep returning an Array (and not ChunkedArray in certain cases).

cc uwe

Issue Links

links to: GitHub Pull Request #5794

People

Assignee: Joris Van den Bossche

Reporter: Joris Van den Bossche

Votes: 0

Watchers: 5

Dates

Created: 05/Nov/19 12:52

Updated: 11/Jan/23 07:51

Resolved: 12/Nov/19 22:30

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 10m