Apache Arrow / ARROW-2428

[Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: Python

      Description

      With the next release of pandas, it will be possible to define custom column types that back a pandas.Series. The default to_pandas conversion therefore cannot cover every possible column type, because Arrow cannot know about all the extension arrays that users may define.

      To let users create ExtensionArray instances from Arrow columns during the to_pandas conversion, we should provide a hook in the to_pandas call through which they can override the default conversion routines with ones that produce their ExtensionArray instances.

      This hook should avoid additional copies in cases where we currently first convert the Arrow column into a default pandas column (probably of object dtype) and the user afterwards converts it into a more efficient ExtensionArray. It will be especially useful for ExtensionArrays whose storage is backed by Arrow.

      The meta-issue that tracks the implementation inside of Pandas is: https://github.com/pandas-dev/pandas/issues/19696

              People

              • Assignee: Joris Van den Bossche (jorisvandenbossche)
              • Reporter: Uwe Korn (uwe)
              • Votes: 2
              • Watchers: 10

              Time Tracking

              • Original Estimate: Not Specified
              • Remaining Estimate: 0h
              • Time Spent: 2h