Apache Arrow / ARROW-7883

[Python] pyarrow.serialize does not support pandas nullable integer type

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Python

    Description

      Serializing a DataFrame backed by an IntegerArray (the pandas nullable integer type) doesn't seem to work with the latest versions of pandas and pyarrow:

      import pandas as pd
      import pyarrow as pa  # version 0.16
      
      # workaround suggested in https://issues.apache.org/jira/browse/ARROW-5379
      # (type defaults to None, as the __arrow_array__ protocol expects)
      pd.arrays.IntegerArray.__arrow_array__ = (
          lambda self, type=None: pa.array(self._data, mask=self._mask, type=type)
      )
      
      df = pd.DataFrame([1, 2])
      df = df.convert_dtypes()  # the column now uses the nullable Int64 dtype
      
      # following https://arrow.apache.org/docs/python/ipc.html#serializing-pandas-objects
      context = pa.default_serialization_context()
      context.serialize(df)
      
      # the last call raises:
      # SerializationCallbackError: pyarrow does not know how to serialize objects of type <class 'pandas.core.arrays.integer.IntegerArray'>

      xref https://stackoverflow.com/q/60285486/2146052

          People

            Assignee: Unassigned
            Reporter: Jonen Benjamin
            Votes: 0
            Watchers: 4