Apache Arrow / ARROW-2205

[Python] Option for integer object nulls

Details

    Description

      I have a use case where the loss of precision from casting integers to floats matters. pandas can store integers with nulls in object columns without any loss of precision. However, a roundtrip through Arrow casts these object columns to float columns, even though Arrow itself stores them as integers with nulls.
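      To make the precision concern concrete, here is a minimal sketch in plain Python (no pandas or Arrow involved). It assumes the float column is float64, which has a 53-bit significand; the value 2**53 + 1 is just an illustrative choice, not something from my actual data.

```python
# float64 has a 53-bit significand, so integers above 2**53 are not all
# exactly representable once they are cast to float.
big = 2**53 + 1              # 9007199254740993

as_float = float(big)        # what a cast to a float64 column does to the value
round_tripped = int(as_float)

print(big)                   # 9007199254740993
print(round_tripped)         # 9007199254740992
print(big == round_tripped)  # False
```

      Any integer identifier or counter of this size would silently change by one after the roundtrip.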

      This is a minimal example demonstrating the behavior of a roundtrip:

      import numpy as np
      import pandas as pd
      import pyarrow as pa
      
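      # Object column holding a null and an integer
      df = pd.DataFrame({"a": np.array([None, 1], dtype=object)})
      # Roundtrip: pandas -> Arrow table -> pandas
      df_pa = pa.Table.from_pandas(df).to_pandas()
      
      print(df)     # original object column
      print(df_pa)  # column after the roundtrip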
      

      The output is:

            a
      0  None
      1     1
           a
      0  NaN
      1  1.0
      

      This appears to be the intended behavior, since it is covered by test_int_object_nulls in test_convert_pandas.

      I think it would be useful to add an option in the to_pandas methods to allow integers with nulls to be returned as object columns. The option can default to false in order to preserve the current behavior.
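      As a rough illustration of the semantics I have in mind, here is a plain-Python sketch. The helper to_pandas_values and its flag integer_object_nulls are hypothetical names used only for this example; they are not existing pyarrow functions, and a real implementation would live in the Arrow-to-pandas conversion code.

```python
import math


def to_pandas_values(values, integer_object_nulls=False):
    """Hypothetical stand-in for converting an integer column with nulls.

    values is a list of ints where None marks a null.
    """
    has_nulls = any(v is None for v in values)
    if not has_nulls:
        # No nulls: the integers can be kept as they are.
        return list(values)
    if integer_object_nulls:
        # Proposed behavior: keep ints and None as Python objects.
        return list(values)
    # Current behavior: cast to float and represent nulls as NaN.
    return [math.nan if v is None else float(v) for v in values]


column = [None, 1, 2**53 + 1]

default = to_pandas_values(column)
preserved = to_pandas_values(column, integer_object_nulls=True)

print(default)    # floats, with NaN for the null
print(preserved)  # original ints and None, unchanged
```

      With the flag off, the result matches what the roundtrip does today; with it on, the integers come back exactly as they went in.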

            People

              adshieh Albert Shieh