Apache Arrow / ARROW-2205

[Python] Option for integer object nulls

      Description

      I have a use case where the loss of precision from casting integers to floats matters (float64 cannot represent every integer above 2**53 exactly). pandas can store integers with nulls in object columns without loss of precision. However, a roundtrip through Arrow casts those object columns to float columns, even though Arrow itself stores them as integers with nulls.
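
      To make the precision concern concrete, here is a small sketch using only built-in Python numbers. It assumes the float column is float64, which matches Python's own float type; the variable names are illustrative only.

```python
# float64 has a 53-bit significand, so not every integer above 2**53
# can be represented exactly once it is cast to float.
big = 2**53 + 1            # an integer that still fits easily in int64
as_float = float(big)      # the cast a float column would apply
roundtrip = int(as_float)  # converting back does not recover the value

print(big)
print(roundtrip)
print(big == roundtrip)
```

      The value that comes back differs from the original by one, which is the kind of silent change an object column of integers avoids.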

      This is a minimal example demonstrating the behavior of a roundtrip:

      import numpy as np
      import pandas as pd
      import pyarrow as pa
      
      df = pd.DataFrame({"a": np.array([None, 1], dtype=object)})
      df_pa = pa.Table.from_pandas(df).to_pandas()
      
      print(df)
      print(df_pa)
      

      The output is:

            a
      0  None
      1     1
           a
      0  NaN
      1  1.0
      

      This appears to be the intended behavior, given test_int_object_nulls in test_convert_pandas.

      I think it would be useful to add an option to the to_pandas methods that returns integers with nulls as object columns. The option can default to False in order to preserve the current behavior.
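
      The semantics I have in mind can be sketched with a toy model in plain Python. This is not pyarrow code: the function convert_int_column and the flag name integer_object_nulls are hypothetical, chosen only to illustrate the proposal.

```python
import math


def convert_int_column(values, integer_object_nulls=False):
    """Toy model of the proposed option (hypothetical, not pyarrow's code).

    `values` is a list of Python ints, with None marking nulls.
    """
    has_nulls = any(v is None for v in values)
    if not has_nulls:
        # No nulls: integers can stay integers either way.
        return list(values)
    if integer_object_nulls:
        # Proposed behavior: keep exact ints and None, as in an object column.
        return list(values)
    # Current behavior: nulls become NaN, so every value is cast to float.
    return [math.nan if v is None else float(v) for v in values]


column = [None, 1, 2**53 + 1]
default_result = convert_int_column(column)
object_result = convert_int_column(column, integer_object_nulls=True)

print(default_result)
print(object_result)
```

      With the flag left at its default, the model matches the float output shown above; with it enabled, the integers and the null are returned unchanged.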

              People

              • Assignee: Albert Shieh (adshieh)
              • Reporter: Albert Shieh (adshieh)