Apache Arrow / ARROW-2135

[Python] NaN values silently cast to int64 when passing explicit schema for conversion in Table.from_pandas

    Details

      Description

      If you create a Table from a DataFrame of ints that contains a NaN, the NaN is cast incorrectly. Because pandas cannot store NaN in an integer column, it promotes the column to float64. When the Table is then built with an explicit int64 schema, the float64 column is cast to int64 and the NaN comes out as -9223372036854775808 (the minimum int64 value) instead of a null. This looks like a bug: a known pandas limitation (integer columns cannot hold null values) is overriding Arrow's ability to store the column as an Int64Array with nulls.


      import numpy as np
      import pandas as pd
      import pyarrow as pa
      
      # pandas stores column "a" as float64 because int64 columns cannot hold NaN
      df = pd.DataFrame({"a": [1, 2, np.nan]})
      schema = pa.schema([pa.field("a", pa.int64(), nullable=True)])
      table = pa.Table.from_pandas(df, schema=schema)
      table[0]
      
      
      <pyarrow.lib.Column object at 0x7f2151d19c90>
      chunk 0: <pyarrow.lib.Int64Array object at 0x7f213bf356d8>
      [
        1,
        2,
        -9223372036854775808
      ]


              People

              • Assignee:
                pitrou Antoine Pitrou
                Reporter:
                matthewgilbert Matthew Gilbert
              • Votes:
                0
                Watchers:
                3
