Apache Arrow / ARROW-2806

[Python] Inconsistent handling of np.nan

    Details

      Description

      Currently we handle np.nan differently depending on whether a list or a NumPy array is passed to pa.array():

      >>> pa.array(np.array([1, np.nan]))
      <pyarrow.lib.DoubleArray object at 0x11680bea8>
      [
        1.0,
        nan
      ]
      
      >>> pa.array([1., np.nan])
      <pyarrow.lib.DoubleArray object at 0x10bdacbd8>
      [
        1.0,
        NA
      ]
      

      I would actually consider the second result the correct one, especially once the column is cast to an integer type: the first produces a column containing INT_MIN, while the second produces a real null.

      However, in test_array_conversions_no_sentinel_values we check that np.nan does not produce a null.
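For context, a minimal sketch (not part of the original report) of why the cast behaves so differently: NaN is an ordinary floating-point value with no integer representation, whereas an Arrow null lives in a separate validity bitmap. Casting a float array that still contains NaN therefore has to invent an integer for it:

```python
import math
import numpy as np

# NaN is a regular float value, not a missing-value marker:
x = float("nan")
print(x != x)          # True: NaN compares unequal to itself
print(math.isnan(x))   # True

# Casting NaN to an integer dtype has no meaningful result; NumPy
# emits an implementation-defined sentinel (often INT64_MIN on x86),
# which is exactly the INT_MIN column described above. A real null
# would instead be recorded in the validity bitmap and survive the cast.
arr = np.array([1.0, np.nan])
print(arr.astype(np.int64))  # e.g. [1, -9223372036854775808] (platform-dependent)
```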

      Even weirder:

      >>> df = pd.DataFrame({'a': [1., None]})
      >>> df
           a
      0  1.0
      1  NaN
      >>> pa.Table.from_pandas(df).column(0)
      <Column name='a' type=DataType(double)>
      chunk 0: <pyarrow.lib.DoubleArray object at 0x104bbf958>
      [
        1.0,
        NA
      ]
      
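The pandas path behaves this way because the conversion from pandas treats NaN as a null sentinel and derives a validity mask from it. Conceptually (a sketch of the idea, not the actual pyarrow implementation), the sentinel detection looks like:

```python
import numpy as np

def null_mask_from_pandas(values: np.ndarray) -> np.ndarray:
    """Sketch of pandas-style sentinel detection: NaN entries are
    flagged as nulls in the resulting mask (hypothetical helper)."""
    return np.isnan(values)

values = np.array([1.0, np.nan])
mask = null_mask_from_pandas(values)
print(mask)  # [False  True]: the NaN slot becomes a null (NA)
```

This is why Table.from_pandas yields NA for the NaN slot, matching the list-input behaviour rather than the NumPy-array behaviour.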


      People

      • Assignee: Uwe L. Korn (xhochy)
      • Reporter: Uwe L. Korn (xhochy)
      • Votes: 0
      • Watchers: 3


      Time Tracking

      • Estimated: Not Specified
      • Remaining: 0h
      • Logged: 1h 50m