Apache Arrow / ARROW-2806

[Python] Inconsistent handling of np.nan


Details

    Description

      Currently pa.array() handles np.nan differently depending on whether its input is a NumPy array or a Python list:

      >>> pa.array(np.array([1, np.nan]))
      <pyarrow.lib.DoubleArray object at 0x11680bea8>
      [
        1.0,
        nan
      ]
      
      >>> pa.array([1., np.nan])
      <pyarrow.lib.DoubleArray object at 0x10bdacbd8>
      [
        1.0,
        NA
      ]
      

      I would actually think the last one is the correct one, especially once one casts this to an integer column: there the first one produces a column with INT_MIN, while the second one produces a real null.
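
To make the difference between the two policies concrete, here is a minimal standard-library sketch. It is not pyarrow's implementation: `to_nullable` and `cast_to_int` are hypothetical helpers that model a column as a list of values plus a validity mask. Note that the sketch raises an error when a NaN sits in a valid slot, rather than reproducing the INT_MIN result described above.

```python
import math


def to_nullable(values, nan_is_null):
    """Hypothetical helper: split floats into (data, validity mask)."""
    data, valid = [], []
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        is_null = v is None or (nan_is_null and is_nan)
        data.append(0.0 if is_null else float(v))  # placeholder in null slots
        valid.append(not is_null)
    return data, valid


def cast_to_int(data, valid):
    """Hypothetical helper: cast valid slots to int, keep nulls as None."""
    out = []
    for v, ok in zip(data, valid):
        if not ok:
            out.append(None)  # a real null survives the cast
        elif math.isnan(v):
            # NaN is a float value, not a null; it has no integer equivalent.
            raise ValueError("NaN in a valid slot has no integer representation")
        else:
            out.append(int(v))
    return out


values = [1.0, float("nan")]

# Policy A: NaN is treated as a missing value (the list behaviour above).
as_null = cast_to_int(*to_nullable(values, nan_is_null=True))

# Policy B: NaN is kept as an ordinary float (the NumPy-array behaviour above).
try:
    as_value = cast_to_int(*to_nullable(values, nan_is_null=False))
except ValueError as exc:
    as_value = str(exc)

print(as_null)
print(as_value)
```

Under policy A the integer cast yields a clean null; under policy B the cast has no sensible output for the NaN slot, which is the core of the inconsistency reported here.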

      However, test_array_conversions_no_sentinel_values checks that np.nan does not produce a null.

      Even weirder, a None that pandas stores as NaN comes back as a null from pa.Table.from_pandas():

      >>> df = pd.DataFrame({'a': [1., None]})
      >>> df
           a
      0  1.0
      1  NaN
      >>> pa.Table.from_pandas(df).column(0)
      <Column name='a' type=DataType(double)>
      chunk 0: <pyarrow.lib.DoubleArray object at 0x104bbf958>
      [
        1.0,
        NA
      ]
      

            People

              uwe Uwe Korn