Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 0.9.0
Description
Currently we handle np.nan differently depending on whether a list or a numpy array is passed to pa.array():
>>> pa.array(np.array([1, np.nan]))
<pyarrow.lib.DoubleArray object at 0x11680bea8>
[
  1.0,
  nan
]
>>> pa.array([1., np.nan])
<pyarrow.lib.DoubleArray object at 0x10bdacbd8>
[
  1.0,
  NA
]
I would actually consider the latter the correct behavior, especially once the column is cast to an integer type: the first produces a column containing INT_MIN, while the second produces a real null.
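To make the cast difference concrete, here is a minimal sketch (not from the original report) using Array.cast; the safe=False flag is an assumption needed on newer pyarrow versions, where an unchecked float-to-int cast of NaN would otherwise raise, and current versions may no longer reproduce the reported list behavior:

import numpy as np
import pyarrow as pa

from_ndarray = pa.array(np.array([1., np.nan]))  # NaN kept as a value, per the report
from_list = pa.array([1., np.nan])               # NaN became a null, per the report

# The NaN value turns into an undefined integer (INT_MIN was observed),
# while the null survives the cast as a null.
print(from_ndarray.cast(pa.int64(), safe=False))
print(from_list.cast(pa.int64(), safe=False))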
However, in test_array_conversions_no_sentinel_values we assert that np.nan does not produce a null.
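A hypothetical reduction of what that test asserts (the actual test body is not reproduced here):

import numpy as np
import pyarrow as pa

# The sentinel-value check: NaN coming from a numpy array must stay a
# value, so the resulting array reports no nulls.
arr = pa.array(np.array([1., np.nan]))
assert arr.null_count == 0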
Even weirder:
>>> df = pd.DataFrame({'a': [1., None]})
>>> df
     a
0  1.0
1  NaN
>>> pa.Table.from_pandas(df).column(0)
<Column name='a' type=DataType(double)>
chunk 0: <pyarrow.lib.DoubleArray object at 0x104bbf958>
[
  1.0,
  NA
]
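For context, Table.from_pandas treats NaN as null because it applies pandas null semantics. A minimal sketch, assuming a pyarrow version that exposes the from_pandas flag on pa.array (this flag is not part of the original report), showing how the same semantics can be requested explicitly:

import numpy as np
import pyarrow as pa

# With from_pandas=True, NaN in a numpy array is treated as null,
# matching the Table.from_pandas behavior shown above.
arr = pa.array(np.array([1., np.nan]), from_pandas=True)
print(arr.null_count)  # 1 under this assumption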