Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- None
Description
When converting Pandas data that contains floating-point values to boolean, pyarrow returns incorrect results: every value comes back as False.
In [2]: import pyarrow as pa
   ...: import pandas as pd
   ...: a = [0.0, 1.0, 2.0, None, float('NaN')]
   ...:

In [3]: s = pd.Series(a)

In [4]: pa.Array.from_pandas(s, type=pa.bool_())
Out[4]:
<pyarrow.lib.BooleanArray object at 0x7f1bfd099e68>
[
  False,
  False,
  False,
  False,
  False
]
The expected output is True wherever the value != 0.
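For reference, the expected semantics can be sketched in plain Python without pyarrow. The helper name `floats_to_bools` is hypothetical, and mapping None and NaN to a null (None) is an assumption: the report only states that non-zero values should become True, not how missing values should be handled.

```python
import math
from typing import List, Optional


def floats_to_bools(values: List[Optional[float]]) -> List[Optional[bool]]:
    """Hypothetical reference conversion: non-zero -> True, zero -> False.

    Assumption (not stated in the report): None and NaN are treated as
    missing values and mapped to None.
    """
    result: List[Optional[bool]] = []
    for v in values:
        if v is None or math.isnan(v):
            result.append(None)    # missing value stays missing
        else:
            result.append(v != 0)  # any non-zero float is True
    return result


a = [0.0, 1.0, 2.0, None, float("nan")]
expected = floats_to_bools(a)
print(expected)  # [False, True, True, None, None]
```

Under that assumption, the values 1.0 and 2.0 in the report's input should convert to True rather than False.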
This originated from SPARK-25461
Issue Links
- relates to SPARK-25461 PySpark Pandas UDF outputs incorrect results when input columns contain None (Resolved)