Apache Arrow / ARROW-2806

[Python] Inconsistent handling of np.nan

    Details

      Description

      Currently we handle np.nan differently depending on whether a list or a NumPy array is passed to pa.array():

      >>> pa.array(np.array([1, np.nan]))
      <pyarrow.lib.DoubleArray object at 0x11680bea8>
      [
        1.0,
        nan
      ]
      
      >>> pa.array([1., np.nan])
      <pyarrow.lib.DoubleArray object at 0x10bdacbd8>
      [
        1.0,
        NA
      ]
      

      I would actually consider the second result the correct one, especially once the column is cast to an integer type: the first produces a column containing INT_MIN, while the second produces a real null.

      However, in test_array_conversions_no_sentinel_values we check that np.nan does not produce a null.
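For context, a minimal sketch (not part of the original report) of why the cast behaves so differently: NaN is an ordinary floating-point value with no integer representation, whereas an Arrow null lives in a separate validity bitmap. Casting a float array that still contains NaN therefore has to invent an integer for it:

```python
import math
import numpy as np

# NaN is a regular float value, not a missing-value marker:
x = float("nan")
print(x != x)          # True: NaN compares unequal to itself
print(math.isnan(x))   # True

# Casting NaN to an integer dtype has no meaningful result; NumPy
# emits an implementation-defined sentinel (often INT64_MIN on x86),
# which is exactly the INT_MIN column described above. A real null
# would instead be recorded in the validity bitmap and survive the cast.
arr = np.array([1.0, np.nan])
print(arr.astype(np.int64))  # e.g. [1, -9223372036854775808] (platform-dependent)
```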

      Even weirder:

      >>> df = pd.DataFrame({'a': [1., None]})
      >>> df
           a
      0  1.0
      1  NaN
      >>> pa.Table.from_pandas(df).column(0)
      <Column name='a' type=DataType(double)>
      chunk 0: <pyarrow.lib.DoubleArray object at 0x104bbf958>
      [
        1.0,
        NA
      ]
      
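The pandas path behaves this way because the conversion from pandas treats NaN as a null sentinel and derives a validity mask from it. Conceptually (a sketch of the idea, not the actual pyarrow implementation), the sentinel detection looks like:

```python
import numpy as np

def null_mask_from_pandas(values: np.ndarray) -> np.ndarray:
    """Sketch of pandas-style sentinel detection: NaN entries are
    flagged as nulls in the resulting mask (hypothetical helper)."""
    return np.isnan(values)

values = np.array([1.0, np.nan])
mask = null_mask_from_pandas(values)
print(mask)  # [False  True]: the NaN slot becomes a null (NA)
```

This is why Table.from_pandas yields NA for the NaN slot, matching the list-input behaviour rather than the NumPy-array behaviour.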


      People

      • Assignee: Uwe L. Korn (xhochy)
      • Reporter: Uwe L. Korn (xhochy)
      • Votes: 0
      • Watchers: 3


      Time Tracking

      • Estimated: Not Specified
      • Remaining: 0h
      • Logged: 1h 50m