Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 0.17.0, 0.17.1
- None
- None
Description
In pyarrow 0.17.x, deserialising a pandas DataFrame that has pd.NaT values in an object column raises an ArrowInvalid error:
pyarrow.lib.ArrowInvalid: Casting from timestamp[us] to timestamp[ns] would result in out of bounds timestamp: -62135596800000000
Reproducible code (using pyarrow==0.17.1 and pandas==1.0.3):
import pandas as pd
import pyarrow as pa
import pyarrow.ipc as ipc

v = pd.DataFrame({"bar": [1592808896000000000, pd.NaT]})
# works fine as datetime64[ns] but not as object type
v = v.astype({"bar": "datetime64[ns]"}).astype({"bar": "object"})
bs = ipc.serialize_pandas(v).to_pybytes()
df = ipc.deserialize_pandas(bs)  # error
In pyarrow 0.16.0 no error occurs and df is returned as:
                            bar
0 2020-06-22 06:54:56.000000000
1 1754-08-30 22:43:41.128654848
Was the change in 0.17.x to raise an error an intentional behaviour change? The previous behaviour in 0.16.0 already looked like undefined behaviour: it converted NaT to 1754-08-30, which seems to be due to the -62135596800000000 timestamp mentioned in the error above.
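The link between -62135596800000000 and 1754-08-30 can be checked with plain integer arithmetic. The sketch below assumes (this is a guess, not something confirmed in this report) that 0.16.0 converted microseconds to nanoseconds by multiplying the int64 value by 1000 without an overflow check. The helper wrap_int64 is a hypothetical name used only for this illustration and is not part of pyarrow.

```python
from datetime import datetime, timedelta

# Microsecond timestamp taken from the ArrowInvalid message above.
US_VALUE = -62135596800000000

EPOCH = datetime(1970, 1, 1)

# Read as microseconds since the Unix epoch, the value is 0001-01-01 00:00:00,
# which is also the smallest value datetime.datetime can represent.
as_datetime = EPOCH + timedelta(microseconds=US_VALUE)


def wrap_int64(x):
    """Hypothetical helper: reduce x to the signed 64-bit range (two's complement)."""
    return (x + 2**63) % 2**64 - 2**63


# Assumption: the us -> ns cast multiplies by 1000 and silently wraps on overflow.
ns_exact = US_VALUE * 1000          # does not fit in int64
ns_wrapped = wrap_int64(ns_exact)   # what an unchecked int64 multiply would store

# Split into whole seconds and a nanosecond remainder, since
# datetime.datetime only has microsecond resolution.
secs, nanos = divmod(ns_wrapped, 10**9)
wrapped_dt = EPOCH + timedelta(seconds=secs)

print(as_datetime)                  # 0001-01-01 00:00:00
print(ns_wrapped)                   # -6795364578871345152
print(wrapped_dt, nanos)            # 1754-08-30 22:43:41 128654848
```

Under that assumption the wrapped value matches the 1754-08-30 22:43:41.128654848 row that 0.16.0 returned, which would suggest 0.17.x now detects the overflow and raises instead of wrapping silently.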
Also note that when serialized as datetime64[ns] rather than object, the code works fine in both 0.17.x and 0.16.0, returning:
                  bar
0 2020-06-22 06:54:56
1                 NaT
Issue Links
- is caused by: ARROW-842 [Python] Handle more kinds of null sentinel objects from pandas 0.x (Resolved)