Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 0.13.0
- Ubuntu
Description
Creating a ticket for an issue reported on GitHub (https://github.com/apache/arrow/issues/4284):
pyarrow (Issue with timestamp conversion from Arrow to pandas)
pyarrow's Table.to_pandas has a date_as_object option but no similar option for timestamps. When a timestamp column in an Arrow table is converted to pandas, the target data type is pd.Timestamp, which cannot represent times later than 2262-04-11 23:47:16.854775807. As a result, in the scenario below the dates are converted to incorrect values. Adding a timestamp_as_object option to pa.Table.to_pandas would help in this scenario.
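The upper bound quoted above follows from pandas storing timestamps as a signed 64-bit count of nanoseconds since the Unix epoch. The following is a minimal standard-library sketch of that arithmetic, not pandas code. The names INT64_MAX_NS and ns_limit are illustrative, and because datetime.datetime only has microsecond resolution, the sub-microsecond remainder is reported separately.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MAX_NS = 2**63 - 1  # largest signed 64-bit integer (illustrative name)

# datetime only resolves microseconds, so split off the sub-microsecond part.
whole_us, leftover_ns = divmod(INT64_MAX_NS, 1000)
ns_limit = EPOCH + datetime.timedelta(microseconds=whole_us)

print(ns_limit, "+", leftover_ns, "ns")

# Both dates from the report lie beyond that limit.
for year in (3000, 3100):
    d = datetime.datetime(year, 12, 31, 12, 0)
    print(d, "exceeds limit:", d > ns_limit)
```

Any value past this limit cannot be stored in a nanosecond-resolution timestamp, which is why the years 3000 and 3100 in the report are affected.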
# Python 3.6.8
>>> import pandas as pd
>>> import pyarrow as pa
>>> pd.__version__
'0.24.1'
>>> pa.__version__
'0.13.0'
>>> import datetime
>>> df = pd.DataFrame({"test_date": [datetime.datetime(3000, 12, 31, 12, 0), datetime.datetime(3100, 12, 31, 12, 0)]})
>>> df
             test_date
0  3000-12-31 12:00:00
1  3100-12-31 12:00:00
>>> pa_table = pa.Table.from_pandas(df)
>>> pa_table[0]
Column name='test_date' type=TimestampType(timestamp[us])
[
  [
    32535172800000000,
    35690846400000000
  ]
]
>>> pa_table.to_pandas()
                      test_date
0 1831-11-22 12:50:52.580896768
1 1931-11-22 12:50:52.580896768
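The incorrect dates above are consistent with the microsecond values being multiplied by 1000 to obtain nanoseconds and the product wrapping around in a signed 64-bit integer. The sketch below reproduces that arithmetic with the standard library only. It is an assumption about the mechanism, not the actual pyarrow or pandas code path, and wrap_int64, wrapped_ns_to_datetime, and us_to_datetime are hypothetical helper names.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def wrap_int64(value):
    """Reinterpret an arbitrary Python int as a two's-complement int64."""
    return (value + 2**63) % 2**64 - 2**63

def wrapped_ns_to_datetime(us):
    """Scale microseconds to nanoseconds with int64 wraparound.

    Returns (datetime truncated to microseconds, leftover nanoseconds).
    """
    ns = wrap_int64(us * 1000)
    whole_us, leftover_ns = divmod(ns, 1000)
    return EPOCH + datetime.timedelta(microseconds=whole_us), leftover_ns

def us_to_datetime(us):
    """Convert microseconds since the epoch to a plain datetime object."""
    return EPOCH + datetime.timedelta(microseconds=us)

# The two raw values shown in the Arrow column above.
for us in (32535172800000000, 35690846400000000):
    wrapped, leftover_ns = wrapped_ns_to_datetime(us)
    print("wrapped:", wrapped, "+", leftover_ns, "ns")
    print("as object:", us_to_datetime(us))
```

Converting to plain datetime.datetime objects, which is roughly what an as-object option would return, keeps the original values because datetime supports years up to 9999.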
Issue Links
- is related to: ARROW-3448 [Python] Pandas roundtrip doesn't preserve list of datetime objects (Open)
- relates to: ARROW-7856 [Python] to_pandas() causing datetimes > pd.Timestamp.max to wrap around (Closed)
- links to