Description
When writing a DataFrame that contains `datetime.datetime` values in an object column, any datetime greater than `pd.Timestamp.max` or less than `pd.Timestamp.min` wraps around to the opposite end of the representable range when it is read back.
For reference, these are the timestamp min and max values:
In [43]: pd.Timestamp.max
Out[43]: Timestamp('2262-04-11 23:47:16.854775807')
In [44]: pd.Timestamp.min
Out[44]: Timestamp('1677-09-21 00:12:43.145225')
To reproduce the error using pandas:
In [49]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
In [50]: df
Out[50]: A
0 2262-04-12 00:00:00
In [51]: df.to_parquet("datetimething.parquet")
In [52]: pd.read_parquet("datetimething.parquet")
Out[52]: A
0 1677-09-21 00:25:26.290448384
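Until this is fixed, it may help to check a column for affected values before writing it. The following is a minimal sketch using only the standard library; `fits_in_ns` is a hypothetical helper name, and it assumes the limits are those of a signed 64-bit count of nanoseconds since 1970-01-01, which is what the `pd.Timestamp.min` and `pd.Timestamp.max` values above correspond to.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
NS_MIN = -(2**63) + 1  # -(2**63) itself is the value pandas uses for NaT
NS_MAX = 2**63 - 1


def fits_in_ns(dt):
    """Return True if a naive datetime fits in int64 nanoseconds since the epoch."""
    delta = dt - EPOCH
    # Integer arithmetic only, so no precision is lost for far-away dates.
    ns = (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000
    return NS_MIN <= ns <= NS_MAX


values = [datetime.datetime(2262, 4, 11), datetime.datetime(2262, 4, 12)]
flags = [fits_in_ns(v) for v in values]
print(flags)
```

Only the second value, the one used in the reproduction, falls outside the range.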
I have narrowed it down to the conversion of a `pa.Table` with the `to_pandas()` method: the table still holds the correct value, but the DataFrame produced from it does not.
In [30]: df = pd.DataFrame({"A":[datetime.datetime(2262,4,12)]})
In [31]: tf = pa.Table.from_pandas(df)
In [32]: tf.columns
Out[32]: [<pyarrow.lib.ChunkedArray object at 0x7f23884deef8>
[
[
2262-04-12 00:00:00.000000
]
]
]
In [33]: tf.to_pandas()
Out[33]: A
0 1677-09-21 00:25:26.290448384
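The wrapped value is consistent with a signed 64-bit overflow of the nanosecond count. The sketch below reproduces the arithmetic in plain Python; it is an illustration of that assumed mechanism, not the actual pyarrow conversion code, and the helper names `to_ns`, `wrap_int64` and `from_ns` are hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
INT64_MIN = -(2**63)


def to_ns(dt):
    """Nanoseconds since the epoch as an unbounded Python int."""
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


def wrap_int64(value):
    """Reduce an integer to the signed 64-bit range, as a fixed-width overflow would."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


def from_ns(ns):
    """Split nanoseconds into a datetime at whole seconds plus leftover nanoseconds."""
    seconds, nanos = divmod(ns, 10**9)
    return EPOCH + datetime.timedelta(seconds=seconds), nanos


wrapped = wrap_int64(to_ns(datetime.datetime(2262, 4, 12)))
when, nanos = from_ns(wrapped)
print(when, nanos)  # 1677-09-21 00:25:26 290448384
```

The result matches the `1677-09-21 00:25:26.290448384` shown in `Out[33]`, which suggests the value overflows when it is converted to nanosecond resolution.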
Issue Links
- is related to
  - ARROW-5359 [Python] timestamp_as_object support for pa.Table.to_pandas in pyarrow (Resolved)
  - ARROW-7758 [Python] Wrong conversion of timestamps that are out of bounds for pandas (eg 0000-01-01) (Resolved)