Details
Bug
Status: Resolved
Major
Resolution: Fixed
Description
From https://github.com/pandas-dev/pandas/issues/35997: it seems we handle a normal column and an index column differently in the conversion to pandas.
In [5]: import pandas as pd
   ...: import pyarrow as pa
   ...: from datetime import datetime, timezone
   ...:
   ...: df = pd.DataFrame([[datetime.now(timezone.utc), datetime.now(timezone.utc)]], columns=['date_index', 'date_column'])
   ...: table = pa.Table.from_pandas(df.set_index('date_index'))
   ...:

In [6]: table
Out[6]:
pyarrow.Table
date_column: timestamp[ns, tz=+00:00]
date_index: timestamp[ns, tz=+00:00]

In [7]: table.to_pandas()
...
UnknownTimeZoneError: '+00:00'
So this happens specifically for "fixed offset" timezones, and only for index columns (e.g. table.select(["date_column"]).to_pandas() works fine).
It seems this is because, for regular columns, we use our helper make_tz_aware to convert a string such as "+01:00" to a Python timezone object, which pandas then understands (pandas does not handle the string itself). For the index column we fail to do this conversion.
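To illustrate the kind of conversion described above, here is a minimal standard-library sketch of turning a fixed-offset string into a Python timezone object. The helper name fixed_offset_to_tzinfo is hypothetical and is not the pyarrow implementation of make_tz_aware; it only assumes offsets of the form "+HH:MM" or "-HH:MM".

```python
import re
from datetime import datetime, timedelta, timezone

# Matches fixed-offset strings such as "+01:00" or "-05:30" (assumed format).
_OFFSET_RE = re.compile(r"([+-])(\d{2}):(\d{2})")


def fixed_offset_to_tzinfo(tz_string):
    """Hypothetical helper: map '+HH:MM' / '-HH:MM' to a datetime.timezone.

    Returns None for anything else (e.g. named zones like 'Europe/Brussels'),
    which a caller would have to resolve through a timezone database instead.
    """
    match = _OFFSET_RE.fullmatch(tz_string)
    if match is None:
        return None
    sign, hours, minutes = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    return timezone(offset)


tz = fixed_offset_to_tzinfo("+01:00")
# Convert a UTC timestamp into the fixed-offset zone.
stamp = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc).astimezone(tz)
print(stamp.isoformat())  # 2020-09-01T13:00:00+01:00
```

A tzinfo object like this can be handed to pandas where the bare string "+01:00" cannot, which is the step that appears to be skipped for the index column.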