Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.11.1
Environment:
pandas 0.23.4
pyarrow 0.11.1
Python 2.7, 3.5 - 3.7
macOS High Sierra (10.13.6)
Description
On CPython 2.7.15, 3.5.6, 3.6.6, and 3.7.0, a pandas DataFrame containing a datetime.datetime object serializes to Parquet just fine, but serialization crashes with an AttributeError if the datetime's tzinfo is one of the built-in timezone objects, such as datetime.timezone.utc.
To reproduce, on Python 3:
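import datetime as dt
import pandas as pd

# Timezone-aware datetime using the standard library's built-in UTC object
df = pd.DataFrame({'foo': [dt.datetime(2018, 1, 1, 1, 23, 45, tzinfo=dt.timezone.utc)]})
df.to_parquet('data.parq')  # raises AttributeError with pyarrow 0.11.1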
On Python 2, which has no datetime.timezone, define a subclass of datetime.tzinfo and try the same thing.
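A minimal tzinfo subclass for this step might look like the sketch below, modeled on a fixed-offset UTC zone. The class name UTC and the variable names are hypothetical. The code runs on both Python 2 and 3 and, like datetime.timezone, defines no zone attribute.

```python
import datetime

ZERO = datetime.timedelta(0)


class UTC(datetime.tzinfo):
    """Hypothetical minimal UTC tzinfo; works on Python 2 and 3."""

    def utcoffset(self, dt):
        return ZERO

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return ZERO


utc = UTC()
stamp = datetime.datetime(2018, 1, 1, 1, 23, 45, tzinfo=utc)
# Put `stamp` in a DataFrame column and call to_parquet() as in the Python 3
# snippet. Note that, unlike pytz timezones, this object has no `zone` attribute.
```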
The following exception results:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/core/frame.py", line 1945, in to_parquet
    compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 257, in to_parquet
    return impl.write(df, path, compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 118, in write
    table = self.api.Table.from_pandas(df)
  File "pyarrow/table.pxi", line 1217, in pyarrow.lib.Table.from_pandas
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 381, in dataframe_to_arrays
    convert_types)]
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 380, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 370, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 167, in pyarrow.lib.array
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 409, in get_datetimetz_type
    type_ = pa.timestamp(unit, tz)
  File "pyarrow/types.pxi", line 1038, in pyarrow.lib.timestamp
  File "pyarrow/types.pxi", line 955, in pyarrow.lib.tzinfo_to_string
AttributeError: 'datetime.timezone' object has no attribute 'zone'
This doesn't happen if you use pytz.UTC as the timezone object.
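The traceback ends in pyarrow.lib.tzinfo_to_string, which reads a zone attribute. pytz timezones such as pytz.UTC have one, while datetime.timezone does not, which matches the behavior described above. The sketch below illustrates one way a converter could fall back to the UTC offset when zone is missing. It is not pyarrow's actual code: tz_to_string is a hypothetical helper, and the "UTC" / "+HH:MM" output strings are an assumption.

```python
import datetime


def tz_to_string(tz):
    """Hypothetical helper: map a tzinfo object to a timezone string."""
    # pytz timezones (e.g. pytz.UTC) expose their name as `.zone`
    zone = getattr(tz, "zone", None)
    if zone is not None:
        return zone
    # Fixed-offset zones such as datetime.timezone.utc report a constant
    # offset even when no datetime is supplied
    offset = tz.utcoffset(None)
    if offset is None:
        raise ValueError("cannot determine a UTC offset for %r" % (tz,))
    if offset == datetime.timedelta(0):
        return "UTC"
    seconds = int(offset.total_seconds())
    sign = "-" if seconds < 0 else "+"
    # Sub-minute parts of the offset are truncated
    hours, minutes = divmod(abs(seconds) // 60, 60)
    return "%s%02d:%02d" % (sign, hours, minutes)
```

Checking for zone first keeps the existing behavior for pytz objects; the offset fallback only applies to tzinfo objects that lack it.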
Issue Links
is related to: ARROW-4055 [Python] Fails to convert pytz.utc with versions 2018.3 and earlier (Resolved)