Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 5.0.0
Environment: Ubuntu 21.04
Description
The following code fails with:
  File "[...]/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1052, in _pandas_type_to_numpy_type
    return np.dtype(pandas_type)
TypeError: data type 'datetimetz' not understood
Sample:
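import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def run():
    filename = "test.parquet"
    # Build a frame indexed by timezone-aware dates, then transpose it
    # so that the timezone-aware datetimes become the column labels.
    df = pd.DataFrame(
        data=range(31),
        columns=list("A"),
        index=pd.date_range("2021-01-01", "2021-01-31", freq="D", tz="CET"),
    ).T
    table = pa.Table.from_pandas(df)
    pq.write_to_dataset(table, root_path=filename)
    # Reading the data back and converting to pandas raises the TypeError.
    result = pq.read_table(filename).to_pandas()
    return result


if __name__ == "__main__":
    run()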
The code tries to store a DataFrame whose column labels are timezone-aware datetimes.
Observations:
If I remove the .T at the end of the DataFrame construction, so that the datetime index labels the rows, the code works (but that is not the layout I want).
If I remove the timezone information tz="CET", the code works.
I assume this bug is related to "Error in pandas conversion for datetimetz row index".