Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 6.0.1
Description
When saving a pandas DataFrame with a nested type (list within list, list within dict) using the pyarrow engine, the following error is raised:
ArrowInvalid: ('cannot mix list and non-list, non-null values', 'Conversion failed for column A with type object')
Repro:
import pandas as pd
x = pd.DataFrame({"A": [[24, 27, [1, 1]]]})
x.to_parquet('/tmp/a.pqt', engine="pyarrow")
From a bit of searching, this appears to be a known Arrow limitation. However, this is a commonly encountered data structure, and 'fastparquet' handles it seamlessly. Is there a proposed timeline or plan for fixing this?
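In the meantime, one possible workaround (a sketch only, not an Arrow or pandas feature) is to encode the affected cells as JSON strings before writing, assuming every cell is JSON-serializable. The helper names `mixes_list_and_scalar` and `encode_cell` below are hypothetical and introduced purely for illustration. The first detects the shape that appears to trigger the error in the repro (a list holding both lists and non-list, non-null values); the second turns a cell into plain text.

```python
import json


def mixes_list_and_scalar(cell):
    """Return True if a list cell holds both list and non-list, non-null items."""
    if not isinstance(cell, list):
        return False
    has_list = any(isinstance(item, list) for item in cell)
    has_scalar = any(
        item is not None and not isinstance(item, list) for item in cell
    )
    return has_list and has_scalar


def encode_cell(cell):
    """Encode a cell as JSON text so it can be stored as a plain string."""
    return json.dumps(cell)


# The cell from the repro: two integers followed by a nested list.
cell = [24, 27, [1, 1]]
print(mixes_list_and_scalar(cell))

# Round trip: decoding the JSON text gives back the original nested value.
encoded = encode_cell(cell)
print(json.loads(encoded) == cell)
```

With the repro above, this could be applied as `x["A"] = x["A"].map(encode_cell)` before calling `to_parquet`, and reversed with `json.loads` after reading. The trade-off is that the column is stored as strings, so the nested structure is not visible to other Parquet readers.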