Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
Currently, if you try to append metadata from row groups with different schemas, you get the following error:
  File "/home/mmilton/.conda/envs/mmilton/envs/driverpipe/lib/python3.9/site-packages/dask/dataframe/io/parquet/arrow.py", line 52, in _append_row_groups
    metadata.append_row_groups(md)
  File "pyarrow/_parquet.pyx", line 628, in pyarrow._parquet.FileMetaData.append_row_groups
    self._metadata.AppendRowGroups(deref(c_metadata))
RuntimeError: AppendRowGroups requires equal schemas.
It would be more useful to report the schema difference itself: the error object should carry the list of columns that disagree, and the error message should name them.
For example, the message could read:
RuntimeError: AppendRowGroups requires equal schemas. Column "foo" was previously an int32 but the latest row group is storing it as an int64
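To make the request concrete, here is a rough pure-Python sketch of the kind of per-column comparison that could produce such a message. It is not how pyarrow implements the check; `describe_schema_diff` and `SchemaMismatchError` are hypothetical names, and schemas are modelled as plain dicts mapping column name to a type string (with pyarrow, something like `dict(zip(schema.names, map(str, schema.types)))` could build that mapping).

```python
def describe_schema_diff(previous, latest):
    """Return one human-readable line per column whose type disagrees.

    Both arguments are dicts of column name -> type string (an assumption
    made for this sketch; real schemas carry more than a type name).
    """
    differences = []
    for name in sorted(set(previous) | set(latest)):
        old, new = previous.get(name), latest.get(name)
        if old == new:
            continue
        if old is None:
            differences.append(
                f'Column "{name}" is new in the latest row group (type {new})'
            )
        elif new is None:
            differences.append(
                f'Column "{name}" is missing from the latest row group (was {old})'
            )
        else:
            differences.append(
                f'Column "{name}" was previously {old} '
                f"but the latest row group is storing it as {new}"
            )
    return differences


class SchemaMismatchError(RuntimeError):
    """Hypothetical error type that keeps the differences on the object."""

    def __init__(self, differences):
        self.differences = differences
        super().__init__(
            "AppendRowGroups requires equal schemas. " + "; ".join(differences)
        )


if __name__ == "__main__":
    diffs = describe_schema_diff(
        {"foo": "int32", "bar": "string"},
        {"foo": "int64", "bar": "string"},
    )
    print(SchemaMismatchError(diffs))
```

Keeping the differences as a list on the exception lets callers such as dask inspect them programmatically, while the message stays readable for anyone who only sees the traceback.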