Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Not A Problem
Affects Version/s: 9.0.0
None
None
Environment: Windows10
Description
Parquet files written with Arrow version 9.0.0 cannot be opened in Windows Parquet Viewer. Files written with version 8 and earlier open without problems.
Windows Parquet Viewer: 2.3.5 and 2.3.6
pyarrow version: 9.0.0
Error: System.AggregateException: One or more errors occured. ---> Parquet.ParquetException: encoding RLE_DICTIONARY is not supported.
at Parquet.File.DataColumnReader.ReadColumn(BinaryReader reader ... in DataColumnReader.cs: line 259
After further checking, the problem seems to be related to a change in the default Parquet format version.
When I use pyarrow 9 and set the version to 1.0, the file opens in the Windows tool again. With version 2.4 it does not open (or is not supported by the Windows tool).
df.to_parquet(r'C:\temp\test_10.parquet', version='1.0')
df.to_parquet(r'C:\temp\test_24.parquet', version='2.4')
The question is whether such a change of the default is a bug or a feature.
Finally found:
ARROW-12203- [C++][Python] Switch default Parquet version to 2.4 (#13280)
So it is probably a feature, and we need to adapt our code.