Apache Arrow / ARROW-17352

Parquet files cannot be opened in Windows Parquet Viewer when stored with Arrow Version 9.0.0

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 9.0.0
    • Fix Version/s: None
    • Component/s: Parquet
    • Labels: None
    • Environment: Windows10

    Description

      Parquet files cannot be opened in Windows Parquet Viewer when stored with Arrow Version 9.0.0. It worked when stored with version 8 and earlier.

      Windows Parquet Viewer: 2.3.5 and 2.3.6

      pyarrow version: 9.0.0

      Error: System.AggregateException: One or more errors occured. ---> Parquet.ParquetException: encoding RLE_DICTIONARY is not supported. 

      at Parquet.File.DataColumnReader.ReadColumn(BinaryReader reader ... in DataColumnReader.cs: line 259

       

      After further checking, I found that the problem seems to be related to a change in the default Parquet version.

      When I use pyarrow 9 and set version to 1.0, the file opens in the Windows tool again. With version 2.4 it does not work (or is not supported by the Windows tool).

      df.to_parquet(r'C:\temp\test_10.parquet', version='1.0')
      df.to_parquet(r'C:\temp\test_24.parquet', version='2.4')

      The question is whether such a default change is a bug or a feature.

      Finally found: 

      • ARROW-12203 - [C++][Python] Switch default Parquet version to 2.4 (#13280)

      So it is probably a feature, and we need to adapt our code.

       

      Attachments

        1. arrow9error.PNG
          50 kB
          Oliver Klein

          People

            Assignee: Unassigned
            Reporter: Oliver Klein (eedokl)
            Votes: 0
            Watchers: 3
