Apache Arrow / ARROW-17450

[C++][Parquet] Cannot read columns with Run Length Encoding (RLE)


Details

    Description

      Reading Parquet files whose columns use RLE encoding with the Arrow Parquet C++ reader fails with the error

      "Unknown encoding type."

      The error is raised only by the Arrow Parquet C++ reader, and it occurs because the RLE encoding is not handled in the column decoder.

       

      https://github.com/apache/arrow/blob/master/cpp/src/parquet/column_reader.cc#L769
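For context, the RLE encoding in Parquet is the RLE/bit-packed hybrid format, and boolean columns use a bit width of 1. The block below is a minimal sketch of decoding such a buffer, not Arrow's implementation. The function name `DecodeRleBooleans` is hypothetical, and the sketch assumes that any length prefix in front of the encoded data has already been stripped.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Arrow): decodes a buffer in Parquet's
// RLE/bit-packed hybrid format with bit width 1 into booleans.
// Assumption: any length prefix has already been removed from `buf`.
std::vector<bool> DecodeRleBooleans(const std::vector<uint8_t>& buf,
                                    std::size_t num_values) {
  std::vector<bool> out;
  out.reserve(num_values);
  std::size_t pos = 0;

  while (out.size() < num_values) {
    // Each run starts with a header stored as an unsigned LEB128 varint.
    uint64_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= buf.size()) throw std::runtime_error("truncated run header");
      if (shift >= 64) throw std::runtime_error("run header too long");
      const uint8_t byte = buf[pos++];
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }

    const uint64_t count = header >> 1;
    if (count == 0) throw std::runtime_error("empty run");

    if ((header & 1) == 0) {
      // RLE run: `count` repetitions of one value, stored in a single byte
      // because a bit width of 1 rounds up to one byte.
      if (pos >= buf.size()) throw std::runtime_error("truncated RLE value");
      const bool value = (buf[pos++] & 1) != 0;
      for (uint64_t i = 0; i < count && out.size() < num_values; ++i) {
        out.push_back(value);
      }
    } else {
      // Bit-packed run: `count` groups of 8 values, one byte per group,
      // packed starting from the least significant bit.
      for (uint64_t g = 0; g < count && out.size() < num_values; ++g) {
        if (pos >= buf.size()) throw std::runtime_error("truncated bit-packed run");
        const uint8_t byte = buf[pos++];
        for (int bit = 0; bit < 8 && out.size() < num_values; ++bit) {
          out.push_back(((byte >> bit) & 1) != 0);
        }
      }
    }
  }
  return out;
}
```

In this format a single `true` value can be expressed as an RLE run of length 1, that is, the two bytes 0x02 0x01, so `DecodeRleBooleans({0x02, 0x01}, 1)` yields one `true`.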

      The files were generated by Athena using an Iceberg table, with the following statements:

       

      CREATE TABLE test (d_bool1 boolean)
      LOCATION 's3://'
      TBLPROPERTIES (
        'table_type'='ICEBERG',
        'format'='parquet'
      );
      INSERT INTO test VALUES (true);
      

       

      Attachments

        1. athena_boolean.gz.parquet
          0.2 kB
          Nishanth

            People

              sfc-gh-nthimmegowda Nishanth
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 9.5h