Parquet / PARQUET-400

Error reading some files after PARQUET-77 bytebuffer read path

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0, 1.8.2
    • Component/s: None
    • Labels: None

      Description

      This issue is based on a discussion on the parquet-dev mailing list started by Daniel C. Weeks.

      Full discussion:
      https://mail-archives.apache.org/mod_mbox/parquet-dev/201512.mbox/%3CCAMpYv7C_szTheua9N95bXvbd2ROmV63BFiJTK-K-aDNK6ZNBKA%40mail.gmail.com%3E

      From the thread (he later provided a small repro file that is attached here):

      Just wanted to see if you or anyone else has run into problems reading
      files after the ByteBuffer patch. I've been running into issues and have
      narrowed it down to the ByteBuffer commit using a small repro file (written
      with 1.6.0, unfortunately can't share the data).

      It doesn't happen for every file, but those that fail give this error:

      can not read class org.apache.parquet.format.PageHeader: Required field
      'uncompressed_page_size' was not found in serialized data! Struct:
      PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)

        Attachments

        1. bytebyffer_read_fail.gz.parquet (218 kB, Jason Altekruse)

              People

              • Assignee: jaltekruse Jason Altekruse
              • Reporter: jaltekruse Jason Altekruse
