IMPALA-1401

Many read errors with parquet files

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0
    • Fix Version/s: Impala 2.0.1
    • Component/s: None
    • Labels:
    • Environment: Ubuntu, installed from packages

      Description

      With Impala 2.0, reads fail for almost every Parquet file we have. The files are mostly generated with Scalding and parquet-mr 1.6.0rc3.

      I've attached a small test file that exhibits the failures (and was generated under the conditions listed above). When we run

      SELECT * FROM <table> LIMIT 100

      we get errors like the following:

      Backend 0:Metadata states that in group hdfs://path/to/data/part-m-00000.parquet[0] there are 10 rows, but only 0 rows were read.
      couldn't deserialize thrift msg:
      No more data to read.
      ParquetScanner: Could not deserialize page header.
      

      No result is returned.

            People

            • Assignee: Skye Wanderman-Milne (skye)
            • Reporter: Colin Marc (colinmarc)
            • Votes: 6
            • Watchers: 20
