IMPALA-1401: Many read errors with parquet files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0
    • Fix Version/s: Impala 2.0.1
    • None
    • Environment: Ubuntu, installed from packages

    Description

      We're seeing failures reading almost any parquet file we have with Impala 2.0. The files are generally generated with Scalding and parquet-mr 1.6.0rc3.

      I've attached a small test file which exhibits the failures (and is built under the conditions listed above). When we run

      SELECT * from <table> LIMIT 100
      

      we get errors like the following:

      Backend 0:Metadata states that in group hdfs://path/to/data/part-m-00000.parquet[0] there are 10 rows, but only 0 rows were read.
      couldn't deserialize thrift msg:
      No more data to read.
      ParquetScanner: Could not deserialize page header.
      

      And no result is returned.
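
A quick way to separate a truncated or damaged file from a reader-side problem is to check the file's outer framing before querying it. The sketch below is only an illustration, not part of the original report: the function name `check_parquet_framing` is hypothetical, and it relies only on the Parquet layout rule that a file starts and ends with the 4-byte magic `PAR1`, with a 4-byte little-endian footer length stored just before the trailing magic. A file that passes this check can still fail at page-header deserialization, as in the errors above, so a pass only rules out gross truncation.

```python
import struct

MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def check_parquet_framing(data: bytes) -> dict:
    """Check the outer framing of a Parquet file held in memory.

    Hypothetical helper for illustration; it does not parse the footer
    metadata or any page headers.
    """
    result = {
        "head_magic": False,
        "tail_magic": False,
        "footer_len": None,
        "footer_fits": False,
    }
    # Smallest possible framing: 4 (magic) + 4 (footer length) + 4 (magic).
    if len(data) < 12:
        return result

    result["head_magic"] = data[:4] == MAGIC
    result["tail_magic"] = data[-4:] == MAGIC

    # Footer length is a little-endian unsigned 32-bit integer
    # located immediately before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    result["footer_len"] = footer_len

    # The footer must fit between the leading magic and the last 8 bytes.
    result["footer_fits"] = footer_len <= len(data) - 12
    return result
```

For the attached file this could be run as `check_parquet_framing(open("broken.parquet", "rb").read())`. If both magic checks and `footer_fits` come back true, the file is at least structurally complete, which would point toward the reader rather than a bad write or upload.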

      Attachments

        1. broken.parquet (2 kB, attached by Colin Marc)

          People

            Assignee: Skye Wanderman-Milne (skye)
            Reporter: Colin Marc (colinmarc)
            Votes: 6
            Watchers: 16
