Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: Impala 2.0
None
Environment: Ubuntu, installed from packages
Description
We're seeing failures when reading almost every Parquet file we have with Impala 2.0. Most of the files are generated by Scalding with parquet-mr 1.6.0rc3.
I've attached a small test file that exhibits the failures and was produced the same way. When we run
SELECT * FROM <table> LIMIT 100
We get errors like the following:
Backend 0:Metadata states that in group hdfs://path/to/data/part-m-00000.parquet[0] there are 10 rows, but only 0 rows were read.
couldn't deserialize thrift msg:
No more data to read.
ParquetScanner: Could not deserialize page header.
No result is returned.
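One quick triage step (our suggestion, not part of Impala) is to confirm that the attached file's outer framing is intact, which rules out a truncated or mis-copied file before digging into page headers. The minimal sketch below uses only the Python standard library and checks the layout from the Parquet format spec: leading "PAR1", the data and footer, a 4-byte little-endian footer length, then trailing "PAR1". The function name `check_parquet_framing` is hypothetical. A passing check does not decode the Thrift FileMetaData or any page headers, so it cannot confirm or rule out the "Could not deserialize page header" error itself.

```python
import struct

MAGIC = b"PAR1"


def check_parquet_framing(data: bytes) -> int:
    """Return the footer (FileMetaData) length if the outer Parquet framing looks sane.

    Hypothetical helper: checks only "PAR1" ... <footer> <4-byte LE length> "PAR1".
    It does not decode the Thrift footer or any page headers.
    """
    # Smallest possible framing: leading magic + 4-byte length + trailing magic.
    if len(data) < 12:
        raise ValueError("file too short to be Parquet")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The footer length sits in the 4 bytes just before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # The footer must fit between the leading magic and the length field.
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

To use it, read the attached file in binary mode (for example `open(path, "rb").read()`) and pass the bytes in; a `ValueError` points at file corruption rather than a reader bug.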