Impala may return non-deterministic errors for certain corrupt, compressed Parquet files. See the relevant snippet from BaseScalarColumnReader::ReadDataPage() below:
The 'decompressed_buffer' is not initialized, and decompressor_->ProcessBlock32() can succeed without writing to every byte of 'decompressed_buffer'. The unwritten bytes then hold arbitrary values, which leads to non-deterministic errors being reported later in the scan. For example, this may happen when the 'compressed_page_size' is corrupt and set to 1.
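To illustrate the failure mode, here is a minimal, self-contained sketch. It is not Impala code: FakeProcessBlock() and DecompressPage() are hypothetical names, and the "decompressor" just copies its input through. It shows two possible mitigations under those assumptions: zero-filling the output buffer so skipped bytes have a defined value, and treating a short write as corruption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a block decompressor (not the Impala API).
// It copies the input through unchanged and reports how many bytes it wrote.
// Like the behavior described above, it returns true ("success") even when
// it fills only part of the output buffer.
bool FakeProcessBlock(const uint8_t* in, size_t in_len,
                      uint8_t* out, size_t out_capacity, size_t* out_len) {
  if (in_len > out_capacity) return false;
  if (in_len > 0) std::memcpy(out, in, in_len);
  *out_len = in_len;
  return true;
}

// Hypothetical page-decompression helper. Returns false if the page looks
// corrupt, i.e. the decompressor fails or writes fewer bytes than expected.
bool DecompressPage(const std::vector<uint8_t>& compressed,
                    size_t expected_uncompressed_size,
                    std::vector<uint8_t>* decompressed) {
  // Zero-fill so any bytes the decompressor skips have a defined value.
  decompressed->assign(expected_uncompressed_size, 0);
  size_t written = 0;
  if (!FakeProcessBlock(compressed.data(), compressed.size(),
                        decompressed->data(), decompressed->size(),
                        &written)) {
    return false;
  }
  // Treat a short write as corruption instead of decoding leftover bytes.
  return written == expected_uncompressed_size;
}
```

With a 1-byte input and an expected uncompressed size of 8, DecompressPage() returns false deterministically instead of handing partially written memory to later decoding stages.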
We've seen the following errors being reported for files like this: