Details
Bug
Status: Resolved
Critical
Resolution: Fixed
Impala 4.0.0
Description
In a recent precommit job, an Impalad crashed with the following DCHECK:
F0604 01:18:36.921769 30923 parquet-page-reader.cc:67] b64df3da7eea7c65:16f9c6e800000001] Check failed: col_end < file_desc.file_length (6820 vs. 6820)
The DCHECK asserts that the end of a column is strictly before the end of the file. This must be true, because the footer takes up space at the end of the file.
The code for this DCHECK is:
  int64_t col_end = col_start + col_len;
  // Already validated in ValidateColumnOffsets()
  DCHECK_GT(col_end, 0);
  DCHECK_LT(col_end, file_desc.file_length);  // <---------
The comment says that col_end was already validated in ParquetMetadataUtils::ValidateColumnOffsets(). That is where the problem is:
  int64_t col_len = col_chunk.meta_data.total_compressed_size;
  int64_t col_end = col_start + col_len;
  if (col_end <= 0 || col_end > file_length) {
    return Status(Substitute("Parquet file '$0': metadata is corrupt. Column $1 has "
        "invalid column offsets (offset=$2, size=$3, file_size=$4).", filename, i,
        col_start, col_len, file_length));
  }
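The validation only rejects col_end > file_length, so a column with col_end == file_length (6820 vs. 6820 in the crash above) passes validation and then trips the DCHECK_LT. The condition should be "col_end >= file_length".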
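To make the boundary case concrete, here is a minimal standalone sketch of the two conditions. The function names are hypothetical stand-ins, not Impala code, and the col_start/col_len values are made up so that col_end matches the 6820 from the log; only the comparison logic mirrors the snippet above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the current check in ValidateColumnOffsets().
// Returns true when the offsets would be accepted as valid.
bool AcceptedByCurrentCheck(int64_t col_start, int64_t col_len, int64_t file_length) {
  const int64_t col_end = col_start + col_len;
  return !(col_end <= 0 || col_end > file_length);
}

// Same check with the proposed condition (col_end >= file_length is rejected).
bool AcceptedByProposedCheck(int64_t col_start, int64_t col_len, int64_t file_length) {
  const int64_t col_end = col_start + col_len;
  return !(col_end <= 0 || col_end >= file_length);
}

// Illustrative values only: col_start = 4 and col_len = 6816 give col_end = 6820,
// equal to the file length reported in the DCHECK message.
void DemonstrateBoundaryCase() {
  assert(AcceptedByCurrentCheck(4, 6816, 6820));    // slips through, DCHECK fires later
  assert(!AcceptedByProposedCheck(4, 6816, 6820));  // rejected as corrupt metadata
  assert(AcceptedByProposedCheck(4, 100, 6820));    // ordinary column still accepted
}
```

With the proposed condition, a file like the one in the crash would be reported as having corrupt metadata instead of reaching the DCHECK.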
If we knew the size of the Parquet footer, this check could be stricter as well.
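A rough sketch of what such a stricter check might look like, assuming the caller has already read the footer metadata length. The Parquet format ends a file with a 4-byte metadata length followed by the 4-byte magic "PAR1", so the footer starts at file_length - 8 - metadata length. The helper name and the footer_metadata_len parameter are hypothetical and are not taken from Impala.

```cpp
#include <cassert>
#include <cstdint>

// Fixed trailer at the end of a Parquet file:
// 4-byte footer metadata length + 4-byte "PAR1" magic.
constexpr int64_t kParquetTrailerSize = 8;

// Hypothetical helper: true if the column ends at or before the first byte of the
// footer metadata. col_end is an exclusive end offset, so col_end == footer_start
// is still valid. Assumes the inputs are small enough not to overflow int64_t.
bool ColumnEndsBeforeFooter(int64_t col_start, int64_t col_len, int64_t file_length,
                            int64_t footer_metadata_len) {
  const int64_t footer_start = file_length - kParquetTrailerSize - footer_metadata_len;
  const int64_t col_end = col_start + col_len;
  return col_end > 0 && col_end <= footer_start;
}

// Illustrative values only: a 6820-byte file with a 500-byte footer has its
// footer metadata starting at offset 6820 - 8 - 500 = 6312.
void DemonstrateFooterBound() {
  assert(ColumnEndsBeforeFooter(4, 6308, 6820, 500));   // col_end == 6312, accepted
  assert(!ColumnEndsBeforeFooter(4, 6309, 6820, 500));  // overlaps the footer
  assert(!ColumnEndsBeforeFooter(4, 6816, 6820, 500));  // the col_end == file_length case
}
```

This bound is tighter than "col_end < file_length" because it also rejects columns that end inside the footer, not just at the last byte of the file.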