IMPALA-9831

TestScannersFuzzing::test_fuzz_alltypes() hits DCHECK in parquet-page-reader.cc

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 4.0.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend

    Description

      In a recent precommit job, an Impalad crashed with the following DCHECK:

      F0604 01:18:36.921769 30923 parquet-page-reader.cc:67] b64df3da7eea7c65:16f9c6e800000001] Check failed: col_end < file_desc.file_length (6820 vs. 6820) 

      The DCHECK asserts that a column chunk ends strictly before the end of the file. This must hold, because the Parquet footer occupies space at the end of the file.

      The code for this DCHECK is:

        int64_t col_end = col_start + col_len;
        // Already validated in ValidateColumnOffsets()
        DCHECK_GT(col_end, 0);
        DCHECK_LT(col_end, file_desc.file_length); <---------

      The comment says the value was already validated in ParquetMetadataUtils::ValidateColumnOffsets(). That is where the problem is:

      int64_t col_len = col_chunk.meta_data.total_compressed_size;
      int64_t col_end = col_start + col_len;
      if (col_end <= 0 || col_end > file_length) {
        return Status(Substitute("Parquet file '$0': metadata is corrupt. Column $1 has "
            "invalid column offsets (offset=$2, size=$3, file_size=$4).", filename, i,
            col_start, col_len, file_length));
      }

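      The validation only rejects col_end > file_length, so a file with col_end == file_length (6820 vs. 6820 in the log above) passes validation and then trips the DCHECK_LT. The condition should be "col_end >= file_length".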
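
      To make the boundary case concrete, here is a minimal standalone sketch that compares the two conditions. The function names OffsetsValidOriginal, OffsetsValidProposed and CheckBoundaryCase are hypothetical and are not part of Impala. The col_start/col_len split is made up for illustration; only their sum (6820, taken from the log) matters.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the range check in ValidateColumnOffsets().
// Returns true when the column chunk's end offset is accepted.
// Original condition: rejects only col_end > file_length.
bool OffsetsValidOriginal(int64_t col_start, int64_t col_len, int64_t file_length) {
  int64_t col_end = col_start + col_len;
  return !(col_end <= 0 || col_end > file_length);
}

// Proposed condition: also rejects col_end == file_length.
bool OffsetsValidProposed(int64_t col_start, int64_t col_len, int64_t file_length) {
  int64_t col_end = col_start + col_len;
  return !(col_end <= 0 || col_end >= file_length);
}

// Exercises the boundary case from the crash log: col_end == file_length == 6820.
void CheckBoundaryCase() {
  const int64_t file_length = 6820;
  // Assumed split of the 6820 bytes; the real offsets are not in the report.
  assert(OffsetsValidOriginal(4, 6816, file_length));   // slips through validation
  assert(!OffsetsValidProposed(4, 6816, file_length));  // rejected as corrupt
  assert(OffsetsValidProposed(4, 6000, file_length));   // ordinary chunk still accepted
}
```

      With the original condition the corrupt metadata is accepted and the later DCHECK_LT(col_end, file_desc.file_length) fires; with the proposed condition the scanner would return the "metadata is corrupt" status instead.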

      If we knew the size of the parquet footer, this check could be stricter as well.
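
      As a rough sketch of what a stricter check might look like: a Parquet file ends with the serialized file metadata, followed by a 4-byte little-endian metadata length and the 4-byte magic "PAR1". Assuming the metadata length has already been read from that trailer, the column end could be bounded by the start of the footer. The helper name ColumnEndBeforeFooter and its parameters are hypothetical, and this is not the fix that was applied in Impala.

```cpp
#include <cstdint>

// Parquet trailer: 4-byte metadata length plus 4-byte magic "PAR1".
constexpr int64_t kParquetTrailerSize = 8;

// Hypothetical helper. metadata_len is assumed to be the footer length
// read from the file's trailer. col_end is an exclusive end offset
// (col_start + col_len), so col_end == footer_start is still valid.
bool ColumnEndBeforeFooter(int64_t col_end, int64_t file_length, int64_t metadata_len) {
  // Reject a metadata length that cannot fit in the file.
  if (metadata_len < 0 || metadata_len > file_length - kParquetTrailerSize) {
    return false;
  }
  int64_t footer_start = file_length - kParquetTrailerSize - metadata_len;
  return col_end > 0 && col_end <= footer_start;
}
```

      For example, with file_length = 6820 and an assumed metadata_len = 100, the footer starts at offset 6712, so any col_end above 6712 would be rejected.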

          People

            Assignee: Joe McDonnell (joemcdonnell)
            Reporter: Joe McDonnell (joemcdonnell)