IMPALA-5197

Parquet scan may incorrectly report "Corrupt Parquet file" in the logs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      While working on IMPALA-5186, Dan Hecht noticed messages like:

      I0407 12:57:05.306138 85140 status.cc:114] Corrupt Parquet file 'hdfs://vc0332.halxg.cloudera.com:8020/user/hive/warehouse/tpch_100_parquet.db/partsupp/3444dbb2ccec395e-45da764500000007_1009013170_data.0.parq': column 'ps_partkey' had 1024 remaining values but expected 0
      

      I spent a bit more time investigating this. It seems possible but difficult to reproduce, and it's non-deterministic from what I can tell.

      The stress test executes various COMPUTE STATS statements on the tables under test with different MT_DOP settings, in conjunction with a memory limit that the stress test applies to each statement.

      Sometimes it's possible to trigger these corrupt Parquet file warnings. When that happens, the COMPUTE STATS statement fails with "memory limit exceeded".

      For example, these queries reproduced the problem on the first try:

      set mem_limit=1225m;
      set mt_dop=16;
      compute stats tpcds_300_decimal_parquet.store_sales;
      
      set mem_limit=527m;
      set mt_dop=4;
      compute stats tpcds_300_decimal_parquet.store_sales;
      

      These memory limits are right at the edge of what the statement appears to need. Sometimes the statement would succeed completely; other times it would fail under the memory limit, but with no corruption messages printed.


          Activity

          mikesbrown Michael Brown added a comment -

          The stress test would never catch this since from its end, it sees "memory limit exceeded". Are we interested in knowing whether previous Impala versions ever hit this?

          dhecht Dan Hecht added a comment -

          Michael Brown I don't think we need to check previous impala versions. But do you know if it reproduces with mt_dop=0?

          Probably what happens is when we hit the MEM_LIMIT_EXCEEDED, we stop scanning and then the code that checks for inconsistencies in metadata doesn't realize we stopped because of the MEM_LIMIT_EXCEEDED and produces the error anyway.

          dhecht Dan Hecht added a comment - - edited

          Michael Brown, actually, I think we've found the bug, so no need to check mt_dop=0.

          The bug is here:

          HdfsParquetScanner::AssembleRows()
               if (UNLIKELY(!continue_execution || num_tuples_mismatch)) {
                  // Skipping this row group. Free up all the resources with this row group.
                  FlushRowGroupResources(row_batch);
                  scratch_batch_->num_tuples = 0;
                  DCHECK(scratch_batch_->AtEnd());
                  *skip_row_group = true;
                  if (num_tuples_mismatch) {                            <==== should also check continue_execution
                    parse_status_.MergeStatus(Substitute("Corrupt Parquet file '$0': column '$1' "
                        "had $2 remaining values but expected $3", filename(),
                        col_reader->schema_element().name, last_num_tuples,
                        scratch_batch_->num_tuples));
                  }
                  return Status::OK();
               }
          
          mikesbrown Michael Brown added a comment -

          Thank you for letting me know.

          tarmstrong Tim Armstrong added a comment -

          I saw this with mt_dop=0 when hitting memory limit exceeded on a large spilling query.

          use tpch_60_parquet;
          set mem_limit=3gb;
          select o_orderkey, o_custkey, o_orderstatus, sum(o_totalprice), o_orderdate, o_orderpriority, o_clerk, o_shippriority, o_comment from orders group by 1,2,3,5,6,7,8,9 having sum(o_totalprice) > 100;
          
          WARNINGS:
          Memory limit exceeded
          Corrupt Parquet file 'hdfs://localhost:20500/test-warehouse/tpch_60_parquet.db/orders/b74dedd28b9eb130-8edeb18100000000_1160152705_data.8.parq': column 'o_comment' had 1024 remaining values but expected 0
          
          kwho Michael Ho added a comment -

          Fix is out for review: https://gerrit.cloudera.org/#/c/6787/

          kwho Michael Ho added a comment -

          IMPALA-5197: Erroneous corrupted Parquet file message
          The Parquet file column reader may fail in the middle
          of producing a scratch tuple batch for various reasons,
          such as exceeding the memory limit or cancellation. In
          that case, the scratch tuple batch may not have
          materialized all the rows in a row group. We shouldn't
          erroneously report that the file is corrupted in this
          case, as the column reader didn't completely read the
          entire row group.

          A new test case is added to verify that we won't see this
          error message. A new failpoint phase GETNEXT_SCANNER is
          also added to differentiate it from the GETNEXT in the
          scan node itself.

          Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
          Reviewed-on: http://gerrit.cloudera.org:8080/6787
          Reviewed-by: Michael Ho <kwho@cloudera.com>
          Tested-by: Impala Public Jenkins


            People

            • Assignee: kwho Michael Ho
            • Reporter: mikesbrown Michael Brown
            • Votes: 0
            • Watchers: 4
