IMPALA-5197

Parquet scan may incorrectly report "Corrupt Parquet file" in the logs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend

      Description

      With IMPALA-5186, Dan Hecht noticed messages like:

      I0407 12:57:05.306138 85140 status.cc:114] Corrupt Parquet file 'hdfs://vc0332.halxg.cloudera.com:8020/user/hive/warehouse/tpch_100_parquet.db/partsupp/3444dbb2ccec395e-45da764500000007_1009013170_data.0.parq': column 'ps_partkey' had 1024 remaining values but expected 0
      

      I spent a bit more time investigating this. The problem seems reproducible, but only with difficulty, and it appears to be non-deterministic.

      The stress test runs various COMPUTE STATS statements against the tables under test with different MT_DOP settings. It also applies a memory limit to each statement.

      Sometimes these "Corrupt Parquet file" warnings can be triggered. When that happens, the COMPUTE STATS statement fails with "memory limit exceeded".

      For example, these queries reproduced the problem on the first try:

      set mem_limit=1225m;
      set mt_dop=16;
      compute stats tpcds_300_decimal_parquet.store_sales;
      
      set mem_limit=527m;
      set mt_dop=4;
      compute stats tpcds_300_decimal_parquet.store_sales;
      

      These memory limits are right at the edge of what the statement appears to need. Sometimes the statement appeared to succeed completely; other times it could not complete under the memory limit, but no "Corrupt Parquet file" messages were printed.
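Because the problem shows up only near the memory edge and only under some MT_DOP settings, a sweep of statement sets around a known edge value may be a practical way to hunt for it. The following is a minimal sketch, not part of the stress test: the helper names bracket and repro_statements, the 16 MB step, and the choice of five limits are all hypothetical. It only generates statement batches in the same shape as the examples above (set mem_limit, set mt_dop, compute stats); running them, for example through impala-shell, is left to the reader.

```python
from itertools import product


def bracket(limit_mb, step_mb, count):
    """Hypothetical helper: `count` memory limits (MB) spaced step_mb apart around limit_mb."""
    half = count // 2
    return [limit_mb + (i - half) * step_mb for i in range(count)]


def repro_statements(table, mem_limits_mb, mt_dops):
    """Hypothetical helper: one batch of statements per (mem_limit, mt_dop) combination."""
    batches = []
    for mem_mb, dop in product(mem_limits_mb, mt_dops):
        batches.append([
            f"set mem_limit={mem_mb}m;",
            f"set mt_dop={dop};",
            f"compute stats {table};",
        ])
    return batches


if __name__ == "__main__":
    # Bracket the 527m / MT_DOP=4 example that reproduced the problem.
    limits = bracket(527, 16, 5)
    for batch in repro_statements("tpcds_300_decimal_parquet.store_sales", limits, [4]):
        print("\n".join(batch))
        print()
```

After each batch, the impalad logs can be checked for "Corrupt Parquet file" lines. Since the failure is non-deterministic, each batch may need several runs before it reproduces.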


              People

              • Assignee:
                Michael Ho
                Reporter:
                Michael Brown
              • Votes:
                0
                Watchers:
                4
