IMPALA-5197

Parquet scan may incorrectly report "Corrupt Parquet file" in the logs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels:

      Description

      While working on IMPALA-5186, Dan Hecht noticed messages like:

      I0407 12:57:05.306138 85140 status.cc:114] Corrupt Parquet file 'hdfs://vc0332.halxg.cloudera.com:8020/user/hive/warehouse/tpch_100_parquet.db/partsupp/3444dbb2ccec395e-45da764500000007_1009013170_data.0.parq': column 'ps_partkey' had 1024 remaining values but expected 0
      

      I spent a bit more time investigating this. It seems possible but difficult to reproduce, and it's non-deterministic from what I can tell.

      The stress test executes various COMPUTE STATS statements on the tables under test with different MT_DOP settings, in conjunction with a memory limit that the stress test applies to each statement.

      Sometimes it's possible to trigger these corrupt Parquet file warnings. When that happens, the COMPUTE STATS statement fails with "memory limit exceeded".

      For example, these queries reproduced the problem on the first try:

      set mem_limit=1225m;
      set mt_dop=16;
      compute stats tpcds_300_decimal_parquet.store_sales;
      
      set mem_limit=527m;
      set mt_dop=4;
      compute stats tpcds_300_decimal_parquet.store_sales;
      

      These memory limits are right at the edge of what the statement appears to need. Sometimes the statement would succeed completely; other times it would fail under the memory limit, but with no corruption messages printed.


          Activity

          mikesbrown Michael Brown added a comment -

          The stress test would never catch this since from its end, it sees "memory limit exceeded". Are we interested in knowing whether previous Impala versions ever hit this?

          dhecht Dan Hecht added a comment -

          Michael Brown I don't think we need to check previous impala versions. But do you know if it reproduces with mt_dop=0?

          Probably what happens is when we hit the MEM_LIMIT_EXCEEDED, we stop scanning and then the code that checks for inconsistencies in metadata doesn't realize we stopped because of the MEM_LIMIT_EXCEEDED and produces the error anyway.

          dhecht Dan Hecht added a comment - - edited

          Michael Brown, actually, I think we've found the bug, so no need to check mt_dop=0.

          The bug is here:

          HdfsParquetScanner::AssembleRows()
               if (UNLIKELY(!continue_execution || num_tuples_mismatch)) {
                  // Skipping this row group. Free up all the resources with this row group.
                  FlushRowGroupResources(row_batch);
                  scratch_batch_->num_tuples = 0;
                  DCHECK(scratch_batch_->AtEnd());
                  *skip_row_group = true;
                  if (num_tuples_mismatch) {                            <==== should also check continue_execution
                    parse_status_.MergeStatus(Substitute("Corrupt Parquet file '$0': column '$1' "
                        "had $2 remaining values but expected $3", filename(),
                        col_reader->schema_element().name, last_num_tuples,
                        scratch_batch_->num_tuples));
                  }
                  return Status::OK();
               }
          
          mikesbrown Michael Brown added a comment -

          Thank you for letting me know.

          tarmstrong Tim Armstrong added a comment -

          I saw this with mt_dop=0 when hitting memory limit exceeded on a large spilling query.

          use tpch_60_parquet;
          set mem_limit=3gb;
          select o_orderkey, o_custkey, o_orderstatus, sum(o_totalprice), o_orderdate, o_orderpriority, o_clerk, o_shippriority, o_comment from orders group by 1,2,3,5,6,7,8,9 having sum(o_totalprice) > 100;
          
          WARNINGS:
          Memory limit exceeded
          Corrupt Parquet file 'hdfs://localhost:20500/test-warehouse/tpch_60_parquet.db/orders/b74dedd28b9eb130-8edeb18100000000_1160152705_data.8.parq': column 'o_comment' had 1024 remaining values but expected 0
          
          kwho Michael Ho added a comment -

          Fix is out for review: https://gerrit.cloudera.org/#/c/6787/

          kwho Michael Ho added a comment -

          IMPALA-5197: Erroneous corrupted Parquet file message
          The Parquet file column reader may fail in the middle
          of producing a scratch tuple batch for various reasons,
          such as exceeding the memory limit or cancellation. In
          that case, the scratch tuple batch may not have
          materialized all the rows in a row group. We shouldn't
          erroneously report that the file is corrupted in this
          case, as the column reader didn't completely read the
          entire row group.

          A new test case is added to verify that we won't see this
          error message. A new failpoint phase GETNEXT_SCANNER is
          also added to differentiate it from the GETNEXT in the
          scan node itself.

          Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6
          Reviewed-on: http://gerrit.cloudera.org:8080/6787
          Reviewed-by: Michael Ho <kwho@cloudera.com>
          Tested-by: Impala Public Jenkins


            People

            • Assignee: kwho Michael Ho
            • Reporter: mikesbrown Michael Brown
            • Votes: 0
            • Watchers: 4
