[IMPALA-8561] ScanRanges with mtime=-1 can lead to inconsistent reads when using the file handle cache - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: Impala 3.3.0
Fix Version/s: Impala 3.3.0
Component/s: Backend
Labels: None
Target Version: Impala 3.3.0

Description

The file handle cache relies on the mtime to distinguish between different versions of a file. For example, if file X exists with mtime=1, and it is then overwritten and the metadata is updated so that it is now at mtime=2, the file handle cache treats the two as completely different things and can never use a single file handle to serve both. However, some codepaths generate ScanRanges with an mtime of -1. This removes the ability to distinguish these two versions of a file and can lead to consistency problems.
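The collapse of versions under mtime=-1 can be illustrated with a toy cache keyed on (path, mtime). This is a minimal sketch, not Impala's file handle cache: `FakeHandle`, `ToyHandleCache`, and the `current_version` parameter are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an open file handle. It only records which
// version of the file's contents it was opened against.
struct FakeHandle {
  std::string contents_version;
};

// Toy cache keyed on (path, mtime). Assumption: this mirrors only the keying
// idea described in this issue, not the real cache's eviction or locking.
class ToyHandleCache {
 public:
  // Returns the cached handle for (path, mtime), opening one if absent.
  // `current_version` models what is on disk at the moment of the open.
  const FakeHandle& Get(const std::string& path, int64_t mtime,
                        const std::string& current_version) {
    auto key = std::make_pair(path, mtime);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      it = cache_.emplace(key, FakeHandle{current_version}).first;
    }
    return it->second;
  }

 private:
  std::map<std::pair<std::string, int64_t>, FakeHandle> cache_;
};
```

With distinct mtimes, a lookup after an overwrite opens a fresh handle. With mtime=-1 on both lookups, the second lookup is served by the handle that was opened before the overwrite.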

A specific example is the code that reads the Parquet footer, HdfsParquetScanner::ProcessFooter(). We don't know ahead of time how big the Parquet footer is, so we read the last 100KB of the file (determined by FOOTER_SIZE). If the footer size encoded in the last few bytes of the file indicates that the footer is larger than that, then we issue a separate read for the actual size of the footer. That separate read does not inherit the mtime of the original read and instead uses an mtime of -1. I verified this by adding tracing and issuing a select against functional_parquet.widetable_1000_cols.
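The decision to issue the second read can be sketched as follows. A Parquet file ends with a 4-byte little-endian metadata length followed by the 4-byte magic "PAR1". The names `kInitialFooterRead`, `kTrailerSize`, `MetadataLength`, and `NeedsSecondRead` are hypothetical, and the exact comparison used in ProcessFooter() may differ from this simplified version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed to correspond to the 100KB initial read (FOOTER_SIZE) described above.
constexpr int64_t kInitialFooterRead = 100 * 1024;
// Parquet trailer: 4-byte little-endian metadata length + 4-byte magic "PAR1".
constexpr int64_t kTrailerSize = 8;

// Decodes the little-endian metadata length stored just before the magic.
// `tail` points at the last `tail_len` bytes of the file (tail_len >= 8).
inline uint32_t MetadataLength(const uint8_t* tail, size_t tail_len) {
  const uint8_t* p = tail + tail_len - kTrailerSize;
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// True when the metadata plus trailer do not fit in the bytes already read,
// so a separate, larger read of the footer is required.
inline bool NeedsSecondRead(uint32_t metadata_len, int64_t bytes_read) {
  return static_cast<int64_t>(metadata_len) + kTrailerSize > bytes_read;
}
```

For a file whose trailer encodes a 128KB metadata length, `NeedsSecondRead` returns true for a 100KB initial read, which is the path that produced the mtime=-1 range.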

A failure scenario associated with this is that we read the last 100KB using a ScanRange with mtime=2, then find that the footer is larger than 100KB and issue a ScanRange with mtime=-1. The second ScanRange can be served by a cached file handle from a previous version of the file, equivalent to mtime=1. The data it reads may not come from the end of the file, or it may be at the end of the file but belong to a footer with a different length. (There is no validation on the new read to check the magic value or metadata size reported by the new buffer.) Either would result in a failure to deserialize the thrift for the footer. For example, a problem case could produce an error message like:

File hdfs://test-warehouse/example_file.parq of length 1048576 bytes has invalid file metadata at file offset 462017. Error = couldn't deserialize thrift msg:
TProtocolException: Invalid data
.

To fix this, we should examine all locations that can result in ScanRanges with mtime=-1 and eliminate any that we can. For example, the HdfsParquetScanner::ProcessFooter() code should create a ScanRange that inherits the mtime from the original footer ScanRange. Also, the file handle cache should refuse to cache file handles with mtime=-1.
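The two parts of the proposed fix can be sketched like this. This is an illustration under assumptions, not the actual patch: `ToyScanRange`, `MakeFooterRange`, and `IsCacheable` are hypothetical names, and the real ScanRange class carries many more fields.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified scan range. Not Impala's ScanRange class.
struct ToyScanRange {
  std::string path;
  int64_t offset;
  int64_t len;
  int64_t mtime;
};

// Sentinel meaning "mtime unknown", as described in this issue.
constexpr int64_t kUnknownMtime = -1;

// Builds the follow-up range for a footer larger than the initial read.
// The path and mtime are copied from the original footer range instead of
// falling back to kUnknownMtime.
inline ToyScanRange MakeFooterRange(const ToyScanRange& original,
                                    int64_t file_len, int64_t metadata_len) {
  const int64_t trailer = 8;  // 4-byte length + 4-byte "PAR1" magic
  return ToyScanRange{original.path, file_len - trailer - metadata_len,
                      metadata_len, original.mtime};
}

// Cache admission check: handles for ranges with an unknown mtime are never
// cached, so they cannot be confused with another version of the file.
inline bool IsCacheable(const ToyScanRange& range) {
  return range.mtime != kUnknownMtime;
}
```

Inheriting the mtime addresses the known footer codepath, while the admission check acts as a backstop for any remaining codepath that still produces mtime=-1.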

The code in HdfsParquetScanner::ProcessFooter() should add validation for the magic value and metadata size when reading a footer larger than 100KB to verify that we are reading something valid. The thrift deserialize failure gives some information, but catching this case more specifically would provide a better error message.
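One possible shape for that validation is shown below. It assumes the larger read is extended to cover the 8-byte trailer so that the magic and length can be rechecked through the same file handle. `ValidateFooterTrailer` and its error strings are hypothetical and do not come from the Impala codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Checks the 8-byte trailer seen by the second (larger) footer read against
// the metadata length learned from the first read. Returns an empty string
// when the trailer is consistent, otherwise a description of the mismatch.
inline std::string ValidateFooterTrailer(const uint8_t* trailer,
                                         uint32_t expected_metadata_len) {
  // Bytes 4..7 of the trailer must be the Parquet magic.
  if (std::memcmp(trailer + 4, "PAR1", 4) != 0) {
    return "footer magic mismatch in larger footer read";
  }
  // Bytes 0..3 hold the metadata length, little-endian.
  const uint32_t found = static_cast<uint32_t>(trailer[0]) |
                         (static_cast<uint32_t>(trailer[1]) << 8) |
                         (static_cast<uint32_t>(trailer[2]) << 16) |
                         (static_cast<uint32_t>(trailer[3]) << 24);
  if (found != expected_metadata_len) {
    return "metadata size changed between reads: expected " +
           std::to_string(expected_metadata_len) + ", found " +
           std::to_string(found);
  }
  return "";
}
```

A non-empty result would let the scanner report that the file changed between reads, which is more specific than a thrift deserialization failure.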

Workarounds

This is most often caused by overwriting files in place (e.g. INSERT OVERWRITE from Hive) without refreshing the metadata. You can avoid the issue by avoiding these in-place rewrites or by consistently running REFRESH <tbl> in Impala after the modifications. However, after further consideration, there are several symptoms that are not resolved via REFRESH/INVALIDATE.

Setting --max_cached_file_handles=0 in the impalad startup options can work around the issue, at the cost of performance.

Issue Links

is duplicated by: IMPALA-10042 ERROR: File has an invalid version number: This could be due to stale metadata. Try running "refresh <Table_name>". (Resolved)

relates to: IMPALA-8562 Data cache should skip scan range with mtime == -1 (Resolved)

People

Assignee: Joe McDonnell
Reporter: Joe McDonnell
Votes: 0
Watchers: 4

Dates

Created: 16/May/19 23:23
Updated: 24/Aug/20 17:01
Resolved: 14/Jun/19 17:39