Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Impala 3.2.0
Description
When querying an S3-backed table that is being modified (e.g. distcp'ing content from another cluster), queries still fail with a FileNotFound exception, even when Impala is able to determine that a file in that table has been deleted (e.g. using the S3Guard feature in CDH).
Performing a metadata refresh after the copy completes does resolve the problem. However, this doesn't help during the copy phase. Requesting an enhancement where Impala can ignore files if it knows that they've been deleted.
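
To make the request concrete, here is a minimal sketch of the behavior being asked for. It is not Impala code: the function names `scan_files` and `demo` and the `ignore_missing` flag are hypothetical, the local filesystem stands in for S3, and a plain list of paths stands in for the table's cached file metadata.

```python
import os
import tempfile


def scan_files(paths, ignore_missing=True):
    """Read each file in `paths`, optionally skipping ones that no longer exist.

    Hypothetical stand-in for a table scan: `paths` plays the role of the
    cached file list, and the local filesystem plays the role of S3.
    """
    contents = {}
    skipped = []
    for path in paths:
        try:
            with open(path, "r") as f:
                contents[path] = f.read()
        except FileNotFoundError:
            if not ignore_missing:
                raise  # behavior reported in this ticket: the whole scan fails
            skipped.append(path)  # requested behavior: skip the deleted file
    return contents, skipped


def demo():
    """Delete one file after the listing is taken, then scan the stale listing."""
    with tempfile.TemporaryDirectory() as d:
        kept = os.path.join(d, "part-0.txt")
        gone = os.path.join(d, "part-1.txt")
        for p in (kept, gone):
            with open(p, "w") as f:
                f.write("row\n")
        cached_listing = [kept, gone]  # stale metadata snapshot
        os.remove(gone)                # file removed while the table is being modified
        return scan_files(cached_listing)
```

With `ignore_missing=False` the sketch raises on the first deleted file, which mirrors the failure described above; with the default it returns the readable files plus a list of the paths it skipped, so the caller could still log or report them.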