Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Impala 3.4.0
None
Description
The file listing thread in catalogd goes into an infinite loop if it gets a RemoteIterator on a non-existent path. The first call to RemoteIterator.hasNext() throws a FileNotFoundException. However, this exception is caught and the loop continues, so hasNext() is called again, throws again, and the loop never terminates. Related code: https://github.com/apache/impala/blob/d89c04bf806682d3449c566ce979632bd2ac5b29/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L789-L814
static class FilterIterator implements RemoteIterator<FileStatus> {
  ...
  public boolean hasNext() throws IOException {
    ...
    while (curFile_ == null) {
      FileStatus next;
      try {
        if (!baseIterator_.hasNext()) return false;  // <---- throws FileNotFoundException
        ...
        next = baseIterator_.next();
      } catch (FileNotFoundException ex) {
        ...
        LOG.warn(ex.getMessage());
        continue;  // <--------- catch the exception and continue into a dead loop
      }
      if (!isInIgnoredDirectory(startPath_, next)) {
        curFile_ = next;
        return true;
      }
    }
    return true;
  }
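One possible way to make the loop terminate is sketched below. It is only an illustration, not the fix that was committed: SimpleRemoteIterator is a hypothetical stand-in for Hadoop's RemoteIterator so that the example compiles without Hadoop, and SafeFilterIterator, MissingPathIterator, ListBackedIterator and DeadLoopSketch are made-up names. The sketch assumes that a hasNext() call that throws FileNotFoundException will keep throwing, so it ends the listing instead of retrying, and that a failed next() has already moved past the vanished entry, so skipping it is safe.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical stand-in for Hadoop's RemoteIterator so the sketch compiles without Hadoop. */
interface SimpleRemoteIterator<T> {
  boolean hasNext() throws IOException;
  T next() throws IOException;
}

/** Hypothetical filtering wrapper; not Impala's actual FilterIterator. */
final class SafeFilterIterator<T> implements SimpleRemoteIterator<T> {
  private final SimpleRemoteIterator<T> base_;
  private final Predicate<T> keep_;
  private T cur_ = null;

  SafeFilterIterator(SimpleRemoteIterator<T> base, Predicate<T> keep) {
    base_ = base;
    keep_ = keep;
  }

  @Override
  public boolean hasNext() throws IOException {
    while (cur_ == null) {
      try {
        if (!base_.hasNext()) return false;
      } catch (FileNotFoundException ex) {
        // Assumption: hasNext() failed without advancing, so calling it again
        // would fail the same way. End the listing instead of retrying.
        return false;
      }
      T next;
      try {
        next = base_.next();
      } catch (FileNotFoundException ex) {
        // Assumption: a failed next() has already moved past the vanished
        // entry, so skipping it cannot spin forever.
        continue;
      }
      if (keep_.test(next)) cur_ = next;
    }
    return true;
  }

  @Override
  public T next() throws IOException {
    if (!hasNext()) throw new NoSuchElementException();
    T result = cur_;
    cur_ = null;
    return result;
  }
}

/** Simulates listing a path that no longer exists: every hasNext() call throws. */
final class MissingPathIterator<T> implements SimpleRemoteIterator<T> {
  int hasNextCalls = 0;

  @Override
  public boolean hasNext() throws IOException {
    hasNextCalls++;
    throw new FileNotFoundException("hypothetical missing path");
  }

  @Override
  public T next() { throw new NoSuchElementException(); }
}

/** Adapts an in-memory collection so the filter can be tried on a healthy listing. */
final class ListBackedIterator<T> implements SimpleRemoteIterator<T> {
  private final Iterator<T> it_;
  ListBackedIterator(Iterable<T> items) { it_ = items.iterator(); }
  @Override public boolean hasNext() { return it_.hasNext(); }
  @Override public T next() { return it_.next(); }
}

final class DeadLoopSketch {
  public static void main(String[] args) throws IOException {
    MissingPathIterator<String> missing = new MissingPathIterator<>();
    SafeFilterIterator<String> it = new SafeFilterIterator<>(missing, s -> true);
    // The wrapper gives up after a single failed probe instead of looping.
    System.out.println(it.hasNext() + " after " + missing.hasNextCalls + " call(s)");
  }
}
```

Running DeadLoopSketch prints "false after 1 call(s)". The trade-off of this design is that a FileNotFoundException from hasNext() ends the listing early, so any entries the base iterator might still have produced are not returned; whether that is acceptable depends on how the caller treats a partial listing.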
When does the path being loaded not exist?
It happens when the metadata (table/partition location) in HMS still references the path, but the path has actually been removed from storage.
When does Impala get such an invalid RemoteIterator?
With FileSystem implementations that don't override FileSystem#listStatusIterator(), e.g. S3AFileSystem before HADOOP-17281, AzureBlobFileSystem, and GoogleHadoopFileSystem.
Issue Links
- is caused by: IMPALA-9122 Ignore FileNotFoundException when loading a table (Resolved)
- is related to: IMPALA-11464 hasNext() throws FileNotFoundException on staging files and breaks file metadata loading (Resolved)
- relates to: IMPALA-8663 FileMetadataLoader should skip listing files in hidden and tmp directories (Resolved)