IMPALA-10579

Deadloop in table metadata loading when using an invalid RemoteIterator

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Catalog
    • Labels:
      None

      Description

      The file-listing thread in catalogd goes into an infinite loop if it gets a RemoteIterator on a nonexistent path. The first call to RemoteIterator.hasNext() throws a FileNotFoundException. However, this exception is caught and the loop continues. Because curFile_ is still null, the next iteration calls baseIterator_.hasNext() again, which throws the same exception, so the loop never terminates. Related code: https://github.com/apache/impala/blob/d89c04bf806682d3449c566ce979632bd2ac5b29/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L789-L814

        static class FilterIterator implements RemoteIterator<FileStatus> {
          ...
          public boolean hasNext() throws IOException {
            ...
            while (curFile_ == null) {
              FileStatus next;
              try {
                if (!baseIterator_.hasNext()) return false; // <---- throws FileNotFoundException
                ...
                next = baseIterator_.next();
              } catch (FileNotFoundException ex) {
                ...
                LOG.warn(ex.getMessage());
                continue;  // <--------- catch the exception and continue into a dead loop
              }
              if (!isInIgnoredDirectory(startPath_, next)) {
                curFile_ = next;
                return true;
              }
            }
            return true;
          }
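
      One possible way to avoid the spin is sketched below; it is an illustration only and not necessarily the change committed for this issue. It assumes the catch block exists to tolerate a single entry that disappears while being read, so only next() is guarded and a FileNotFoundException from hasNext() propagates to the caller. RemoteIter is a hypothetical stand-in for Hadoop's RemoteIterator, and SafeFilterIterator, missingPath, listing, and drain are names invented for this example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

class FilterIteratorSketch {
  /** Base iterator over a path that no longer exists: hasNext() always throws. */
  static RemoteIter<String> missingPath(String path) {
    return new RemoteIter<String>() {
      public boolean hasNext() throws IOException {
        throw new FileNotFoundException("No such path: " + path);
      }
      public String next() throws IOException {
        throw new FileNotFoundException("No such path: " + path);
      }
    };
  }

  /** Base iterator over an in-memory list of names. */
  static RemoteIter<String> listing(String... names) {
    Iterator<String> it = Arrays.asList(names).iterator();
    return new RemoteIter<String>() {
      public boolean hasNext() { return it.hasNext(); }
      public String next() { return it.next(); }
    };
  }

  /** Drains the iterator, reporting either the kept names or the failure. */
  static String drain(RemoteIter<String> it) {
    StringBuilder sb = new StringBuilder();
    try {
      while (it.hasNext()) sb.append(it.next()).append(';');
    } catch (IOException e) {
      return "failed: " + e.getMessage();
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Predicate<String> visible = name -> !name.startsWith(".");
    System.out.println(drain(new SafeFilterIterator<String>(listing("a", ".tmp", "b"), visible)));
    System.out.println(drain(new SafeFilterIterator<String>(missingPath("/gone"), visible)));
  }
}

/** Hypothetical stand-in for org.apache.hadoop.fs.RemoteIterator. */
interface RemoteIter<T> {
  boolean hasNext() throws IOException;
  T next() throws IOException;
}

/** Filtering wrapper in which a failing hasNext() ends the iteration instead of being retried. */
final class SafeFilterIterator<T> implements RemoteIter<T> {
  private final RemoteIter<T> base;
  private final Predicate<T> keep;
  private T cur;

  SafeFilterIterator(RemoteIter<T> base, Predicate<T> keep) {
    this.base = base;
    this.keep = keep;
  }

  @Override
  public boolean hasNext() throws IOException {
    while (cur == null) {
      // Deliberately outside the try block: if the listed path is gone,
      // the FileNotFoundException reaches the caller and the loop exits.
      if (!base.hasNext()) return false;
      T next;
      try {
        next = base.next();
      } catch (FileNotFoundException ex) {
        // Assumption: only this entry vanished and the base iterator has
        // already advanced past it, so skipping it makes progress.
        continue;
      }
      if (keep.test(next)) cur = next;
    }
    return true;
  }

  @Override
  public T next() throws IOException {
    if (!hasNext()) throw new NoSuchElementException();
    T out = cur;
    cur = null;
    return out;
  }
}
```

      Running main prints "a;b;" for the in-memory listing and "failed: No such path: /gone" for the missing path, so the caller sees the error once instead of looping on it.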
      

      When does the path being loaded not exist?
      This happens when the metadata in HMS (the table or partition location) still references the path, but the path has actually been removed from storage.

      When does Impala get such an invalid RemoteIterator?
      With FileSystem implementations that don't override FileSystem#listStatusIterator(), e.g. S3AFileSystem before HADOOP-17281, AzureBlobFileSystem, and GoogleHadoopFileSystem.

              People

              • Assignee: Quanlong Huang (stigahuang)
              • Reporter: Quanlong Huang (stigahuang)