Hadoop HDFS / HDFS-11729

Improve NNStorageRetentionManager failure handling.

Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently, NNStorageRetentionManager simply skips a storage directory when it detects a problem with it. Because checkpoint saving does not go through the same set of checks, checkpoints keep being written to a directory that is never purged, which can lead to the space exhaustion seen in HDFS-11714.

      Instead of ignoring errors, NNStorageRetentionManager should handle them properly. One potential improvement is to catch the exception and report the storage directory failure using NNStorage.reportErrorsOnDirectories(). attemptRestoreRemovedStorage() would then need extra checks before restoring a directory, e.g. that a VERSION file exists.
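
      A rough sketch of the idea, not the attached patch. It uses plain Java stand-ins rather than the real Hadoop classes: RetentionSketch, Purger, purgeAll() and hasVersionFile() are hypothetical names made up for illustration. In the real code, the list of failed directories would be handed to NNStorage.reportErrorsOnDirectories(), and a VERSION check of this kind would go into attemptRestoreRemovedStorage().

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not part of Hadoop.
class RetentionSketch {

    /** Hypothetical per-directory purge step that may fail with an IOException. */
    interface Purger {
        void purge(File dir) throws IOException;
    }

    /**
     * Purges every directory. A directory whose purge throws is not silently
     * skipped: it is collected so the caller can report it as failed.
     */
    static List<File> purgeAll(List<File> dirs, Purger purger) {
        List<File> failed = new ArrayList<>();
        for (File dir : dirs) {
            try {
                purger.purge(dir);
            } catch (IOException e) {
                // Remember the failure instead of ignoring it.
                failed.add(dir);
            }
        }
        return failed;
    }

    /**
     * Extra restore check: true only if the given directory exists and
     * contains a regular file named VERSION. Which directory to pass
     * (the storage root or a subdirectory) is left to the caller.
     */
    static boolean hasVersionFile(File dir) {
        return dir.isDirectory() && new File(dir, "VERSION").isFile();
    }
}
```

      With something like this, a directory that fails during purging ends up in the returned list and can be taken out of service, and it would only be considered for restore once hasVersionFile() returns true for it.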

      Attachments

        1. HDFS-11729.002.patch
          17 kB
          Weiwei Yang
        2. HDFS-11729.001.patch
          7 kB
          Weiwei Yang

          People

            Assignee: cheersyang Weiwei Yang
            Reporter: kihwal Kihwal Lee
            Votes: 0
            Watchers: 6
