Hadoop HDFS / HDFS-13813

Exit NameNode if dangling child inode is detected when saving FsImage

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0, 2.10.0, 2.9.1, 3.0.3
    • Fix Version/s: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
    • Component/s: hdfs, namenode
    • Labels:
      None

      Description

      Recently, the same stack trace as in HDFS-9406 has appeared again in the field. The symptom is that loadINodeDirectorySection() cannot find a child inode in inodeMap when it looks up an inode id taken from a directory's children list. The child inode could be missing or deleted.
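
      The load-time symptom can be illustrated with a toy model. This is only a sketch: the class LoadSymptomSketch, the method firstUnresolvedChild, and the use of a plain Map<Long, String> in place of the real inodeMap are all hypothetical and are not taken from the Hadoop source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the load-time lookup; hypothetical, not the Hadoop classes. */
final class LoadSymptomSketch {

  /**
   * Walks a directory's child ids and returns the first id that has no
   * entry in the inode map, or -1 if every child resolves.
   */
  static long firstUnresolvedChild(Map<Long, String> inodeMap, List<Long> childIds) {
    for (long id : childIds) {
      if (inodeMap.get(id) == null) {
        return id; // the directory references an inode the map does not know
      }
    }
    return -1L;
  }

  public static void main(String[] args) {
    Map<Long, String> inodeMap = new HashMap<>();
    inodeMap.put(1L, "root");
    inodeMap.put(2L, "a");
    // The directory lists children 2 and 3, but inode 3 is absent from the map.
    System.out.println(firstUnresolvedChild(inodeMap, Arrays.asList(2L, 3L))); // prints 3
  }
}
```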

      So far we do not have a clear way to reproduce the problem. Therefore, I am proposing this improvement to detect such corruption (data structure inconsistency) when saving the FsImage, so that we have the FsImage and Edit Log from that point and can hopefully reproduce the problem reliably.

      In a previous patch, HDFS-13314, Arpit Agarwal did a great job of catching potential FsImage corruption in two cases. This patch adds a third case: a child inode that does not exist in the global FSDirectory (dir) when the INodeDirectorySection is saved (serialized).
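
      The idea of the save-time check can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the attached patch: DanglingChildCheck, Dir, findDanglingChildren, and saveDirectorySection are hypothetical names, a Map<Long, String> stands in for the global inode lookup, and an exception stands in for exiting the NameNode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the save-time check; hypothetical, not the Hadoop classes. */
final class DanglingChildCheck {

  /** Hypothetical stand-in for a directory inode: its id plus its child ids. */
  static final class Dir {
    final long id;
    final List<Long> childIds;

    Dir(long id, List<Long> childIds) {
      this.id = id;
      this.childIds = childIds;
    }
  }

  /** Collects every child id referenced by a directory but absent from the inode map. */
  static List<Long> findDanglingChildren(Map<Long, String> inodeMap, List<Dir> dirs) {
    List<Long> dangling = new ArrayList<>();
    for (Dir d : dirs) {
      for (long childId : d.childIds) {
        if (!inodeMap.containsKey(childId)) {
          dangling.add(childId);
        }
      }
    }
    return dangling;
  }

  /** Refuses to "save" when a dangling child is found (the issue proposes exiting the NameNode). */
  static void saveDirectorySection(Map<Long, String> inodeMap, List<Dir> dirs) {
    List<Long> dangling = findDanglingChildren(inodeMap, dirs);
    if (!dangling.isEmpty()) {
      throw new IllegalStateException("Dangling child inode ids: " + dangling);
    }
    // Serialization of the directory section would happen here.
  }

  public static void main(String[] args) {
    Map<Long, String> inodeMap = new HashMap<>();
    inodeMap.put(1L, "root");
    inodeMap.put(2L, "a");
    List<Dir> dirs = Arrays.asList(new Dir(1L, Arrays.asList(2L, 3L)));
    System.out.println(findDanglingChildren(inodeMap, dirs)); // prints [3]
  }
}
```

      Failing at save time, rather than at the next load, means the inconsistency is reported while the in-memory state and the Edit Log that produced it are still available.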

        Attachments

        1. HDFS-13813.001.patch
          5 kB
          Siyao Meng
        2. HDFS-13813.002.patch
          5 kB
          Siyao Meng

              People

              • Assignee: Siyao Meng (smeng)
              • Reporter: Siyao Meng (smeng)
              • Votes: 0
              • Watchers: 9