  1. Hadoop HDFS
  2. HDFS-9406

FSImage may get corrupted after deleting snapshot

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: namenode
    • Labels:
      None
    • Environment:

      CentOS 6 amd64, CDH 5.4.4-1
      2xCPU: Intel(R) Xeon(R) CPU E5-2640 v3
      Memory: 32GB
      Namenode blocks: ~700_000 blocks, no HA setup

    • Hadoop Flags:
      Reviewed

      Description

      The FSImage became corrupted after HDFS snapshots were taken. The cluster was not in use
      at that time.

      When the NameNode was restarted, it failed with a NullPointerException:

      15/11/07 10:01:15 INFO namenode.FileJournalManager: Recovering unfinalized segments in /tmp/fsimage_checker_5857/fsimage/current
      15/11/07 10:01:15 INFO namenode.FSImage: No edit log streams selected.
      15/11/07 10:01:18 INFO namenode.FSImageFormatPBINode: Loading 1370277 INodes.
      15/11/07 10:01:27 ERROR namenode.NameNode: Failed to start namenode.
      java.lang.NullPointerException
              at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.addChild(INodeDirectory.java:531)
              at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:252)
              at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:202)
              at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:261)
              at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
              at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:929)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:732)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:668)
              at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1061)
              at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:765)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:643)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:810)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:794)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1487)
              at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1553)
      15/11/07 10:01:27 INFO util.ExitUtil: Exiting with status 1
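
The trace above fails in INodeDirectory.addChild while the directory section of the image is being loaded. The report does not state the root cause, so the following is only a minimal sketch under one assumption: that a directory entry in the image references a child inode ID that is no longer present in the inode section. The classes and method names (`DanglingChildSketch`, `Node`, `findDanglingChildren`, `naiveLoad`) are hypothetical stand-ins for illustration, not the Hadoop classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of image loading; not the Hadoop implementation.
class DanglingChildSketch {

    /** Stand-in for an inode: an id, a name, and the names of its children. */
    static final class Node {
        final long id;
        final String name;
        final List<String> children = new ArrayList<>();

        Node(long id, String name) {
            this.id = id;
            this.name = name;
        }

        /** Dereferences the child, so a null child fails here. */
        void addChild(Node child) {
            String childName = child.name; // throws NullPointerException if child is null
            children.add(childName);
        }
    }

    /** Returns child ids that a directory entry references but the inode map lacks. */
    static List<Long> findDanglingChildren(Map<Long, Node> inodeMap,
                                           Map<Long, long[]> dirSection) {
        List<Long> dangling = new ArrayList<>();
        for (Map.Entry<Long, long[]> entry : dirSection.entrySet()) {
            for (long childId : entry.getValue()) {
                if (!inodeMap.containsKey(childId)) {
                    dangling.add(childId);
                }
            }
        }
        return dangling;
    }

    /** Links children to parents without checking; returns false on a NullPointerException. */
    static boolean naiveLoad(Map<Long, Node> inodeMap, Map<Long, long[]> dirSection) {
        try {
            for (Map.Entry<Long, long[]> entry : dirSection.entrySet()) {
                Node parent = inodeMap.get(entry.getKey());
                for (long childId : entry.getValue()) {
                    parent.addChild(inodeMap.get(childId)); // lookup may return null
                }
            }
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<Long, Node> inodeMap = new LinkedHashMap<>();
        inodeMap.put(1L, new Node(1L, "root"));
        inodeMap.put(2L, new Node(2L, "a"));

        // Directory 1 lists children 2 and 3, but inode 3 is absent from the map.
        Map<Long, long[]> dirSection = new LinkedHashMap<>();
        dirSection.put(1L, new long[] {2L, 3L});

        System.out.println("dangling child ids: " + findDanglingChildren(inodeMap, dirSection));
        System.out.println("naive load succeeded: " + naiveLoad(inodeMap, dirSection));
    }
}
```

In this model, checking the directory entries against the inode map before linking reports the missing ID instead of aborting the load. Whether the actual image had this shape would need to be confirmed against the attached logs and patches.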
      

      The corruption happened after "07.11.2015 00:15"; from that time on, ~9300 blocks were invalidated that should not have been.
      After recovering the FSImage, I found that ~9300 blocks were missing.

      I have also attached the NameNode logs from before and after the corruption.

        Attachments

        1. HDFS-9406.branch-2.7.patch
          8 kB
          Yongjun Zhang
        2. HDFS-9406.003.patch
          8 kB
          Yongjun Zhang
        3. HDFS-9406.002.patch
          5 kB
          Yongjun Zhang
        4. HDFS-9406.001.patch
          5 kB
          Yongjun Zhang

              People

              • Assignee:
                yzhangal Yongjun Zhang
                Reporter:
                stanislav.antic@gmail.com Stanislav Antic
              • Votes:
                0
                Watchers:
                20
