Hadoop HDFS / HDFS-4423

Checkpoint exception causes fatal damage to fsimage.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.0.4, 1.1.1
    • Fix Version/s: 1.1.2
    • Component/s: namenode
    • Labels: None
    • Environment: CentOS 6.2
    • Hadoop Flags: Reviewed

    Description

      The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage (FSImage.java):

      boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
        ...
        latestNameSD.read();
        needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
        LOG.info("Image file of size " + imageSize + " loaded in "
            + (FSNamesystem.now() - startTime)/1000 + " seconds.");

        // Load latest edits
        if (latestNameCheckpointTime > latestEditsCheckpointTime)
          // the image is already current, discard edits
          needToSave |= true;
        else // latestNameCheckpointTime == latestEditsCheckpointTime
          needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

        return needToSave;
      }
      

      In the normal checkpoint flow, latestNameCheckpointTime equals latestEditsCheckpointTime, so the "else" branch runs.
      The problem arises when latestNameCheckpointTime > latestEditsCheckpointTime, which can happen as follows:
      SecondaryNameNode: starts a checkpoint,
      ...
      NameNode: during rollFSImage, the NameNode shuts down after writing latestNameCheckpointTime but before writing latestEditsCheckpointTime.
      NameNode restart: because latestNameCheckpointTime > latestEditsCheckpointTime, needToSave is set to true and the edits are discarded. As a result, "rootDir"'s nsCount (the cluster's file count) is never updated, because that update runs only inside loadFSEdits, via "FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()". "saveNamespace" then writes the file count to the fsimage with its default value of "1".
      The next time it runs, loadFSImage fails.
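
      The crash window described above can be illustrated with a small toy model. This is only a sketch under the assumption that the two checkpoint times are written one after the other, as the description states; the class CheckpointTimes and its methods are hypothetical and are not part of Hadoop.

```java
// Toy model of the crash window; every name here is hypothetical, not a Hadoop API.
final class CheckpointTimes {
    long nameCheckpointTime;   // stands in for latestNameCheckpointTime
    long editsCheckpointTime;  // stands in for latestEditsCheckpointTime

    CheckpointTimes(long initial) {
        nameCheckpointTime = initial;
        editsCheckpointTime = initial;
    }

    /** Writes both times in order; a crash in between leaves them out of step. */
    void roll(long newTime, boolean crashBetweenWrites) {
        nameCheckpointTime = newTime;
        if (crashBetweenWrites) {
            return; // simulated NameNode shutdown before the second write
        }
        editsCheckpointTime = newTime;
    }

    /** Mirrors the branch condition in the loadFSImage excerpt. */
    boolean startupDiscardsEdits() {
        return nameCheckpointTime > editsCheckpointTime;
    }

    public static void main(String[] args) {
        CheckpointTimes clean = new CheckpointTimes(100L);
        clean.roll(200L, false);
        System.out.println("clean roll discards edits: " + clean.startupDiscardsEdits());

        CheckpointTimes crashed = new CheckpointTimes(100L);
        crashed.roll(200L, true);
        System.out.println("interrupted roll discards edits: " + crashed.startupDiscardsEdits());
    }
}
```

      Only the interrupted roll takes the "discard edits" branch on the next start, which is the path that skips the count update.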

      The following change might fix it:

      boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
        ...
        latestNameSD.read();
        needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
        LOG.info("Image file of size " + imageSize + " loaded in "
            + (FSNamesystem.now() - startTime)/1000 + " seconds.");

        // Load latest edits
        if (latestNameCheckpointTime > latestEditsCheckpointTime) {
          // the image is already current, discard edits
          needToSave |= true;
          FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
        } else // latestNameCheckpointTime == latestEditsCheckpointTime
          needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

        return needToSave;
      }
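
      To show why recomputing the count on the "discard edits" path matters, here is a simplified sketch. It assumes, as the description says, that loading the image alone leaves the cached count at its default of 1; ToyNamespace and its methods are hypothetical stand-ins, and recount() only imitates the role of updateCountForINodeWithQuota().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the stale count; hypothetical names, not the Hadoop classes.
final class ToyNamespace {
    private final List<String> paths = new ArrayList<>();
    private long cachedCount = 1; // default: only the root directory is counted

    /** Loading the image restores entries but leaves the cached count untouched. */
    void loadImage(List<String> imagePaths) {
        paths.clear();
        paths.addAll(imagePaths);
    }

    /** Imitates updateCountForINodeWithQuota(): root plus every entry. */
    void recount() {
        cachedCount = 1 + paths.size();
    }

    /** Startup where the image is newer than the edits; the flag toggles the proposed fix. */
    long startAndSave(List<String> imagePaths, boolean recountWhenDiscardingEdits) {
        loadImage(imagePaths);
        if (recountWhenDiscardingEdits) {
            recount();
        }
        return cachedCount; // the value a subsequent save would persist
    }

    public static void main(String[] args) {
        List<String> image = Arrays.asList("/a", "/a/b", "/c");
        System.out.println("without recount: " + new ToyNamespace().startAndSave(image, false));
        System.out.println("with recount:    " + new ToyNamespace().startAndSave(image, true));
    }
}
```

      Without the recount, the saved value stays at the default even though the namespace holds more entries; with it, the saved value matches the loaded namespace.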
      

      Attachments

        1. HDFS-4423-branch-1.1.patch
          7 kB
          Chris Nauroth

          People

            Assignee: Chris Nauroth (cnauroth)
            Reporter: ChenFolin (chenfolin)
            Votes: 0
            Watchers: 8

              Time Tracking

                Estimated: 72h
                Remaining: 72h
                Logged: Not Specified