Hadoop HDFS / HDFS-4423

Checkpoint exception causes fatal damage to fsimage.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.0.4, 1.1.1
    • Fix Version/s: 1.1.2
    • Component/s: namenode
    • Labels:
      None
    • Environment:

      CentOS 6.2

    • Hadoop Flags:
      Reviewed

      Description

      The affected class is org.apache.hadoop.hdfs.server.namenode.FSImage (FSImage.java):

      boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
        ...
        latestNameSD.read();
        needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
        LOG.info("Image file of size " + imageSize + " loaded in "
            + (FSNamesystem.now() - startTime)/1000 + " seconds.");

        // Load latest edits
        if (latestNameCheckpointTime > latestEditsCheckpointTime)
          // the image is already current, discard edits
          needToSave |= true;
        else // latestNameCheckpointTime == latestEditsCheckpointTime
          needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

        return needToSave;
      }
      

      In the normal checkpoint flow, latestNameCheckpointTime equals latestEditsCheckpointTime, so the "else" branch executes.
      The problem occurs when latestNameCheckpointTime > latestEditsCheckpointTime:
      The SecondaryNameNode starts a checkpoint,
      ...
      The NameNode executes rollFSImage and shuts down after writing latestNameCheckpointTime but before writing latestEditsCheckpointTime.
      On the next NameNode start, because latestNameCheckpointTime > latestEditsCheckpointTime, needToSave becomes true, but "rootDir"'s nsCount (the cluster's file count) is never updated, since that update only happens in loadFSEdits via "FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()". "saveNamespace" then writes the file count to the fsimage with its default value of "1".
      The next time, loadFSImage will fail.

      Maybe this will work:

      boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
        ...
        latestNameSD.read();
        needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
        LOG.info("Image file of size " + imageSize + " loaded in "
            + (FSNamesystem.now() - startTime)/1000 + " seconds.");

        // Load latest edits
        if (latestNameCheckpointTime > latestEditsCheckpointTime) {
          // the image is already current, discard edits, but still refresh the
          // namespace counts that loadFSEdits would otherwise have updated
          needToSave |= true;
          FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
        }
        else // latestNameCheckpointTime == latestEditsCheckpointTime
          needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

        return needToSave;
      }
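
      For context on the comparison above: both latestNameCheckpointTime and latestEditsCheckpointTime are derived from the fstime file kept in each image/edits storage directory. A simplified sketch of that read (assuming the branch-1 layout of <storage dir>/current/fstime; not an exact quote of the FSImage code):

      import java.io.DataInputStream;
      import java.io.File;
      import java.io.FileInputStream;
      import java.io.IOException;

      public class CheckpointTimeReader {
        /**
         * Read the checkpoint time recorded in a storage directory's fstime file.
         * A missing or unreadable file is treated as checkpoint time 0.
         */
        static long readCheckpointTime(File storageDir) throws IOException {
          File timeFile = new File(new File(storageDir, "current"), "fstime");
          if (!timeFile.exists() || !timeFile.canRead()) {
            return 0L;
          }
          DataInputStream in = new DataInputStream(new FileInputStream(timeFile));
          try {
            return in.readLong(); // the checkpoint time is stored as a single long
          } finally {
            in.close();
          }
        }
      }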
      

        Activity

        ChenFolin created issue -
        Chris Nauroth added a comment -

        Thank you for the detailed write-up, ChenFolin. I have one additional question. You mentioned an exception causing NameNode to shutdown during checkpoint after writing latest name checkpoint time, but before writing latest edits checkpoint time. Do you have details on that exception? Was that exception related to this bug, or was it something unrelated that just exposed this problem in the loadFSImage logic?

        Your assessment about the call to FSDirectory#updateCountForINodeWithQuota looks correct. I'm thinking that we should move that call out of FSImage#loadFSEdits and into FSImage#loadFSImage, so that the end of loadFSImage would look like this:

        boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
          ...
          // Load latest edits
          if (latestNameCheckpointTime > latestEditsCheckpointTime)
            // the image is already current, discard edits
            needToSave |= true;
          else // latestNameCheckpointTime == latestEditsCheckpointTime
            needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

          // update the counts.
          FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
          return needToSave;
        }
        

        Moving the call there would help guarantee that it always happens.

        Chris Nauroth added a comment -

        I'm uploading a patch that's similar to the original suggestion from ChenFolin.

        When I tried the approach I suggested in my last comment, it didn't work. The reason is that SecondaryNameNode calls directly into FSImage#loadFSEdits and depends on that method calling FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota() as a side effect. It's a less impactful change to add the call in the if block for when the image is already current.

        I've also added a test that simulates the error condition by running a cluster with separate directories for image and edits, forcing the fstime file for edits to contain 0, and then going through a series of restarts/checkpoints to make sure that it can still load the merged image. Before I applied the change in FSImage, this test would fail with EOFException on the last restart, similar to what is described in the bug report. After I applied the fix in FSImage, the test passed.
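
        To reproduce that error condition by hand, here is a minimal sketch (not the test attached to this patch; the path argument is hypothetical) that forces the fstime of an edits-only storage directory to 0, so that on the next restart latestNameCheckpointTime > latestEditsCheckpointTime:

        import java.io.DataOutputStream;
        import java.io.File;
        import java.io.FileOutputStream;

        public class ForceStaleEditsCheckpointTime {
          public static void main(String[] args) throws Exception {
            // args[0]: one of the directories configured in dfs.name.edits.dir
            File fstime = new File(new File(args[0], "current"), "fstime");
            DataOutputStream out = new DataOutputStream(new FileOutputStream(fstime));
            try {
              out.writeLong(0L); // overwrite the stored checkpoint time with 0
            } finally {
              out.close();
            }
          }
        }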

        Chris Nauroth made changes -
        Attachment: HDFS-4423-branch-1.1.patch [ 12567000 ]
        Suresh Srinivas made changes -
        Assignee: Chris Nauroth [ cnauroth ]
        Chris Nauroth added a comment -

        Here is the output from test-patch. Regarding the Findbugs warnings, this is the exact same output I get from using a no-op patch file (a 0-byte file as input to test-patch.sh) applied to branch-1. There are no new warnings related to this patch. Perhaps we need to investigate if a prior patch accidentally introduced new warnings.

        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 4 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] -1 findbugs. The patch appears to introduce 12 new Findbugs (version 1.3.9) warnings.

        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good.

        Tsz Wo Nicholas Sze made changes -
        Hadoop Flags: Reviewed [ 10343 ]
        Tsz Wo Nicholas Sze added a comment -

        I have run the tests with the patch. All tests passed except TestNetUtils, but that failure was due to my local network environment and is not related to the patch.

        Tsz Wo Nicholas Sze added a comment -

        I have committed this. Thanks, Chris!

        Tsz Wo Nicholas Sze made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Fix Version/s: 1.1.2 [ 12323593 ]
        Resolution: Fixed [ 1 ]
        Chris Nauroth added a comment -

        Thank you, Nicholas! From my own test run, the only failure I saw was TestJvmReuse. This test has been failing consistently for me on branch-1, even before this patch. I will file a separate jira for follow-up on that.

        Matt Foley added a comment -

        Closed upon successful release of 1.1.2.

        Matt Foley made changes -
        Status: Resolved [ 5 ] → Closed [ 6 ]

  People

    • Assignee: Chris Nauroth
    • Reporter: ChenFolin
    • Votes: 0
    • Watchers: 8


  Time Tracking

    • Original Estimate: 72h
    • Remaining Estimate: 72h
    • Time Spent: Not Specified
