I managed to corrupt the current name-node image using your patch.
Actually, the image was set to an empty file, so the name-node would not even restart after the checkpoint. I started 2 secondary nodes. The first was in the middle of getFSImage() when the second called rollFSImage() and received the following exception:
java.lang.IllegalStateException: Committed
This is likely to be related to the patch, since the second secondary node would just get an exception trying to rollEditsLog().
The periodic checkpoint protocol is changed to handle the case where two Secondaries are racing with one another to upload a new checkpoint.
The NameNode periodic checkpoint has four states. A rollEdit moves the state to ROLLED_EDIT. Uploading a new image is allowed only if the state is ROLLED_EDIT; it sets the state to UPLOAD_START. When the upload of the new image is finished, the state is set to UPLOAD_DONE. The rollFsImage is allowed only if the state is UPLOAD_DONE.
Merged patch with latest trunk.
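The checkpoint states described above can be sketched in Java as follows. This is only an illustration, not the code in the patch: the class and method names are hypothetical, the name of the fourth state (START) is an assumption since the comment names only three, and returning to START after rollFsImage is likewise assumed.

```java
// Hypothetical sketch of the checkpoint state checks; not the patch itself.
class CheckpointStateSketch {
    // START is an assumed name for the state that the comment leaves unnamed.
    public enum State { START, ROLLED_EDIT, UPLOAD_START, UPLOAD_DONE }

    private State state = State.START;

    // A rollEdit moves the state to ROLLED_EDIT.
    public synchronized void rollEditLog() {
        state = State.ROLLED_EDIT;
    }

    // An upload may begin only from ROLLED_EDIT, so a second secondary
    // that races in while an upload is in progress is rejected here.
    public synchronized void startUpload() {
        if (state != State.ROLLED_EDIT) {
            throw new IllegalStateException("upload not allowed in state " + state);
        }
        state = State.UPLOAD_START;
    }

    // Marks the upload of the new image as finished.
    public synchronized void finishUpload() {
        if (state != State.UPLOAD_START) {
            throw new IllegalStateException("no upload in progress, state " + state);
        }
        state = State.UPLOAD_DONE;
    }

    // rollFsImage is allowed only once the upload has completed.
    public synchronized void rollFsImage() {
        if (state != State.UPLOAD_DONE) {
            throw new IllegalStateException("rollFsImage not allowed in state " + state);
        }
        state = State.START; // assumption: the cycle starts over
    }

    public synchronized State state() {
        return state;
    }
}
```

With these checks, a secondary that calls rollFsImage() while another secondary is still uploading gets an exception instead of replacing the image with a partial file.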
Incorporated all of Konstantin's comments.
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12366319/secondaryRestart4.patch against trunk revision r578879.
@author +1. The patch does not contain any @author tags.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new compiler warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/815/testReport/
This message is automatically generated.
Integrated in Hadoop-Nightly #250 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/250/)
If rollEditLog finds that the edits log already exists, then it simply returns success. rollFsImage fails if it was not preceded by a call to rollEditLog. This lock-step ensures that a stale instance of a secondary namenode cannot fool the primary namenode into uploading a stale fsimage file.
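A minimal Java sketch of this lock-step follows. It makes several assumptions: the class name is hypothetical, a boolean field stands in for the on-disk check that the edits log exists, and a successful rollFsImage is assumed to clear it.

```java
// Hypothetical sketch of the rollEditLog / rollFsImage lock-step.
class LockStepSketch {
    // Stand-in for "the edits log already exists" on disk (assumption).
    private boolean rolledEditsExist = false;

    // Repeating the call is harmless: if the log was already rolled,
    // return success without doing anything.
    synchronized void rollEditLog() {
        if (rolledEditsExist) {
            return;
        }
        rolledEditsExist = true;
    }

    // Fails unless a rollEditLog call preceded it.
    synchronized void rollFsImage() {
        if (!rolledEditsExist) {
            throw new IllegalStateException("rollFsImage without a preceding rollEditLog");
        }
        rolledEditsExist = false; // assumption: the next checkpoint must roll again
    }
}
```

A stale secondary that calls rollFsImage() after another instance has already completed the checkpoint hits the exception, because no rolled edits log is pending at that point.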