Hadoop HDFS / HDFS-1981

When the namenode goes down while checkpointing and is started again, subsequent checkpointing always fails

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels:
      None
    • Environment:

      Linux

    • Hadoop Flags:
      Reviewed

      Description

      This scenario applies to both the NN and BNN cases.

      When the namenode goes down after creating edits.new, then on the subsequent restart divertFileStreams does not divert to edits.new, because the edits.new file is already present and its size is zero.

      As a result, an exception occurs when trying to saveCheckPoint:
      2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Namenode has an edit log with timestamp of 2011-05-23 16:38:56 but new checkpoint was created using editlog with timestamp 2011-05-23 16:37:30. Checkpoint Aborted.

      Is this a bug, or is this the intended behaviour?

      1. HDFS-1981_0.22.patch
        4 kB
        Uma Maheswara Rao G
      2. HDFS-1981_0.23.patch
        4 kB
        Uma Maheswara Rao G
      3. HDFS-1981.patch
        5 kB
        ramkrishna.s.vasudevan
      4. HDFS-1981-1.patch
        5 kB
        ramkrishna.s.vasudevan
      5. HDFS-1981-2.patch
        4 kB
        ramkrishna.s.vasudevan

        Issue Links

          Activity

          Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Patch Available -> Open | 41d 21h 58m | 4 | Uma Maheswara Rao G | 27/Jul/11 13:31
          Open -> Patch Available | 23d 3h 22m | 5 | Uma Maheswara Rao G | 27/Jul/11 13:34
          Patch Available -> Resolved | 11h 1m | 1 | Konstantin Shvachko | 28/Jul/11 00:35
          Resolved -> Closed | 137d 6h 43m | 1 | Konstantin Shvachko | 12/Dec/11 06:19
          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-22-branch #73 (See https://builds.apache.org/job/Hadoop-Hdfs-22-branch/73/)
          HDFS-1981. svn merge -c 1151666 from trunk to branch-0.22.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151668
          Files :

          • /hadoop/common/branches/branch-0.22/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          • /hadoop/common/branches/branch-0.22/hdfs/CHANGES.txt
          • /hadoop/common/branches/branch-0.22/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #738 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/738/)
          HDFS-1981. NameNode does not saveNamespace() when editsNew is empty. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151666
          Files :

          • /hadoop/common/trunk/hdfs/CHANGES.txt
          • /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
          • /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #811 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/811/)
          HDFS-1981. NameNode does not saveNamespace() when editsNew is empty. Contributed by Uma Maheswara Rao G.

          shv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1151666
          Files :

          • /hadoop/common/trunk/hdfs/CHANGES.txt
          • /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
          • /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
          Konstantin Shvachko made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Konstantin Shvachko added a comment -

          I just committed this. Thanks Uma.

          I agree with Todd this is mostly targeted for 0.22, but had to follow the procedures so committed to both trunk and the branch.

          Konstantin Shvachko added a comment -

          +1 on both patches.

          Uma Maheswara Rao G added a comment -

          Reason for the above failures:
          This patch is based on the 0.22 branch, so HDFS-1981_0.22.patch cannot compile on trunk directly.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12487967/HDFS-1981_0.22.patch
          against trunk revision 1151344.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:

          -1 contrib tests. The patch failed contrib unit tests.

          -1 system test framework. The patch failed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1038//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1038//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1038//console

          This message is automatically generated.

          Uma Maheswara Rao G made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Uma Maheswara Rao G added a comment -

          Updated the patch for 0.22 version.

          Uma Maheswara Rao G made changes -
          Attachment HDFS-1981_0.22.patch [ 12487967 ]
          Uma Maheswara Rao G made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Uma Maheswara Rao G added a comment -

          Thanks Todd,
          I will update the patch for the 0.22 branch.
          I think we can also merge the tests into HDFS-1073 / 0.23 for more coverage. What do you say?

          --thanks

          Todd Lipcon added a comment -

          Hi Uma. Since the merge of HDFS-1073 is imminent, and this bug is not present in HDFS-1073, I think it's best to target only 0.22 for this patch.

          Uma Maheswara Rao G added a comment -

          Hi Konstantin,
          I have provided a patch for the 0.23 version.
          Are you expecting a patch for the 0.22 version? I think we have not officially released 0.22 yet, right? That is why I provided the patch directly on trunk.
          If you are expecting a patch specifically for the 0.22 branch, I can provide it.
          Is it required on 0.22 as well?
          The current patch can be committed on trunk.

          --Thanks

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12487871/HDFS-1981_0.23.patch
          against trunk revision 1150960.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1024//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1024//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1024//console

          This message is automatically generated.

          Uma Maheswara Rao G made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Uma Maheswara Rao G [ umamaheswararao ]
          Uma Maheswara Rao G added a comment -

          Hi Konstantin,
          Thanks for the comments.
          I have updated the patch for trunk.

          I did not remove the getStorage API, because in the latest trunk I can see that API in the FSImage class:

          
            public NNStorage getStorage() {
              return storage;
            }
          
          Uma Maheswara Rao G made changes -
          Attachment HDFS-1981_0.23.patch [ 12487871 ]
          Uma Maheswara Rao G made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Konstantin Shvachko added a comment -

          The patch looks good, except it is not compiling now.
          You should not remove the two imports from FSImage.

          In TestFSImage.testLoadFsEditsShouldReturnTrueWhenEditsNewExists()

          • getNameDirs() should not take parameters
          • FSImage does not have getStorage() method
          • Also member conf is not used anywhere in the test, can be removed

          If you could update the patch, I'll commit it.

          ramkrishna.s.vasudevan added a comment -

          Hi Todd,

          Thanks for your comments.
          I have reworked the patch to address some of them.

          I think you reviewed the old patch and not the one named
          HDFS-1981-1.patch

          Anyway, I have also addressed some of the comments in the latest patch:

          • As Konstantin said, please use Junit 4 (annotations API) instead of Junit 3, and use the MiniDFSCluster builder
            Already Addressed in previous patch.
          • typo: NEW_EIDTS_STREAM
            have changed this to NEW_EDITS_STREAM
          • don't use the string constant "dfs.name.dir" - there are constants in DFSConfigKeys for this
            Updated
          • "false == editsNew.exists()" ?? !editsNew.exists()
            Updated
          • TODOs in the test case. don't swallow exceptions
            Updated
          • you can use IOUtils.cleanup or IOUtils.closeStream in the finally block inside of the block
            Updated
          • no need to clear editsStreams in teardown method - it's an instance var so it will be recreated for each case anyway
            Updated
          • what's the purpose of the setup which creates bImg? It's not used in any of the test cases.
            Instead of using the variable bImg, I now create a local instance.
          • assertion text is wrong: "image should be deleted" – but it's checking that "edits.new" should be deleted.
            Fixed in the previous patch, as per the latest fix suggested by Konstantin.
          ramkrishna.s.vasudevan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          ramkrishna.s.vasudevan made changes -
          Attachment HDFS-1981-2.patch [ 12485748 ]
          ramkrishna.s.vasudevan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Todd Lipcon added a comment -
          • As Konstantin said, please use Junit 4 (annotations API) instead of Junit 3, and use the MiniDFSCluster builder
          • typo: NEW_EIDTS_STREAM
          • don't use the string constant "dfs.name.dir" - there are constants in DFSConfigKeys for this
          • "false == editsNew.exists()" ?? !editsNew.exists()
          • TODOs in the test case. don't swallow exceptions
          • you can use IOUtils.cleanup or IOUtils.closeStream in the finally block inside of the block
          • no need to clear editsStreams in teardown method - it's an instance var so it will be recreated for each case anyway
          • what's the purpose of the setup which creates bImg? It's not used in any of the test cases.
          • assertion text is wrong: "image should be deleted" – but it's checking that "edits.new" should be deleted.
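
          To illustrate the point about closing streams in a finally block without swallowing the test's own failures, here is a minimal plain-Java sketch. ReviewStyleSketch, closeQuietly and runThenClose are hypothetical names used only for this illustration; closeQuietly merely stands in for a helper such as IOUtils.closeStream and is not the Hadoop implementation.

```java
import java.io.Closeable;
import java.io.IOException;

/** Illustration only; hypothetical names, not part of Hadoop or of the patch. */
final class ReviewStyleSketch {
    private ReviewStyleSketch() {}

    /** Closes the resource if it is non-null and ignores any IOException from close(). */
    static void closeQuietly(Closeable c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (IOException ignored) {
            // A failure to close must not mask the outcome of the test body.
        }
    }

    /** Runs the body and always closes the resource afterwards, even if the body throws. */
    static void runThenClose(Runnable body, Closeable resource) {
        try {
            body.run();
        } finally {
            closeQuietly(resource);
        }
    }
}
```

          Exceptions thrown by the body still propagate to the caller, so the test fails visibly; only errors from close() are suppressed.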
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12484445/HDFS-1981-1.patch
          against trunk revision 1140030.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/860//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/860//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/860//console

          This message is automatically generated.

          ramkrishna.s.vasudevan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          ramkrishna.s.vasudevan made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          ramkrishna.s.vasudevan made changes -
          Attachment HDFS-1981-1.patch [ 12484445 ]
          Konstantin Shvachko added a comment -

          Not sure what introduced it, but the problem is that the NN does not saveNamespace() when editsNew is present.
          This only happens in Ramkrishna's scenario, when editsNew is empty, that is, when you start the checkpoint and the NN fails before anything in the namespace is modified.

          Deleting editsNew is probably valid, but not consistent, since at this stage the NN is in read-only mode. That is, if something goes wrong we should leave the storage directory in exactly the same state as it was before the startup.

          I propose to increment numEdits if editsNew exists. This will trigger saving the namespace after loading. So just a one-line change:

            if (editsNew.exists() && editsNew.length() > 0) {
          +   numEdits++;
              edits = new EditLogFileInputStream(editsNew);
              numEdits += loader.loadFSEdits(edits);
              edits.close();
            }

          Well, maybe not one line, as you need to increment even if editsNew.length() == 0.

          Your test should work in this case as well. Could you please convert it to JUnit4 and use MiniDFSCluster.Builder instead of a direct constructor.
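
          The adjustment described above, counting the mere presence of editsNew so that an empty file still triggers a save, could be sketched as follows. This is a simplified model under stated assumptions, not the committed patch: EditsNewFixSketch and its parameters are hypothetical, and the number of edits in each file is passed in as a plain integer instead of being read by a loader.

```java
/** Simplified model of the proposed change; hypothetical names, not the Hadoop source. */
final class EditsNewFixSketch {
    private EditsNewFixSketch() {}

    /**
     * Returns the edit count used to decide whether to save the namespace.
     * editsNewExists models editsNew.exists(); editsInEditsNew models what a
     * loader would return for that file (0 when the file is empty).
     */
    static int countEdits(int editsInEdits, boolean editsNewExists, int editsInEditsNew) {
        int numEdits = editsInEdits;
        if (editsNewExists) {
            numEdits++;                      // count the file itself, even when empty
            if (editsInEditsNew > 0) {
                numEdits += editsInEditsNew; // only a non-empty file is actually loaded
            }
        }
        return numEdits;
    }

    /** A non-zero count triggers saving the namespace after loading. */
    static boolean needsSave(int numEdits) {
        return numEdits > 0;
    }
}
```

          With an empty editsNew and no other edits, countEdits(0, true, 0) is non-zero, so needsSave reports that the namespace should be saved; without editsNew the count stays at zero and nothing changes.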

          Todd Lipcon added a comment -

          It looks like this was probably caused by the bug fix HADOOP-5314. In 0.20, the NN would always save the image at startup due to 5314, and thus we wouldn't run into the issue described here.

          Still working on understanding the issue, but this is one of the missing links.

          Todd Lipcon made changes -
          Link This issue relates to HADOOP-5314 [ HADOOP-5314 ]
          Konstantin Shvachko made changes -
          Fix Version/s 0.22.0 [ 12314241 ]
          Fix Version/s 0.23.0 [ 12315571 ]
          Affects Version/s 0.22.0 [ 12314241 ]
          Affects Version/s 0.23.0 [ 12315571 ]
          Priority Major [ 3 ] Blocker [ 1 ]
          Konstantin Shvachko added a comment -

          I have seen this bug in 0.22. Making it a blocker. Will review the patch [very] soon.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12482669/HDFS-1981.patch
          against trunk revision 1135329.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 32 javac compiler warnings (more than the trunk's current 31 warnings).

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these core unit tests:
          org.apache.hadoop.cli.TestHDFSCLI
          org.apache.hadoop.hdfs.TestHDFSTrash

          +1 contrib tests. The patch passed contrib unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/786//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/786//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/786//console

          This message is automatically generated.

          ramkrishna.s.vasudevan made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          ramkrishna.s.vasudevan made changes -
          Attachment HDFS-1981.patch [ 12482669 ]
          ramkrishna.s.vasudevan made changes -
          Field Original Value New Value
          Hadoop Flags [Reviewed]
          ramkrishna.s.vasudevan added a comment -

          Writing a UT for this may be difficult, as the scenario is hard to reproduce.

          The steps that I followed to reproduce this issue are
          1. Start namenode and backup namenode
          2. Allow checkpointing to happen such that the edits.new file is created on the namenode.
          3. At this point kill the NN and BNN.
          4. Now start the NN and BNN.
          5. When checkpointing starts again we will get the above exception.

          The exact problem lies in the loadFSEdits() API in FSImage.java.

          If loadFSEdits() returns 0, then in

          if (fsImage.recoverTransitionRead(dataDirs, editsDirs, startOpt)) {
            fsImage.saveNamespace(true);
          }

          saveNamespace() will not be invoked.

          Kindly correct me if you find any problems in this.
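
          The reproduction steps and the skipped saveNamespace() call can be modelled with a small in-memory toy. This is only a sketch of the behaviour as reported in this issue: CheckpointReproModel and all of its members are hypothetical, files are represented by plain fields, and the assumption that saving the namespace clears edits.new is a simplification.

```java
/** Toy in-memory model of the reproduction steps; hypothetical names, not Hadoop APIs. */
final class CheckpointReproModel {
    boolean editsNewExists = false;      // stands in for the edits.new file on disk
    int editsInEditsNew = 0;             // operations logged to edits.new so far
    boolean divertedToEditsNew = false;  // in-memory: are streams writing to edits.new?

    /** Step 2: a checkpoint begins; edits.new is created unless it is already there. */
    void startCheckpoint() {
        if (!editsNewExists) {
            editsNewExists = true;
            divertedToEditsNew = true;
        }
        // Otherwise the file is already present and the streams are not diverted.
    }

    /** Steps 3 and 4: the process dies and restarts; only on-disk state survives. */
    void killAndRestart() {
        divertedToEditsNew = false;
        int numEdits = (editsNewExists && editsInEditsNew > 0) ? editsInEditsNew : 0;
        if (numEdits > 0) {
            // Assumption: saving the namespace folds edits.new into a fresh image.
            editsNewExists = false;
            editsInEditsNew = 0;
        }
        // With zero edits loaded nothing is saved and the empty edits.new stays behind.
    }
}
```

          In this model, calling startCheckpoint(), killAndRestart() and startCheckpoint() again leaves editsNewExists true and divertedToEditsNew false, which corresponds to the state in which the next checkpoint fails.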

          Todd Lipcon added a comment -

          Hi Ramkrishna. Can you provide a unit test which shows this issue? It would be especially good to see such a test against 0.22, since HDFS-1073 will restructure all this code when it's merged into 0.23.

          ramkrishna.s.vasudevan created issue -

            People

            • Assignee:
              Uma Maheswara Rao G
              Reporter:
              ramkrishna.s.vasudevan
            • Votes:
              0
              Watchers:
              9
