Hadoop HDFS / HDFS-966

NameNode recovers lease even in safemode

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: namenode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The NameNode recovers a lease even when it is in safemode.

      1. leaseRecoverSafeMode.txt
        0.6 kB
        dhruba borthakur
      2. leaseRecoverSafeMode2.txt
        2 kB
        dhruba borthakur

          Activity

          dhruba borthakur added a comment -

          I just committed this.

          dhruba borthakur added a comment -

          The test failure is caused by HDFS-1101. I will commit this patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442369/leaseRecoverSafeMode2.txt
          against trunk revision 936132.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/158/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/158/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/158/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/158/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          The failed unit test is datanode.TestDiskError and is not connected to this patch, but I will resubmit the patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442369/leaseRecoverSafeMode2.txt
          against trunk revision 936024.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/157/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/157/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/157/console

          This message is automatically generated.

          dhruba borthakur added a comment -

          Merged patch with latest trunk.

          Konstantin Shvachko added a comment -

          I don't think there is. Looking at the code, we currently allow lease recovery during startup, but we probably never hit it, because the hard limit is reset to 1 hour and the name-node is likely to leave safe mode before it expires.
          +1 for the patch.
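
A small sketch of the timing argument in this comment, reading it as: the hard-limit clock effectively restarts at NameNode startup, so recovery can only fire during startup if safe mode lasts longer than the hard limit. The class and method names are hypothetical and are not taken from the HDFS source or the patch; only the 1 hour figure comes from the comment.

```java
import java.time.Duration;

/** Hypothetical illustration only; not part of the HDFS code base. */
class HardLimitWindowSketch {
    // Hard limit as stated in the comment above (1 hour).
    static final Duration HARD_LIMIT = Duration.ofHours(1);

    /**
     * Assumes the lease clock starts when the NameNode starts.
     * Returns true if such a lease would pass the hard limit
     * while the NameNode is still in safe mode.
     */
    static boolean expiresDuringSafeMode(Duration timeInSafeMode) {
        return timeInSafeMode.compareTo(HARD_LIMIT) > 0;
    }
}
```

Under this assumption, a NameNode that leaves safe mode after ten minutes never reaches the check, while one that stays in safe mode for ninety minutes does, which is the case the patch closes off.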

          dhruba borthakur added a comment -

          I think we do not introduce any new complexity or race conditions if we prevent recovering a lease while in safemode. If you folks agree, then this patch should be OK, shouldn't it?

          Is there a scenario where not recovering a lease during safemode could cause trouble for the NN?

          Todd Lipcon added a comment -

          Nope, I'm pretty sure I agree with you. But right now, in both trunk and 20, renewLease does check for safemode. Is that a mistake? Are there any ramifications to changing it? I agree with Konstantin that this stuff is very dangerous, but I also agree with you that it's important for HA to fix it up.

          dhruba borthakur added a comment -

          I think we should prevent lease recovery in safemode. But we should allow the dfsclients to continue renewing their leases even if the namenode is in safemode. Is there a problem that you foresee?
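
A minimal sketch of the policy described in this comment: lease renewal stays allowed in safemode, while lease recovery is skipped until safemode is exited. Everything here (the class, its fields, and its method signatures) is a hypothetical stand-in written for illustration; it is not the actual FSNamesystem or LeaseManager code, and it is not the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the NameNode's lease bookkeeping. */
class SafeModeLeaseSketch {
    // Lease holder -> time of last renewal, in milliseconds.
    private final Map<String, Long> lastRenewalMillis = new HashMap<>();
    private final long hardLimitMillis;
    // The thread notes that the NN starts off in safemode.
    private boolean inSafeMode = true;

    SafeModeLeaseSketch(long hardLimitMillis) {
        this.hardLimitMillis = hardLimitMillis;
    }

    void setSafeMode(boolean on) {
        inSafeMode = on;
    }

    /** Renewal only refreshes a timestamp, so it is permitted in safemode. */
    void renewLease(String holder, long nowMillis) {
        lastRenewalMillis.put(holder, nowMillis);
    }

    /**
     * Drops leases that are past the hard limit and returns how many
     * were dropped. Does nothing while in safemode, so a NameNode in
     * safemode makes no lease-recovery changes to its state.
     */
    int checkLeases(long nowMillis) {
        if (inSafeMode) {
            return 0;
        }
        int before = lastRenewalMillis.size();
        lastRenewalMillis.values()
                .removeIf(renewed -> nowMillis - renewed > hardLimitMillis);
        return before - lastRenewalMillis.size();
    }
}
```

In this sketch an expired lease is left untouched while the flag is set and is only picked up by the first check after safemode is turned off.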

          Todd Lipcon added a comment -

          Should we allow lease renewal while in safemode? I was looking over the HDFS-988 patch and had this question.

          dhruba borthakur added a comment -

          One motivation is that when the NameNode hot standby (i.e. AvatarNode) is running, we have to ensure that the hot standby is truly a standby and is not actively making any modifications to HDFS state.

          When the NN starts, it starts off in safemode. We would not like to start lease recovery while the namenode is in safemode. Typically, when you open the NameNode for business by exiting safemode, we should start the lease recovery of any unclosed files if required, shouldn't we?

          Konstantin Shvachko added a comment -

          This looks like a simple change, but it could have serious consequences. Could you please elaborate on the motivation, the possible problems, and the pros and cons?
          One question I see is: what happens if lease recovery starts when not all blocks have been reported (during startup)?


            People

            • Assignee:
              dhruba borthakur
              Reporter:
              dhruba borthakur
            • Votes:
              0
              Watchers:
              12
