Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-3026

HA: Handle failure during HA state transition

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.0-alpha
    • Component/s: ha, namenode
    • Labels:
      None

      Description

      This JIRA is to address a TODO in NameNode about handling the possibility of an incomplete HA state transition.

      1. HDFS-3026-HDFS-1623.patch
        9 kB
        Aaron T. Myers
      2. HDFS-3026.patch
        8 kB
        Aaron T. Myers
      3. HDFS-3026.patch
        8 kB
        Aaron T. Myers
      4. HDFS-3026.patch
        8 kB
        Aaron T. Myers

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1076 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1076/)
          HDFS-3026. HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1076 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1076/ ) HDFS-3026 . HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1040 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1040/)
          HDFS-3026. HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1040 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1040/ ) HDFS-3026 . HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030) Result = FAILURE atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2246 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2246/)
          HDFS-3026. HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030)

          Result = ABORTED
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2246 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2246/ ) HDFS-3026 . HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030) Result = ABORTED atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2303 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2303/)
          HDFS-3026. HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2303 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2303/ ) HDFS-3026 . HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2229 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2229/)
          HDFS-3026. HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2229 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2229/ ) HDFS-3026 . HA: Handle failure during HA state transition. Contributed by Aaron T. Myers. (Revision 1337030) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337030 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStateTransitionFailure.java
          Hide
          Aaron T. Myers added a comment -

          Thanks a lot for the reviews, Eli. I've just committed this to trunk, branch-2, and branch-2.0.0-alpha.

          Show
          Aaron T. Myers added a comment - Thanks a lot for the reviews, Eli. I've just committed this to trunk, branch-2, and branch-2.0.0-alpha.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12526418/HDFS-3026.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2414//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2414//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12526418/HDFS-3026.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2414//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2414//console This message is automatically generated.
          Hide
          Eli Collins added a comment -

          +1 looks great

          Show
          Eli Collins added a comment - +1 looks great
          Hide
          Aaron T. Myers added a comment -

          Forgot to address the findbugs warning - just need to synchronize NameNode#setRuntimeForTesting.

          Show
          Aaron T. Myers added a comment - Forgot to address the findbugs warning - just need to synchronize NameNode#setRuntimeForTesting.
          Hide
          Aaron T. Myers added a comment -

          Thanks a lot for the review, Eli. Here's an updated patch.

          Good idea re: trash emptier thread. I've done that in this patch.

          As for the other exits in NameNode - all of those are as exit codes from shell commands (e.g. format, bootstrapStandby, etc.), or from the the static main function, none of which I think really benefit from calling this method. Good point about making the error message more generic, though. I've gone ahead and done that.

          Show
          Aaron T. Myers added a comment - Thanks a lot for the review, Eli. Here's an updated patch. Good idea re: trash emptier thread. I've done that in this patch. As for the other exits in NameNode - all of those are as exit codes from shell commands (e.g. format, bootstrapStandby, etc.), or from the the static main function, none of which I think really benefit from calling this method. Good point about making the error message more generic, though. I've gone ahead and done that.
          Hide
          Eli Collins added a comment -

          Looks good. Two suggestions:

          doImmediateShutdown is HA-specific (refers state transition and logs a specific return code), how about making it take a return code and a message and converting the 5 or so other places in NameNode where we do

          LOG.error("something is wrong");  // or System.err etc
          System.exit(<some code>);
          

          with

          doImmediateShutdown(<some code>, "something is wrong")
          

          and can use another wrapper for the HA state transition case?

          Noticed we don't check FS_TRASH_INTERVAL for bogus value, perhaps fail to start the emptier if it's bogus and use a negative value in the test instead of introducing failToStartTrashEmptierForTests?

          Show
          Eli Collins added a comment - Looks good. Two suggestions: doImmediateShutdown is HA-specific (refers state transition and logs a specific return code), how about making it take a return code and a message and converting the 5 or so other places in NameNode where we do LOG.error( "something is wrong" ); // or System .err etc System .exit(<some code>); with doImmediateShutdown(<some code>, "something is wrong" ) and can use another wrapper for the HA state transition case? Noticed we don't check FS_TRASH_INTERVAL for bogus value, perhaps fail to start the emptier if it's bogus and use a negative value in the test instead of introducing failToStartTrashEmptierForTests?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12526356/HDFS-3026.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2406//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2406//artifact/trunk/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2406//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12526356/HDFS-3026.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2406//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2406//artifact/trunk/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2406//console This message is automatically generated.
          Hide
          Aaron T. Myers added a comment -

          I looked into what it would take to make the RPC server support a semi-shutdown state, wherein it could return one final response to the client who initiated a shutdown from an RPC, but cancel all other RPCs and not accept any further incoming connections. To do so requires a fair bit of surgery to the o.a.h.ipc.Server shutdown code. Since clients initiating HA state transitions must already handle the case where an RPC to the NN times out, it doesn't seem worth it to do so.

          In the patch attached, I've removed the delayed shutdown code and instead just shutdown the NN immediately upon failure to fully perform an HA state transition.

          Show
          Aaron T. Myers added a comment - I looked into what it would take to make the RPC server support a semi-shutdown state, wherein it could return one final response to the client who initiated a shutdown from an RPC, but cancel all other RPCs and not accept any further incoming connections. To do so requires a fair bit of surgery to the o.a.h.ipc.Server shutdown code. Since clients initiating HA state transitions must already handle the case where an RPC to the NN times out, it doesn't seem worth it to do so. In the patch attached, I've removed the delayed shutdown code and instead just shutdown the NN immediately upon failure to fully perform an HA state transition.
          Hide
          Aaron T. Myers added a comment -

          Converting to top-level issue with commit of HDFS-1623.

          Show
          Aaron T. Myers added a comment - Converting to top-level issue with commit of HDFS-1623 .
          Hide
          Aaron T. Myers added a comment -

          Rather than waiting N seconds and running the NN in an ill-defined state which will handle requests, seems like it would be better flag that the NN shutdown on reply to the RPC, and the shutdown path attempt NN#stop vs exiting hard. Would be useful for other non-HA cases where we discover in RPC context that we'd like to shutdown. Agree?

          The reason I didn't call NN#stop is because NN#stop itself tries to do a state transition, which doesn't seem wise to do in the event of a failed state transition.

          I'm not sure what you mean by "Rather than waiting N seconds and running the NN in an ill-definded state which will handle requests..." Do you mean that, in addition to waiting N seconds before shutting down, we should also make sure that no subsequent incoming RPCs will be handled by the NN?

          Show
          Aaron T. Myers added a comment - Rather than waiting N seconds and running the NN in an ill-defined state which will handle requests, seems like it would be better flag that the NN shutdown on reply to the RPC, and the shutdown path attempt NN#stop vs exiting hard. Would be useful for other non-HA cases where we discover in RPC context that we'd like to shutdown. Agree? The reason I didn't call NN#stop is because NN#stop itself tries to do a state transition, which doesn't seem wise to do in the event of a failed state transition. I'm not sure what you mean by "Rather than waiting N seconds and running the NN in an ill-definded state which will handle requests..." Do you mean that, in addition to waiting N seconds before shutting down, we should also make sure that no subsequent incoming RPCs will be handled by the NN?
          Hide
          Eli Collins added a comment -

          bq. Wrt delayed shutdown, we likely have (or should have) similar code elsewhere right since there's nothing HA specific?

          Not sure quite what you mean by this. Like where?

          We have other paths where a failure to start/stop services hooks into NN shutdown (the NameNode constructor exceptions caught by NameNode#main, and NameNode#stop), was thinking we could re-use those, but I forgot that these transitions you're worried about are triggered by RPC, which is, duh, why you're delaying shutdown.

          Rather than waiting N seconds and running the NN in an ill-defined state which will handle requests, seems like it would be better flag that the NN shutdown on reply to the RPC, and the shutdown path attempt NN#stop vs exiting hard. Would be useful for other non-HA cases where we discover in RPC context that we'd like to shutdown. Agree?

          Show
          Eli Collins added a comment - bq. Wrt delayed shutdown, we likely have (or should have) similar code elsewhere right since there's nothing HA specific? Not sure quite what you mean by this. Like where? We have other paths where a failure to start/stop services hooks into NN shutdown (the NameNode constructor exceptions caught by NameNode#main, and NameNode#stop), was thinking we could re-use those, but I forgot that these transitions you're worried about are triggered by RPC, which is, duh, why you're delaying shutdown. Rather than waiting N seconds and running the NN in an ill-defined state which will handle requests, seems like it would be better flag that the NN shutdown on reply to the RPC, and the shutdown path attempt NN#stop vs exiting hard. Would be useful for other non-HA cases where we discover in RPC context that we'd like to shutdown. Agree?
          Hide
          Aaron T. Myers added a comment -

          Over on HDFS-2920 (where this patch was originally posted as part of a larger patch), Eli reviewed the patch and had the following feedback:

          Wrt delayed shutdown, we likely have (or should have) similar code elsewhere right since there's nothing HA specific?

          Not sure quite what you mean by this. Like where?

          Why is the shutdown delayed rather than immediate?

          The reason for the delayed shutdown is because if we did an immediate shutdown, the state transition RPCs would never throw an error - they would either succeed or the NN would shut down. That seems unfortunate. It's reasonable to do a delayed shutdown here because the FailoverController won't continue with the failover if an error is thrown, so it won't tell the other node to transition to the active state, unless some other fencing mechanism succeeds, in which case the point is moot.

          Wrt "Error encountered during state transition.", isn't the error most likely due to a failure to start a service?

          Likely, but not necessarily. It could be a failure to stop a service, a failure to open an edit log for write, an OOM, or even an NPE due to a bug.

          Show
          Aaron T. Myers added a comment - Over on HDFS-2920 (where this patch was originally posted as part of a larger patch), Eli reviewed the patch and had the following feedback: Wrt delayed shutdown, we likely have (or should have) similar code elsewhere right since there's nothing HA specific? Not sure quite what you mean by this. Like where? Why is the shutdown delayed rather than immediate? The reason for the delayed shutdown is because if we did an immediate shutdown, the state transition RPCs would never throw an error - they would either succeed or the NN would shut down. That seems unfortunate. It's reasonable to do a delayed shutdown here because the FailoverController won't continue with the failover if an error is thrown, so it won't tell the other node to transition to the active state, unless some other fencing mechanism succeeds, in which case the point is moot. Wrt "Error encountered during state transition.", isn't the error most likely due to a failure to start a service? Likely, but not necessarily. It could be a failure to stop a service, a failure to open an edit log for write, an OOM, or even an NPE due to a bug.
          Hide
          Aaron T. Myers added a comment -

          Here's a patch which addresses the issue.

          Show
          Aaron T. Myers added a comment - Here's a patch which addresses the issue.

            People

            • Assignee:
              Aaron T. Myers
              Reporter:
              Aaron T. Myers
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development