Hadoop Common / HADOOP-8721

ZKFC should not retry 45 times when attempting a graceful fence during a failover

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: auto-failover, ha
    • Labels: None

      Description

      Scenario:
      Active NN on machine1
      Standby NN on machine2
      Machine1 is isolated from the network (machine1 network cable unplugged)
      After the ZK session timeout, the ZKFC on machine2 is notified that NN1 is gone.
      The ZKFC tries to fail over and make NN2 active.
      As part of this, during fencing it tries to connect to machine1 and kill NN1 (the sshfence method is configured).
      This connection attempt is retried 45 times (the default of ipc.client.connect.max.retries.on.timeouts), which at roughly 20 seconds per attempt is about 15 minutes, as the log timestamps below show.
      During all this time the standby NN is also unable to take over as active (because fencing has not succeeded).
      Suggestion: if the ZKFC is not able to reach the other NN for a specified time / number of retries, it can consider that NN dead and instruct the local NN to take over as active, since there is no chance of the other NN (NN1) retaining its active state after the ZK session timeout while it is isolated from the network.

      From the ZKFC log:

      2012-06-21 17:46:14,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 22 time(s).
      2012-06-21 17:46:35,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 23 time(s).
      2012-06-21 17:46:56,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 24 time(s).
      2012-06-21 17:47:17,378 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 25 time(s).
      2012-06-21 17:47:38,382 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 26 time(s).
      2012-06-21 17:47:59,382 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 27 time(s).
      2012-06-21 17:48:20,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 28 time(s).
      2012-06-21 17:48:41,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 29 time(s).
      2012-06-21 17:49:02,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 30 time(s).
      2012-06-21 17:49:23,386 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 31 time(s).
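
      The 45 retries come from the generic IPC client connect settings rather than anything ZKFC-specific. As a rough sketch (assuming the stock core-default.xml keys and fallback defaults shown below), the worst-case wait can be estimated as follows; the ~21 s gaps between the log lines above correspond to a ~20 s connect timeout per attempt:

      import org.apache.hadoop.conf.Configuration;

      // Estimate the worst-case time a graceful fence can block before the IPC
      // client gives up. Key names are standard core-default.xml keys; the
      // fallback defaults used here are assumptions.
      public class GracefulFenceDelayEstimate {
        public static void main(String[] args) {
          Configuration conf = new Configuration();
          int retries = conf.getInt("ipc.client.connect.max.retries.on.timeouts", 45);
          int connectTimeoutMs = conf.getInt("ipc.client.connect.timeout", 20000);
          System.out.printf("Worst-case graceful-fence wait: ~%d seconds%n",
              (long) retries * connectTimeoutMs / 1000);
        }
      }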
      
      Attachments

      1. HDFS-3561-3.patch
        3 kB
        Vinayakumar B
      2. HDFS-3561-2.patch
        3 kB
        Vinayakumar B
      3. HDFS-3561.patch
        3 kB
        Vinayakumar B


          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1175 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1175/)
          HADOOP-8721. ZKFC should not retry 45 times when attempting a graceful fence during a failover. Contributed by Vinayakumar B. (Revision 1376194)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376194
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1143 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1143/)
          HADOOP-8721. ZKFC should not retry 45 times when attempting a graceful fence during a failover. Contributed by Vinayakumar B. (Revision 1376194)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376194
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2650 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2650/)
          HADOOP-8721. ZKFC should not retry 45 times when attempting a graceful fence during a failover. Contributed by Vinayakumar B. (Revision 1376194)

          Result = FAILURE
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376194
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2686 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2686/)
          HADOOP-8721. ZKFC should not retry 45 times when attempting a graceful fence during a failover. Contributed by Vinayakumar B. (Revision 1376194)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376194
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2622 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2622/)
          HADOOP-8721. ZKFC should not retry 45 times when attempting a graceful fence during a failover. Contributed by Vinayakumar B. (Revision 1376194)

          Result = SUCCESS
          atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1376194
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          Aaron T. Myers added a comment -

          I've just committed this to trunk and branch-2. Thanks a lot for the contribution, Vinay.

          Aaron T. Myers added a comment -

          Great! Thanks for doing that.

          I'm going to commit this momentarily.

          Vinayakumar B added a comment -

          Yes Aaron,
          We tested the described scenario after setting the number of retries to 1.
          The ZKFC retries only once, gets a SocketTimeoutException, and throws the exception back to the caller without retrying further.

          Aaron T. Myers added a comment -

          +1, the latest patch looks good to me.

          Vinay, can you comment on what testing you did of this patch? Were you able to verify manually that the ZKFC now doesn't retry 45 times during a failover? I'll commit this patch as soon as this question is answered.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541971/HDFS-3561-3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3066//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3066//console

          This message is automatically generated.

          Vinayakumar B added a comment -

          Attaching the patch with latest comment fixes.

          Aaron T. Myers added a comment -

          Sounds good, Vinay. I'll be happy to review/commit the patch once you make these changes.

          Vinayakumar B added a comment -

          Thanks Aaron,

          I agree with your preference. I will post a new patch for that, which also contains the fixes below.

          There's no need for the new getGracefulFenceConnectRetries function

          Ok, I will remove it.

          There's no need for these lines to be on separate lines

          This was created by the Eclipse formatter. Anyway, I will change it.

          Aaron T. Myers added a comment -

          That's a good point, Vinay, that the method will only ever be called once, but I still think that creating a copy of the conf object in the FailoverController constructor makes the code a little clearer. I don't think the increased memory usage from having one extra copy of the conf object will be an issue at all. It will also be good from a future-proofing perspective to make sure that any mutations to the passed-in Configuration object don't affect the behavior of a long-lived FailoverController object. Does that make sense? Note that I don't feel super strongly about this; it's just my preference. If you disagree, we can go with what you have here.

          Two little nits I noticed while taking another look at this patch:

          1. There's no need for the new getGracefulFenceConnectRetries function, since it's only ever called from the constructor of this class. The other two similar methods are necessary because they're called from the ZKFailoverController class. At the very least, the function should be made private.
          2. There's no need for these lines to be on separate lines:
            +    newConf
            +        .setInt(
            
          Vinayakumar B added a comment -

          Hi Aaron T. Myers any more comments you have on this..?

          Vinayakumar B added a comment -

          That sounds good. But as of now, in the ZKFC, tryGracefulFence() is called by creating a new FailoverController instance each time, so setting the value in the constructor or inside tryGracefulFence() amounts to the same thing.
          If we copy the conf in the constructor just for tryGracefulFence(), the FailoverController will also hold the copied conf for the local target, where we never use tryGracefulFence(); i.e. we would hold a duplicate conf instance that is not necessary.

          Am I missing anything?

          Aaron T. Myers added a comment -

          Instead of creating a new Configuration object every time tryGracefulFence is called, how about creating a copy of the conf object in the FailoverController constructor, and setting the IPC_CLIENT_CONNECT_MAX_RETRIES there? This would also mean you wouldn't have to have the extra instance variable or method introduced by this patch.
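
          A minimal sketch of this approach (the IPC constant names are assumed to be the existing ones in CommonConfigurationKeys, and the graceful-fence retry key with its default of 1 is purely illustrative, not necessarily what the final patch uses):

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.CommonConfigurationKeys;

          // Build a private copy of the conf with a low connect-retry count, to be
          // created once in the FailoverController constructor and used only when
          // setting up the proxy for graceful fencing.
          public final class GracefulFenceConfSketch {
            private GracefulFenceConfSketch() {}

            public static Configuration makeGracefulFenceConf(Configuration conf) {
              Configuration gracefulFenceConf = new Configuration(conf);
              // Key name and default are assumptions modelled on the discussion below.
              int retries = conf.getInt(
                  "ha.failover-controller.graceful-fence.connection.retries", 1);
              gracefulFenceConf.setInt(
                  CommonConfigurationKeys.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY, retries);
              gracefulFenceConf.setInt(
                  CommonConfigurationKeys.IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SOCKET_TIMEOUTS_KEY,
                  retries);
              return gracefulFenceConf;
            }
          }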

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12536365/HDFS-3561-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2815//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2815//console

          This message is automatically generated.

          Vinayakumar B added a comment -

          Attaching the patch by addressing above comments

          Aaron T. Myers added a comment -

          I'd think that we'd only want the lower number of retries when trying to connect to the node that may be down.

          Vinayakumar B added a comment -

          Thanks Aaron for the suggestion.

          I have one question here.
          Shall we set the connection retries to 1 for all proxy connections in the FailoverController, i.e. for the local target as well as the remote target?
          In any case, connecting to the local target should not take much time.

          What is your suggestion?

          Aaron T. Myers added a comment -

          Seems to me like these new configs should not be made specific to the ZKFC, but rather should apply to all failover controllers. Given that, I think we should change the config keys to be named similarly to the other FC graceful connection configs, e.g. "ha.failover-controller.graceful-fence.rpc-timeout.ms". Furthermore, we should push down the handling for this into the FailoverController, and not put it in ZKFailoverController.
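
          For illustration, constants following that convention might look like the sketch below in CommonConfigurationKeys; the connection-retries key name and default are guesses patterned on the rpc-timeout key named above, not necessarily what gets committed:

          // Existing graceful-fence timeout key, as cited above.
          public static final String HA_FC_GRACEFUL_FENCE_TIMEOUT_KEY =
              "ha.failover-controller.graceful-fence.rpc-timeout.ms";
          // New key proposed in this issue for the connect-retry count (assumed name).
          public static final String HA_FC_GRACEFUL_FENCE_CONNECTION_RETRIES =
              "ha.failover-controller.graceful-fence.connection.retries";
          public static final int HA_FC_GRACEFUL_FENCE_CONNECTION_RETRIES_DEFAULT = 1;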

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12535395/HDFS-3561.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2750//console

          This message is automatically generated.

          Uma Maheswara Rao G added a comment -

          How about a configuration key name like this:
          HA_ZKFC_GRACEFUL_FENCE_MAX_RETRIES --> HA_ZKFC_GRACEFUL_FENCE_CONNECTION_RETRIES_KEY?

          Also, would it be clearer to separate out the configurations (retries on connection and socket-timeout retries)? They are different configs exposed in common.

          @Aaron, could you please add your comments here?

          Vinayakumar B added a comment -

          Attaching patch to reduce the number of retries for graceful fence

          Uma Maheswara Rao G added a comment -

          Thanks Aaron

          Regardless, note that I don't object to the main point of this JIRA, i.e. that we should lower the number of retries when attempting to gracefully fence an NN. If someone wants to supply a patch for that, I'll be happy to review/commit it.

          Sure. I will ask Vinay to upload the patch what we have.

          Aaron T. Myers added a comment -

          Ah, yes. Both in the case of BKJM or the QJM, fencing is effectively built into write protocol, so there won't be any need for external fencing options.

          Regardless, note that I don't object to the main point of this JIRA, i.e. that we should lower the number of retries when attempting to gracefully fence an NN. If someone wants to supply a patch for that, I'll be happy to review/commit it.

          Uma Maheswara Rao G added a comment -

          Hi Aaron, Thanks a lot.

              for example if we have shared storage fencing at writer level. Whenever new writer comes, automatically old writer will not be allowed.
          
          I'm not sure what you mean by this.
          

          What I mean by this is: take the case of BookKeeper.
          BookKeeper itself has fencing logic. Whenever we open a new ledger, it automatically gets fenced, and other clients are not able to proceed with further writes.
          I expect similar logic may come to QJM as well, using writer-level epoch numbers.

          Maybe we need to develop a fence method with these clients and fence things that way, but the one thing I don't like is that I would need to place those client libraries in the ZKFC (this step may not be required, as each NN's respective JM will in any case have to open a writer, and that writer itself will fence automatically). I hope you understand my point here.

          Aaron T. Myers added a comment -

          How we can do shared storage fencing from ZKFC?

          Many NFS filers have APIs for fencing all subsequent writes or reads from a given IP address. The ZKFC can be configured to run a script which initiates fencing via this method. See the "dfs.ha.fencing.methods" section on this page.

          for example if we have shared storage fencing at writer level. Whenever new writer comes, automatically old writer will not be allowed.

          I'm not sure what you mean by this.

          Still my question remains. If we want to fence 100% from ZKFC itself, why shared storage writer level fencing required?
          How you are handling network down scenario in your clusters?

          The ZKFC is always responsible for triggering fencing, so it's not really that we "fence 100% from ZKFC itself." Rather, the ZKFC is capable of triggering the fencing process for a given NN via other devices, e.g. the filer or PDU.

          I hope this clears things up.

          Uma Maheswara Rao G added a comment -

          Hi Aaron, Thanks a lot for the explanation.

          I have a few questions. I think I am missing something here.

          You can optionally configure more fencing methods, for example IP-based shared storage fencing

          How we can do shared storage fencing from ZKFC? for example if we have shared storage fencing at writer level. Whenever new writer comes, automatically old writer will not be allowed.

          One of the stated aims of the HA work was to favor data reliability over availability. So, if we can't guarantee the correctness of the data, we shouldn't cause a state transition.

          Still my question remains. If we want to fence 100% from ZKFC itself, why shared storage writer level fencing required?

          How you are handling network down scenario in your clusters?

          Regards,
          Uma

          Uma Maheswara Rao G added a comment -

          Hi Aaron, Thanks a lot for the explanation.

          I have a few questions.

          You can optionally configure more fencing methods, for example IP-based shared storage fencing

          Aaron T. Myers added a comment -

          I think some wires are getting crossed here. Some clarifications:

          • The ZKFC always performs the act of fencing, by executing the configured fencing methods.
          • There are two fencing methods shipped out of the box: 1) RPC to the active NN to tell it to move to the standby state, 2) ssh to the active NN and `kill -9` the NN process.
          • You can optionally configure more fencing methods, for example IP-based shared storage fencing, or IP-based STONITH via PDU fencing.
          • The ZKFC proceeds to execute the various fencing methods in the order they're configured.
          • One of the stated aims of the HA work was to favor data reliability over availability. So, if we can't guarantee the correctness of the data, we shouldn't cause a state transition.

          Given all of the above, at least one of the fencing methods must succeed before the ZKFC can reasonably cause the standby to transition to active. Imagine a network failure wherein the ZKFCs can no longer reach the active NN, but can reach the standby. If we just try to ping the active NN for a while, without having successfully fenced it, and then transition the standby to active since we can't ping the previous active, then both NNs might be active simultaneously, write to the shared storage and corrupt the FS metadata. This isn't acceptable.

          As I said previously, I'm very much in favor of lowering the number of graceful fencing retries to a reasonable value. Todd recommended 0 or 1, which sounds fine by me. What I'm not in favor of is changing the ZKFC to ever cause the standby to become active without some fencing method succeeding.
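
          For reference, the fencing chain described above is configured via dfs.ha.fencing.methods (normally in hdfs-site.xml). A minimal sketch in Java form; the site-specific script path and its arguments are hypothetical:

          import org.apache.hadoop.conf.Configuration;

          // Methods are tried in order once graceful fencing has failed:
          // sshfence first, then a site-specific shell script (e.g. one that
          // talks to a filer or PDU). Script path and arguments are hypothetical.
          public class FencingMethodsExample {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              conf.set("dfs.ha.fencing.methods",
                  "sshfence\n"
                  + "shell(/usr/local/bin/fence-nn.sh $target_host $target_port)");
              conf.set("dfs.ha.fencing.ssh.private-key-files",
                  "/home/hdfs/.ssh/id_rsa");
              System.out.println(conf.get("dfs.ha.fencing.methods"));
            }
          }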

          Uma Maheswara Rao G added a comment -

          Yes, we have multiple levels of fencing.

          The first fencing happens at the ZKFC level; elsewhere we have fencing at the shared-storage level.
          What if the fencing logic checked for IP reachability? If the IP is reachable but we are not able to fence, that means the fence failed; if the IP itself is not reachable, there is no way it can kill the remote node. In that case, may we need to give the next-level fencer (shared storage) a chance?
          If the ZKFC itself must ensure a 100% fence, why is shared-storage fencing required? Am I missing something here?

          Vinayakumar B added a comment -

          This isn't acceptable. The point of fencing is to ensure that if the previously-active NN returns from appearing to have been down, it doesn't start writing to the shared directory again while the new active is also writing to that directory.

          If the shared storage provides good fencing, so that once it is fenced the old writer is no longer allowed to write, then this situation will never arise.

          If the failover does not happen because fencing of the old active failed, does that mean a network-down scenario is not supported by the ZKFC?

          I think transitioning the current standby to active would be the correct behavior in this situation.

          Any thoughts on this?

          Todd Lipcon added a comment -

          +1 for setting it to 0 or 1 for the graceful fence attempt.

          Aaron T. Myers added a comment -

          Suggestion: if the ZKFC is not able to reach the other NN for a specified time / number of retries, it can consider that NN dead and instruct the local NN to take over as active, since there is no chance of the other NN (NN1) retaining its active state after the ZK session timeout while it is isolated from the network.

          This isn't acceptable. The point of fencing is to ensure that if the previously-active NN returns from appearing to have been down, it doesn't start writing to the shared directory again while the new active is also writing to that directory.

          I think we can set the retries to 1 or 2 to avoid unnecessary actions on small network fluctuations, or we can set it to 0, as we are already setting the same values in ConfiguredFailoverProxyProvider for failover clients.

          We set it to 0 in ConfiguredFailoverProxyProvider because we want to try failing over immediately as the retry mechanism, instead of repeatedly trying to contact a machine that may in fact be completely down.

          I agree, though, that setting it to a lower number than 45 makes sense in the case of the client in the ZKFC, and perhaps making it configurable separately.

          Uma Maheswara Rao G added a comment -

          I think we can set the retries to 1 or 2 to avoid unnecessary actions on small network fluctuations, or we can set it to 0, as we are already setting the same values in ConfiguredFailoverProxyProvider for failover clients.

           public static final String DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_KEY =
               "dfs.client.failover.connection.retries";
           public static final int    DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_DEFAULT = 0;
           public static final String DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_ON_SOCKET_TIMEOUTS_KEY =
               "dfs.client.failover.connection.retries.on.timeouts";
           public static final int    DFS_CLIENT_FAILOVER_CONNECTION_RETRIES_ON_SOCKET_TIMEOUTS_DEFAULT = 0;
          
          Vinayakumar B added a comment -

          During the transition, fencing of the old active is performed.

          Before actually using the configured fencing method, graceful fencing is tried first: the ZKFC tries to get a proxy to the NameNode on the other machine. Since the network is down, it cannot establish the connection, and it retries 45 times as configured by ipc.client.connect.max.retries.on.timeouts:

          LOG.info("Should fence: " + target);
          boolean gracefulWorked = new FailoverController(conf,
              RequestSource.REQUEST_BY_ZKFC).tryGracefulFence(target);
          if (gracefulWorked) {
            // It's possible that it's in standby but just about to go into active,
            // no? Is there some race here?
            LOG.info("Successfully transitioned " + target + " to standby " +
                "state without fencing");
            return;
          }

          I think in the ZKFC case we can reduce the number of retries.

          Any thoughts?

          Uma Maheswara Rao G added a comment -

          Good catch Suja. Thanks for filing the JIRA.


            People

            • Assignee: Vinayakumar B
            • Reporter: suja s
            • Votes: 0
            • Watchers: 11
