Hadoop Common
HADOOP-8247

Auto-HA: add a config to enable auto-HA, which disables manual FC

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Auto Failover (HDFS-3042)
    • Fix Version/s: Auto Failover (HDFS-3042)
    • Component/s: auto-failover, ha
    • Labels: None

      Description

      Currently, if automatic failover is set up and running, and the user uses the "haadmin -failover" command, he or she can end up putting the system in an inconsistent state, where the state in ZK disagrees with the actual state of the world. To fix this, we should add a config flag which is used to enable auto-HA. When this flag is set, we should disallow use of the haadmin command to initiate failovers. We should refuse to run ZKFCs when the flag is not set. Of course, this flag should be scoped by nameservice.
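The "scoped by nameservice" part of the description suggests a lookup that prefers a nameservice-specific key over a global one. A minimal sketch of that resolution, using a plain Map in place of Hadoop's real Configuration class (the key name matches the patch; the helper itself is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class AutoHaFlag {
    static final String KEY = "dfs.ha.automatic-failover.enabled";

    // Resolve the flag: check the nameservice-scoped key first,
    // then fall back to the global key; absent means disabled.
    static boolean isEnabled(Map<String, String> conf, String nsId) {
        String v = conf.get(KEY + "." + nsId);  // e.g. ...enabled.nameserviceId1
        if (v == null) {
            v = conf.get(KEY);                  // global fallback
        }
        return Boolean.parseBoolean(v);
    }
}
```

With this shape, one nameservice can run auto-HA while another in the same federation stays manual.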

      1. hadoop-8247.txt
        67 kB
        Todd Lipcon
      2. hadoop-8247.txt
        71 kB
        Todd Lipcon
      3. hadoop-8247.txt
        71 kB
        Todd Lipcon
      4. hadoop-8247.txt
        71 kB
        Todd Lipcon
      5. hadoop-8247.txt
        76 kB
        Todd Lipcon
      6. hadoop-8247.txt
        69 kB
        Todd Lipcon

        Activity

        Todd Lipcon added a comment -

        The problem of this jira is that it makes the auto and manual failover exclusive to each other

        Yes, this is a temporary state along the way. As discussed elsewhere, we need to flip the manual HA commands over to communicate with the ZKFCs when automatic failover is enabled. Since that code isn't done yet, the current behavior is to disable manual failover.

        Mingjie Lai added a comment -

        I overlooked the jira, so I only just found the discussion here.

        The problem of this jira is that it makes the auto and manual failover exclusive to each other. While in section 2.7.2 of the design doc at https://issues.apache.org/jira/browse/HDFS-2185, a manual failover is supposed to be handled by zkfc if the auto failover configuration is on. That proposal makes it possible to perform a manual failover for upgrade purposes, while still having auto failover to protect against an NN failure.

        I will open another jira to address the above issue. However, I foresee that the behaviour of manual failover will change, since it will be allowed whether or not auto failover is turned on.

        Todd Lipcon added a comment -

        Committed to branch, thx for reviews all.

        Aaron T. Myers added a comment -

        Good catch, Todd. +1, the updated patch looks good to me.

        Todd Lipcon added a comment -

        I re-ran the tests right before commit and realized I accidentally introduced a hang by adding the confirmation prompt above. I've just updated TestDFSHAAdmin to prep System.in with "yes" responses. Aaron, can you check just the diff in that class? Thanks.
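The fix described here, swapping System.in for a prepared stream so the confirmation prompt can't hang a test, can be sketched as follows. `confirmIntent` is a hypothetical stand-in for the real HAAdmin prompt code, not the committed method:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Scanner;

public class ConfirmPrompt {
    // Read one line from the given stream and treat "y"/"yes" as consent.
    static boolean confirmIntent(InputStream in) {
        System.out.print("Are you sure you want to continue? (Y or N) ");
        Scanner sc = new Scanner(in);
        String answer = sc.hasNextLine() ? sc.nextLine().trim() : "";
        return answer.equalsIgnoreCase("y") || answer.equalsIgnoreCase("yes");
    }

    // Test helper: feed the prompt a canned answer instead of real stdin,
    // so the test never blocks waiting for a human.
    static boolean runWithCannedInput(String answer) {
        return confirmIntent(new ByteArrayInputStream((answer + "\n").getBytes()));
    }
}
```

Taking the stream as a parameter (or calling System.setIn with a ByteArrayInputStream) keeps the prompt testable without an interactive terminal.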

        Aaron T. Myers added a comment -

        +1, the latest patch looks good to me.

        Hari Mankude added a comment -

        +1 from me

        Todd Lipcon added a comment -

        Attached patch implements the above. Here is a transcript of its usage:
        $ ./bin/hdfs haadmin -transitionToStandby nn1 -forcemanual
        You have specified the forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

        It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

        You may abort safely by answering 'n' or hitting ^C now.

        Are you sure you want to continue? (Y or N) n
        12/04/10 17:05:53 FATAL ha.HAAdmin: Aborted

        todd@todd-w510:~/git/hadoop-common/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT$ ./bin/hdfs haadmin -transitionToStandby nn1 -forcemanual
        You have specified the forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

        It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

        You may abort safely by answering 'n' or hitting ^C now.

        Are you sure you want to continue? (Y or N) y
        12/04/10 17:02:05 WARN ha.HAAdmin: Proceeding with manual HA state management even though
        automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021

        Hari Mankude added a comment -

        As I said above, I'm OK giving it a scarier name and/or making it prompt for confirmation upon use, with a scary warning message. I'm even OK removing it from the documentation, so people aren't lured into using it when they don't really know what they're doing.

        I am ok with adding a prompt for confirmation upon use and removing it from documentation.

        Todd Lipcon added a comment -

        There are always admins who disregard these warnings

        I think they deserve what they get... admins can also decide to run "rm -Rf /my/metadata/dir" and get into a bad state.

        Instead, wouldn't it be better to come up with a set of procedures to unwedge the cluster, starting with setting auto-failover key to false, resetting NNs and using manual failover

        Presumably you want to be able to do this without incurring downtime. Certainly if downtime is acceptable, that would be the right response. But I still think having a manual override here is useful for advanced operators who need to use it in an extenuating circumstance.

        As I said above, I'm OK giving it a scarier name and/or making it prompt for confirmation upon use, with a scary warning message. I'm even OK removing it from the documentation, so people aren't lured into using it when they don't really know what they're doing.

        Hari Mankude added a comment -

        Hi Hari. That's specifically the point of the FORCEMANUAL flag. It is not safe to use it with automatic failover. So, the user has to accept the warning and acknowledge they're about to do something dumb, that will break auto failover if the ZKFCs are running.

        If there is such a significant element of danger in using FORCEMANUAL with unforseen consequences and possibilities such as data corruption and even worse HA state, is this command option useful? There are always admins who disregard these warnings. Instead, wouldn't it be better to come up with a set of procedures to unwedge the cluster, starting with setting auto-failover key to false, resetting NNs and using manual failover?

        Todd Lipcon added a comment -

        I also ran the manual tests again. Here's the usage output of HAAdmin:

        Usage: DFSHAAdmin [-ns <nameserviceId>]
            [-transitionToActive [--forcemanual] <serviceId>]
            [-transitionToStandby [--forcemanual] <serviceId>]
            [-failover [--forcefence] [--forceactive] [--forcemanual] <serviceId> <serviceId>]
            [-getServiceState <serviceId>]
            [-checkHealth <serviceId>]
            [-help <command>]
        
          --forceManual allows the manual failover commands to be used
                        even when automatic failover is enabled. This
                        flag is DANGEROUS and should only be used with
                        expert guidance.
        

        Here's what happens if I try to use a state change command with auto-HA enabled:

        $ ./bin/hdfs haadmin -transitionToActive nn1
        Automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
        Refusing to manually manage HA state, since it may cause
        a split-brain scenario or other incorrect state.
        If you are very sure you know what you are doing, please 
        specify the forcemanual flag.
        $ echo $?
        255
        

        Also checked the other two state-changing ops (transitionToStandby and failover) and they yielded the same error message.

        • I verified that -getServiceState and -checkHealth continue to work.
        • I verified that the -forceManual flag worked:
        $ ./bin/hdfs haadmin -transitionToStandby -forcemanual nn1
        12/04/09 16:12:38 WARN ha.HAAdmin: Proceeding with manual HA state management even though
        automatic failover is enabled for NameNode at todd-w510/127.0.0.1:8021
        

        (also for -transitionToActive and -failover)

        • Verified that start-dfs.sh starts the ZKFCs on both of my configured NNs when auto-HA is enabled. Also verified stop-dfs.sh stops the ZKFCs. Discovered trivial bug HDFS-3234 here.

        Next, I modified my config to set the auto failover flag to false.

        • verified that start-dfs.sh doesn't try to start ZKFCs.
        • verified that if I try to start a ZKFC, it bails:
          12/04/09 16:19:12 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode nameserviceId1.nn2
          12/04/09 16:19:12 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at todd-w510/127.0.0.1:8022. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.
          
        • verified that the haadmin commands all function without any -forcemanual flag specified.
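The ZKFC-side guard exercised in the checklist above, bailing out at startup when the flag is off, amounts to a small check. This is a minimal sketch; the class and method names are illustrative, not the real DFSZKFailoverController API:

```java
public class ZkfcGuard {
    // Refuse to start the failover controller unless auto failover
    // is enabled in the configuration for this NameNode.
    static void checkCanStart(boolean autoFailoverEnabled) {
        if (!autoFailoverEnabled) {
            throw new IllegalStateException(
                "Automatic failover is not enabled for this NameNode. "
                + "Please ensure that automatic failover is enabled in the "
                + "configuration before running the ZK failover controller.");
        }
    }
}
```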
        Todd Lipcon added a comment -

        Ran unit tests and caught one test bug:

        -  assertEquals(RequestSource.REQUEST_BY_USER, reqInfoCaptor.getValue());
        +  assertEquals(RequestSource.REQUEST_BY_USER,
        +      reqInfoCaptor.getValue().getSource());
        Todd Lipcon added a comment -

        New rev of the patch addresses Aaron's comments above. I also added a few more javadocs here and there, and renamed some methods for better clarity

        Todd Lipcon added a comment -

        P.S. if you'd like I'd be happy to rename it to something even scarier sounding... like "--dangerous-manual-override", or whatever you prefer.

        Todd Lipcon added a comment -

        Hi Hari. That's specifically the point of the FORCEMANUAL flag. It is not safe to use it with automatic failover. So, the user has to accept the warning and acknowledge they're about to do something dumb, that will break auto failover if the ZKFCs are running.

        The purpose of allowing it at all is to give a recourse for an expert admin if their ZK cluster has crashed and they need to manually do a failover in an emergency situation. Its use is highly discouraged. The warning printed is:

                "  --forceManual allows the manual failover commands to be used\n" +
                "                even when automatic failover is enabled. This\n" +
                "                flag is DANGEROUS and should only be used with\n" +
                "                expert guidance.");
        
        Hari Mankude added a comment -

        Todd,

        Can you add tests or verify that FORCEMANUAL option works with automatic failover? Specifically, the races to consider are

        1. User tries the forcemanual option on NN1 with transitionToActive() when NN1 is transiting to a standby state (ZKFC1 lost the ephemeral node) and NN2 is becomingActive()
        2. User tries the forcemanual option on NN1 with transitionToStandby() when it is transiting to an active state.
        3. Both these situations when there are other dfsclient threads which are posting operations (say mkdir and rmdir) against the active NN.

        I don't know if it is possible to test these scenarios with mockito or not.

        Aaron T. Myers added a comment -

        Patch looks great to me, Todd. Just a few little nits:

        1. Add a comment to HAAdmin#addFailoverOpts explaining why FORCEMANUAL isn't being added there.
        2. Nit: "String []argv = cmd.getArgs();" - put a space between "[]" and "argv"
        3. Not quite sure of the purpose of these changes:
          -  public void monitorHealth() throws HealthCheckFailedException,
          +  public void monitorHealth()
          +                              throws HealthCheckFailedException,
                                                AccessControlException,
                                                IOException;
          

          and:

          -  public HAServiceStatus getServiceStatus() throws AccessControlException,
          +  public HAServiceStatus getServiceStatus()
          +                                            throws AccessControlException,
                                                              IOException;
          
        4. There's an errant whitespace change in TestZKFailoverControllerStress.
        5. Not quite sure what the point of this change is:
          -          return nn.getRpcServer().getServiceStatus().getState() == state;
          +          HAServiceStatus curStatus = nn.getRpcServer().getServiceStatus();
          +          return curStatus.getState() == state;
          
        6. There's a few spots in this patch where you just do some unrelated cleanup of calls to transitionToActive() in the tests. Totally fine, but you might want to mention this in a JIRA comment in the future if you're not going to break them out into a separate patch.
        7. Some goofy javadoc comment closing:
          +  /**
          +   * Test that, even if automatic HA is enabled, the monitoring operations
          +   * still function correctly.
          +   *    */
          

        Side note: Really good javadoc cleanup on NAMENODE_SPECIFIC_KEYS. Good on you.

        Todd Lipcon added a comment -

        New rev of the patch:

        • adds a "-forcemanual" flag to the manual failover controller, which allows you to use it even if auto HA is enabled. This is for experts only, and has warnings indicating as much
        • adds test cases, etc
        • renames the HARequestInfoProto to HAStateChangeRequestInfoProto, and makes it only apply to the transitionToX() calls. I figure it makes sense to allow the getServiceState/monitorHealth calls to be done by anyone regardless of the HA mode

        I'll do one more pass of testing and javadoccing tomorrow, but this should mostly be ready to review.

        Todd Lipcon added a comment -

        Can we make this simpler by not supporting manual failover?

        Yes. That's the current version of the patch - if you enable automatic, then you don't get manual. But, as described in the design doc in HDFS-2185, there are good reasons to support manually initiated failover even when the system is set up for automatic. That will be done separately as a followup. This patch is just meant for safety purposes.

        Another advantage of this patch is that we can amend the "start-dfs.sh" script to automatically start ZKFCs when the conf flag is present. My next rev will do this.

        Todd Lipcon added a comment -

        Hi Hari. JIRA doesn't support cross-project subtasks. You can use the following filter to track all auto-HA related tasks: https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12319482 (let me know if the link doesn't work, I think I set it up to be world-shared)

        Hari Mankude added a comment -

        Can we make this simpler by not supporting manual failover?

        Hari Mankude added a comment -

        Can you make this a sub-task of hdfs-3042 so that we can track this?

        Aaron T. Myers added a comment -

        That all sounds good to me, Todd. Thanks for explaining.

        Todd Lipcon added a comment -

        I added a struct because I figured we may want to add more fields in the future that fulfill a similar purpose. For example, I can imagine that a failover event might be tagged with a string "reason" field – sort of like how the Linux "shutdown" command can take a message. This would just be logged on the NN side. Another example is the proposed fix for HADOOP-8217, where we need to add an epoch number to the failover requests to get an ordering of failover events.
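The extensibility argument, a struct today so fields like a "reason" string or an epoch number can be added later without changing every method signature, could look like this sketch. The RequestSource values match those mentioned in this thread, but the reason field and constructor shape are assumptions, not the committed code:

```java
public class StateChangeRequestInfo {
    enum RequestSource { REQUEST_BY_USER, REQUEST_BY_USER_FORCED, REQUEST_BY_ZKFC }

    private final RequestSource source;
    // Hypothetical future field: a free-form message logged on the NN side,
    // much like the message argument to the Linux shutdown command.
    private final String reason;

    StateChangeRequestInfo(RequestSource source, String reason) {
        this.source = source;
        this.reason = reason;
    }

    RequestSource getSource() { return source; }
    String getReason() { return reason; }
}
```

Adding a field to this struct only changes the one class, whereas a bare enum parameter would force a signature change across the whole HAServiceProtocol.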

        Aaron T. Myers added a comment -

        Add a new flag dfs.ha.automatic-failover.enabled, which is set per-nameservice or globally
        Add a new RequestInfo structure as a parameter to all the HAServiceProtocol methods. This currently just has one field, which indicates what type of client the request is on behalf of. It can either be a user (manual CLI failover), ZKFC (auto failover), or USER_FORCE – indicating that it's a user who wants to avoid this safety check.
        In the NN, if auto-failover is enabled, disallow HA requests from users. If it's not enabled, disallow HA requests from ZKFCs.
        In the ZKFC, disallow startup if auto-failover is disabled

        All this makes a lot of sense to me, Todd. The only question I have is whether or not it really makes sense to add a RequestInfo structure, instead of just an extra parameter whose value is defined by a simple 3-element enum. What else do you envision being added to the RequestInfo structure?

        Todd Lipcon added a comment -

        I should of course note that this is only the first step. After this is committed, the idea is to make the haadmin -failover command line work in coordination with the ZKFC daemons to do a controlled failover. But in the meantime, it's disallowed so that users can't shoot themselves in the foot by running this command.

        Hide
        Todd Lipcon added a comment -

        Here's a preliminary patch for this issue. It still needs a little cleanup/javadoc/etc, but I wanted to make sure people agree this is the right direction before I finish it up.

        Here's a summary of the change:

        • Add a new flag dfs.ha.automatic-failover.enabled, which is set per-nameservice or globally
        • Add a new RequestInfo structure as a parameter to all the HAServiceProtocol methods. This currently just has one field, which indicates what type of client the request is on behalf of. It can either be a user (manual CLI failover), ZKFC (auto failover), or USER_FORCE – indicating that it's a user who wants to avoid this safety check.
        • In the NN, if auto-failover is enabled, disallow HA requests from users. If it's not enabled, disallow HA requests from ZKFCs.
        • In the ZKFC, disallow startup if auto-failover is disabled.
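The gating rule in the last three bullets can be condensed into one predicate. This is a hedged sketch — the method name, enum constants, and return-boolean shape are assumptions; the real check lives in the NN's HAServiceProtocol implementation and throws rather than returning false.

```java
// Hypothetical condensation of the server-side gating rule; names are
// assumptions, not the patch's actual identifiers.
public class HaRequestCheck {
  public enum Source { REQUEST_BY_USER, REQUEST_BY_USER_FORCED, REQUEST_BY_ZKFC }

  /** Returns true if a state-change request should be allowed. */
  public static boolean isAllowed(boolean autoHaEnabled, Source source) {
    if (source == Source.REQUEST_BY_USER_FORCED) {
      return true;  // user explicitly chose to bypass the safety check
    }
    if (autoHaEnabled) {
      return source == Source.REQUEST_BY_ZKFC;  // only ZKFCs may drive HA
    }
    return source == Source.REQUEST_BY_USER;    // only manual control allowed
  }

  public static void main(String[] args) {
    // Manual request while auto-HA is enabled is rejected.
    System.out.println(isAllowed(true, Source.REQUEST_BY_USER));
    // ZKFC request while auto-HA is disabled is also rejected.
    System.out.println(isAllowed(false, Source.REQUEST_BY_ZKFC));
  }
}
```

Note the asymmetry: USER_FORCE short-circuits both checks, which is what makes it an escape hatch rather than a fourth state.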

        In addition to the unit tests, I ran the following manual tests, on a secure cluster.

        1) Did not enable the auto-failover config
        2) Ran failovers using the haadmin command, successfully
        3) Tried to run bin/hdfs zkfc, got the expected error:

        12/04/05 20:53:38 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode nameserviceId1.nn1
        12/04/05 20:53:38 FATAL ha.ZKFailoverController: Automatic failover is not enabled for NameNode at todd-w510/127.0.0.1:8021. Please ensure that automatic failover is enabled in the configuration before running the ZK failover controller.
        
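The startup guard behind that log message might look roughly like the following sketch; the method name and exception type here are assumptions for illustration.

```java
// Hypothetical sketch of the ZKFC startup guard; only the config key
// dfs.ha.automatic-failover.enabled is taken from the JIRA itself.
public class ZkfcStartupGuard {
  /**
   * Refuse to start the failover controller unless automatic failover
   * is enabled for the target NameNode's nameservice.
   */
  public static void checkCanRun(boolean autoFailoverEnabled, String target) {
    if (!autoFailoverEnabled) {
      throw new IllegalStateException(
          "Automatic failover is not enabled for " + target +
          ". Please ensure that automatic failover is enabled in the" +
          " configuration before running the ZK failover controller.");
    }
  }

  public static void main(String[] args) {
    checkCanRun(true, "NameNode nameserviceId1.nn1");
    System.out.println("auto-failover enabled: ZKFC may start");
  }
}
```

Failing fast here, before touching ZooKeeper, is what keeps a misconfigured ZKFC from ever competing for the active lock.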

        4) Enabled auto-failover in my config, but left the NNs running. Got an error when the ZKFC tried to make the local node active. TODO in a future JIRA: it could abort at this point, when it sees an AccessControlException, since that's indicative of misconfiguration.

        5) Restarted NNs, so they picked up the new config.
        6) Ran ZKFC, it successfully made one of the NNs active. Verified automatic failover behavior by killing one of the NNs.
        7) Ran the manual failover command, got the expected error:

        12/04/05 20:58:31 ERROR ha.FailoverController: Unable to get service state for NameNode at todd-w510/127.0.0.1:8022: Manual HA control for this NameNode is disallowed, because automatic HA is enabled.
        

        Open questions: should we allow the non-mutative commands like monitorHealth and getServiceState to run when auto-failover is configured? My thinking is that we probably should. If so, should we keep the RequestInfo parameter on those calls, or include RequestInfo only for the calls that trigger transitions?


          People

          • Assignee:
            Todd Lipcon
          • Reporter:
            Todd Lipcon