Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.5.0
    • Component/s: quorum
    • Labels:
      None

      Description

      Requirement: functionality that will reconfigure
      an OBSERVER to become a voting member and vice versa.

      Example of usage:

      1. Maintain the Quorum size without changing the cluster size - in a 5
      node cluster with 2 observers, I decide to decommission a voting
      member. Then, I would like to configure one of my observers to be a
      follower without any down time.

      2. Add a new server to the cluster that has better resources than
      one of the voting peers. Make the new node a voting peer and the old
      one an observer.

      3. Reduce the number of voting members for performance reasons.

      The fix for ZOOKEEPER-107 might automatically give us this functionality.
      It would be good to confirm that and, if needed, highlight any work
      that might be needed in addition to ZOOKEEPER-107.

        Issue Links

          Activity

          nmorado Nomar Morado added a comment -

          I am looking at QuorumPeer and trying to figure out how to get ZooKeeper to execute the reconfig API to change the role of a peer.

          It does not seem straightforward.

          I have 4 nodes, one of which is an observer, and in some cases I want to be able to promote it to participant and vice versa.

          Can someone please help?

          Thanks.

          john.jian.fang Jian Fang added a comment -

          BTW, I wonder if it is possible for ZooKeeper to manage the role change by itself in the future. For example, I could specify that I expect only 5 participants in normal operation. The ZooKeeper peers could then decide among themselves which peers should be participants and which should be observers, and repeat that election whenever participants are lost. In this way, I could simply start up 8 ZooKeeper instances, for instance, and would not need any management tool or code to babysit the ZooKeeper quorum. I think this would dramatically reduce the operational overhead, especially when people run ZooKeeper in the cloud, because a cloud instance could be gone forever at any time.

          john.jian.fang Jian Fang added a comment -

          Alexander, thanks for your clarification. I think I should use a larger, odd number of participants so that the ensemble can tolerate multiple node failures before only two participants are left.

          shralex Alexander Shraer added a comment -

          I mean leader proposals

          shralex Alexander Shraer added a comment -

          Yeah, I was talking just about participants. Only participants are involved in voting. So what I mean is that having 6 participants doesn't add to the fault tolerance compared to 5 participants. If you have X servers, you need to decide how many concurrent failures you want to tolerate (t), make 2t + 1 servers participants and the remaining X - (2t + 1) observers. BTW, when a server fails it's still in the ensemble logically.

          The "voting" in ZooKeeper is just ACKs of server proposals.
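
A minimal Java sketch of the sizing rule described in this comment: given X servers and a target of t concurrent failures, 2t + 1 servers become participants and the rest become observers. The class `EnsemblePlan` and its `plan` method are hypothetical helpers for illustration only and are not part of ZooKeeper.

```java
// Hypothetical helper, not part of ZooKeeper: splits an ensemble of x servers
// into voting participants and non-voting observers.
final class EnsemblePlan {
    final int participants;
    final int observers;

    private EnsemblePlan(int participants, int observers) {
        this.participants = participants;
        this.observers = observers;
    }

    // x = total servers available, t = concurrent failures to tolerate.
    static EnsemblePlan plan(int x, int t) {
        if (t < 0) {
            throw new IllegalArgumentException("t must not be negative");
        }
        int participants = 2 * t + 1;
        if (participants > x) {
            throw new IllegalArgumentException("need at least " + participants + " servers");
        }
        return new EnsemblePlan(participants, x - participants);
    }

    public static void main(String[] args) {
        EnsemblePlan p = plan(8, 2);
        // Prints "5 participants, 3 observers".
        System.out.println(p.participants + " participants, " + p.observers + " observers");
    }
}
```

With 8 servers and t = 2, this yields the 5-participant, 3-observer layout mentioned elsewhere in the thread.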

          john.jian.fang Jian Fang added a comment -

          Thanks Hongchao and Alexander for your quick response.

          Alexander, I am still a bit confused by your statement that 5 servers can tolerate a failure of 2 and 6 servers can tolerate only 2 failures. Do the 5 servers and 6 servers include observers as well? From the documentation, it seems only participants are involved in the voting, right? Do you mean 5 participants and 6 participants? If that is true, do you suggest I set all 6 peers to be participants instead of 3 participants and 3 observers?

          Also, suppose we have 4 participants. Is there a scenario in which 2 peers have the same vote, but one that differs from the other two? Or is this not possible because ZooKeeper always uses a leader?

          Thanks again.

          shralex Alexander Shraer added a comment -

          Also see ZOOKEEPER-1660 for documentation.

          > If I understand correctly, zookeeper must keep the number of participants to be an odd number to avoid a tie
          > condition.

          No, it doesn't matter to ZooKeeper whether it's odd or even. It's recommended to keep it odd since ZooKeeper needs a majority of servers up and running to work. For example, if you have 5 servers you can tolerate 2 failures. But if you have 6 servers you can still tolerate only 2 failures, so in terms of fault tolerance having 6 instead of 5 doesn't give you anything.

          If one participant out of 3 is down you can't tolerate any more failures - any other failure will bring your ensemble down. You can do a reconfig in which you remove the faulty server and change an observer into a participant.

          In the docs notice the corner case explained in Sec 4.2.6 (Changing an observer into a follower) - in some cases you may need to first remove the observer (logically, using a reconfig) and then add it back as a participant. Usually you can do just one command though.
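
The majority arithmetic behind the 5-versus-6 comparison in this comment can be checked with a few lines of Java. This is only an illustration; `MajorityMath` and its methods are made-up names, not ZooKeeper code.

```java
// Illustration only: majority-quorum arithmetic for an ensemble of voting participants.
final class MajorityMath {
    // Smallest number of participants that forms a strict majority.
    static int quorumSize(int participants) {
        return participants / 2 + 1;
    }

    // Participants that can fail while a majority is still available.
    static int toleratedFailures(int participants) {
        return participants - quorumSize(participants);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " participants: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n));
        }
    }
}
```

For 3 participants the tolerated count is 1, so once one of them is down no further failure can be absorbed. For 5 and 6 participants the count is 2 in both cases.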

          hdeng Hongchao Deng added a comment -

          https://github.com/apache/zookeeper/blob/trunk/src/java/test/org/apache/zookeeper/test/ReconfigTest.java#L485

          You just need to append the role (e.g. "observer" or "participant") to the string and ZK will parse it.
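
A hedged Java sketch of what "appending the role" could look like when building the strings for a reconfig call. It assumes the dynamic-configuration server format `server.<id>=<host>:<quorumPort>:<electionPort>:<role>;<clientPort>`. The host name, server ids, and ports are placeholders, and `ReconfigArgs` is a hypothetical helper, not a ZooKeeper class. The block only builds strings so that it runs without a ZooKeeper dependency.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds server specification strings with an explicit role.
final class ReconfigArgs {
    static String serverSpec(int id, String host, int quorumPort, int electionPort,
                             String role, int clientPort) {
        if (!role.equals("participant") && !role.equals("observer")) {
            throw new IllegalArgumentException("role must be participant or observer");
        }
        return "server." + id + "=" + host + ":" + quorumPort + ":" + electionPort
                + ":" + role + ";" + clientPort;
    }

    public static void main(String[] args) {
        // Placeholder values: re-add server 4 (currently an observer) as a participant
        // and remove the failed server 1.
        List<String> joining = Collections.singletonList(
                serverSpec(4, "zk4.example.com", 2888, 3888, "participant", 2181));
        List<String> leaving = Collections.singletonList("1");
        System.out.println("joining=" + joining + " leaving=" + leaving);

        // Assumption: with a connected 3.5.0-alpha client `zk`, these lists would be
        // passed to the reconfig method quoted in this thread, for example
        //   zk.reconfig(joining, leaving, null, -1, new Stat());
        // where -1 is assumed to mean "no condition on the current config version".
    }
}
```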

          john.jian.fang Jian Fang added a comment -

          I am using 3.5.0-alpha from the central Maven repository, but it is not clear to me which API I should call to make this change. I saw some test cases in Zookeeper to create a ZK client and then call the following method in Zookeeper,

          public byte[] reconfig(List<String> joiningServers, List<String> leavingServers, List<String> newMembers, long fromConfig, Stat stat) throws KeeperException, InterruptedException

          How do I specify the roles for the peers then?

          Thanks.

          hdeng Hongchao Deng added a comment -

          Jian Fang

          You can do this in 3.5.X versions or later.

          john.jian.fang Jian Fang added a comment -

          Hi,

          I have a use case for dynamically changing the role of a peer from observer to participant. Here is my scenario.

          I run the ZooKeeper quorum in the cloud, and some nodes may be terminated by the cloud. A lost node could be replaced with a new node after some time. If I understand correctly, ZooKeeper must keep the number of participants odd to avoid a tie condition. For example, assume that I have 3 participants and 3 observers. If I lose one participant, how long could ZooKeeper usually survive with only two participants? To solve the problem with two participants, I am thinking of changing one observer into a participant to keep the number of participants at 3. Would this be a valid use case for ZooKeeper?

          Since this ticket has already been closed, I would like to confirm whether this is doable in 3.5.0 and, if so, what the API is for changing roles programmatically.

          Thanks in advance.

          shralex Alexander Shraer added a comment -

          This was solved as part of ZOOKEEPER-107.

          rakeshr Rakesh R added a comment -

          Hi Alexander Shraer, from the code I can see that a reconfig request changes the learner type and restarts leader election. A voting member can now change to a non-voting member and vice versa.

          Am I missing anything? Is this still a problem?

          vishalmlst Vishal Kher added a comment -

          Hi Alex,

          Agreed. That's why I suggested that in the reconfig algorithm leader(M) should wait to receive acks from a quorum of followers in M' (instead of just a quorum of M'), the same way leader(M) waits to receive acks from followers in M. So it is a small change in the algorithm. What do you think?

          Any feedback on my earlier comment regarding restarting the process and difficulty of implementation?

          Thanks.

          shralex Alexander Shraer added a comment -

          What I mean is that when an Observer in M needs to become a follower in M', we need it to participate and ack during the reconfiguration to ensure that state is transferred properly to M'.
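
The condition under discussion can be modeled in a few lines of Java: a reconfiguration from M to M' is treated as committable only when acks have arrived from a majority of the voters in M and from a majority of the voters in M'. This is a simplified model of the argument in this thread, not the ZOOKEEPER-107 implementation, and `ReconfigAcks`, the server ids, and the memberships are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model, not ZooKeeper code: acks must cover a majority of both configurations.
final class ReconfigAcks {
    static boolean hasMajority(Set<Integer> voters, Set<Integer> acks) {
        int count = 0;
        for (int id : voters) {
            if (acks.contains(id)) {
                count++;
            }
        }
        return count > voters.size() / 2;
    }

    static boolean canCommit(Set<Integer> oldVoters, Set<Integer> newVoters, Set<Integer> acks) {
        return hasMajority(oldVoters, acks) && hasMajority(newVoters, acks);
    }

    public static void main(String[] args) {
        // M has voters 1, 2, 3; servers 4 and 5 are observers in M.
        Set<Integer> oldVoters = new HashSet<>(Arrays.asList(1, 2, 3));
        // M' has voters 3, 4, 5, so two of its voters are observers in M.
        Set<Integer> newVoters = new HashSet<>(Arrays.asList(3, 4, 5));

        Set<Integer> votersOnly = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> withObserver = new HashSet<>(Arrays.asList(1, 2, 3, 4));

        System.out.println(canCommit(oldVoters, newVoters, votersOnly));   // false
        System.out.println(canCommit(oldVoters, newVoters, withObserver)); // true
    }
}
```

In this example the voters of M alone cannot supply a majority of M', which is why the observer being promoted has to take part in acking.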

          vishalmlst Vishal Kher added a comment -

          Hi Alex,

          > The reconfiguration algorithm that I've started implementing assumes that new servers joining the system are connected as non-voting followers - this allows them to receive operations during the reconfiguration as they are proposed, and then they can remain followers in the new configuration. Your proposal for reconfiguration also assumed that you need to collect a quorum of acks from M', which you can't do if members of M' are Observers in M.

          Even if we consider OBSERVERs in M', do we need to wait to get an ack from them? We can just wait for a quorum of FOLLOWERs in M' to ack.

          > AFAIK, in order to convert an observer to a follower the process would need to be terminated and a new follower process created. So this is not about leader election (the intention is not to run leader election at all, unless a failure occurs during reconfiguration).

          I am not clear why this is needed. Can you elaborate on this?
          If a follower becomes an observer (or vice versa), it will notice that once the membership is changed. Then it can shut down its Learner object and go back to the LOOKING state. It will then rejoin the cluster as either FOLLOWER or OBSERVER.

          I still think that with small changes to 107 we can achieve this feature as well.

          -Vishal
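
The transition proposed in this comment (notice the membership change, shut down the Learner, return to LOOKING, rejoin in the new role) can be sketched as a tiny state model in Java. The enums below are local stand-ins named after ZooKeeper's peer states and learner types; `RoleChange` is hypothetical, and the case where a participant wins the election and becomes leader is deliberately left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the proposed role change; not ZooKeeper's QuorumPeer.
final class RoleChange {
    enum LearnerType { PARTICIPANT, OBSERVER }
    enum State { LOOKING, FOLLOWING, OBSERVING }

    // State a non-leader peer settles into for a given learner type.
    static State settledState(LearnerType type) {
        return type == LearnerType.PARTICIPANT ? State.FOLLOWING : State.OBSERVING;
    }

    // States the peer passes through when the new membership assigns it newType.
    static List<State> transitions(State current, LearnerType newType) {
        State target = settledState(newType);
        if (current == target) {
            return Collections.singletonList(current); // role unchanged, nothing to do
        }
        // Role changed: drop the current Learner, go back to LOOKING, then rejoin.
        return Arrays.asList(current, State.LOOKING, target);
    }

    public static void main(String[] args) {
        // Prints "[OBSERVING, LOOKING, FOLLOWING]".
        System.out.println(transitions(State.OBSERVING, LearnerType.PARTICIPANT));
    }
}
```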

          shralex Alexander Shraer added a comment -

          Hi Vishal,

          The reconfiguration algorithm that I've started implementing assumes that new servers joining the system are connected as non-voting followers - this allows them to receive operations during the reconfiguration as they are proposed, and then they can remain followers in the new configuration. Your proposal for reconfiguration also assumed that you need to collect a quorum of acks from M', which you can't do if members of M' are Observers in M.

          AFAIK, in order to convert an observer to a follower the process would need to be terminated and a new follower process created. So this is not about leader election (the intention is not to run leader election at all, unless a failure occurs during reconfiguration).

          Alex

          vishalmlst Vishal Kher added a comment -

          Hi Alex,

          > In terms of the current implementation, it seems to be much more complex to convert an OBSERVER to a follower and vice versa.
          > In order to do that the process would have to be terminated and a new follower process created (a similar change is needed
          > to convert a leader to a follower, but not to convert a non-voting follower to a follower and vice versa).

          If M' (in ZOOKEEPER-107) represents a generic ZK configuration, why can't 107 directly resolve this JIRA? After leader election, a peer creates leader/follower/observer instances depending on its current role. Considering this, why do you think that the implementation will be complex? Also, why do we need to terminate the process if QuorumPeer is creating the correct instances after FLE?

          Thanks
          -Vishal

          shralex Alexander Shraer added a comment -

          Hi Vishal,

          With regard to ZOOKEEPER-107, the current plan is as follows: when adding a server,
          it will first be connected to the old configuration as a non-voting follower, and then become a follower.
          In terms of the current implementation, it seems to be much more complex to convert an OBSERVER to a follower and vice versa.
          In order to do that the process would have to be terminated and a new follower process created (a similar change is needed
          to convert a leader to a follower, but not to convert a non-voting follower to a follower and vice versa).

          Regards,
          Alex


            People

            • Assignee:
              shralex Alexander Shraer
              Reporter:
              vishalmlst Vishal Kher
            • Votes:
              2
              Watchers:
              10
