ZOOKEEPER-1031

Introduce virtual cluster IP and start that cluster IP on the host running ZK leader

    Details

    • Type: Wish
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.3
    • Fix Version/s: 4.0.0
    • Component/s: leaderElection, quorum
    • Labels:
      None

      Description

      It would be useful to be able to specify a virtual (floating) IP for the ZK cluster (say, in zoo.cfg). The ZK leader would start this IP on one of its interfaces. If the leadership changes, the new leader takes over the cluster IP. This IP can be used to identify the ZK leader and to send administrative commands and queries to it. For example:

      • A ZK client can get the list of ZK servers in the configuration by sending a request to the server running this IP address. The client needs to know only one IP address. Availability of the cluster automatically ensures availability of
        the IP address.
      • To change the ZK configuration, a client can send a reconfig request to the server on this IP and keep retrying until the request succeeds or fails.
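
The retry behaviour in the second bullet could look roughly like the sketch below. It is only an illustration: `send_with_retry`, `RetryableError`, and the backoff values are hypothetical names and numbers, not part of ZooKeeper, and the actual request is passed in as a callable.

```python
import time
from typing import Callable


class RetryableError(Exception):
    """Transient failure, e.g. the virtual IP is moving to a new leader (hypothetical)."""


def send_with_retry(send: Callable[[], str],
                    attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call send() until it returns, fails permanently, or attempts run out.

    Any exception other than RetryableError is treated as a definitive
    failure and propagates to the caller immediately.
    """
    for attempt in range(attempts):
        try:
            return send()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: report the last transient error
            # Exponential backoff gives a leader election time to finish.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

Because the client always targets the same address, the retry loop does not need to know which server currently holds it.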

      Implementation issues:
      1. The old ZK leader that has lost leadership must be able to give up the virtual IP address. Otherwise, two hosts could hold the same address and collide. One solution is for the old leader to reboot itself. A system property could be used to specify how to unplumb the cluster IP.
      2. Cross-platform support.
      3. Refreshing ARP caches
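
Issues 1 and 3 can be sketched as a small function that maps a role change to the commands a server would run. This is a minimal sketch that assumes a Linux host with the iproute2 `ip` tool and the iputils `arping` tool. The function name, the role strings, and the example address and interface are hypothetical, and nothing here is an existing ZooKeeper hook.

```python
from typing import List


def vip_commands(old_role: str, new_role: str,
                 vip_cidr: str, iface: str) -> List[List[str]]:
    """Return the commands to run when this server's role changes.

    The commands are only built here, not executed, so the caller decides
    how to run them (and with which privileges).
    """
    vip = vip_cidr.split("/")[0]
    if new_role == "leader" and old_role != "leader":
        return [
            # Plumb the virtual IP on the chosen interface.
            ["ip", "addr", "add", vip_cidr, "dev", iface],
            # Send unsolicited (gratuitous) ARP so neighbours refresh their caches.
            ["arping", "-U", "-c", "3", "-I", iface, vip],
        ]
    if old_role == "leader" and new_role != "leader":
        # Unplumb the virtual IP so the new leader can take it over.
        return [["ip", "addr", "del", vip_cidr, "dev", iface]]
    return []


# Example with a documentation-range address (assumed values):
# vip_commands("follower", "leader", "192.0.2.10/24", "eth0")
```

Cross-platform support (issue 2) would mean one such command table per operating system, which is part of why the description suggests making the unplumb step configurable.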

        Activity

        Ruslan Dautkhanov added a comment -

        >> Implementation issues:
        >> 1. The old ZK leader that has lost leadership should be able to somehow give up the virtual IP address. Otherwise, it could lead to collisions. One solution is to self reboot. A system property can be used to specify ways to unplumb the cluster IP

        Something like Single Client Access Name ("SCAN" in Oracle terminology) could solve this issue.
        SCAN is just a DNS name that resolves to three (or more) virtual IP addresses,
        so it gives both HA and load balancing.

        Each ZK server may take one virtual IP address. A ZK client that is aware that DNS can return multiple IPs for
        one name connects to the first IP in the list and, if that does not work, tries the next one, and so on.
        A ZK client that just relies on the OS for DNS name resolution will still work, because the DNS servers or the OS
        hand out the IPs in round-robin fashion.
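
A DNS-aware client of the kind described here might be sketched as follows. The helper names are hypothetical. `socket.getaddrinfo` and `socket.create_connection` are standard Python library calls, and the connect step is injectable so the fallback logic can be exercised without a network.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

Address = Tuple[str, int]


def resolve_all(name: str, port: int) -> List[Address]:
    """Return every distinct address the name resolves to, in resolver order."""
    seen = set()
    result: List[Address] = []
    for info in socket.getaddrinfo(name, port, type=socket.SOCK_STREAM):
        sockaddr = info[4]
        addr = (sockaddr[0], sockaddr[1])
        if addr not in seen:
            seen.add(addr)
            result.append(addr)
    return result


def connect_first(addresses: Iterable[Address],
                  connect: Optional[Callable[[Address], object]] = None,
                  timeout: float = 2.0):
    """Try each address in order and return the first successful connection."""
    if connect is None:
        def connect(addr: Address):
            return socket.create_connection(addr, timeout=timeout)
    last_error = None
    for addr in addresses:
        try:
            return connect(addr)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next address
    raise ConnectionError("no address reachable") from last_error
```

A caller would combine the two as `connect_first(resolve_all(name, 2181))`, where `name` is the shared DNS name and 2181 is the conventional ZooKeeper client port.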

        If the ZK client must always talk to the active leader and the ZK server it happened to reach is not the leader,
        that server can simply respond with the identity of the active leader.
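
ZooKeeper servers do not redirect clients to the leader in this way, so the sketch below uses a different, related technique: it probes each server with the existing `srvr` four-letter command and reads the `Mode:` line of the reply. The function names are hypothetical, and on newer ZooKeeper releases the command may need to be enabled in the four-letter-word whitelist.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple


def four_letter_word(host: str, port: int,
                     word: bytes = b"srvr", timeout: float = 2.0) -> str:
    """Send a four-letter command and return the server's full text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output: str) -> Optional[str]:
    """Extract the value of the 'Mode:' line, e.g. 'leader' or 'follower'."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leader(servers: Iterable[Tuple[str, int]],
                query: Callable[[str, int], str] = four_letter_word
                ) -> Optional[Tuple[str, int]]:
    """Return the first server that reports Mode: leader, or None."""
    for host, port in servers:
        try:
            if parse_mode(query(host, port)) == "leader":
                return (host, port)
        except OSError:
            continue  # unreachable server: move on to the next one
    return None
```

Unlike a server-side redirect, this costs the client one probe per server in the worst case, and the answer can be stale if an election happens right after the probe.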

        This works very similarly in Oracle database "clouds" (RAC clusters): SCAN listeners that receive a connection request redirect the client to a less loaded database instance (in the case of ZK, the target might be the ZK leader or whatever other
        logic makes sense for ZK).

        Oracle Single Client Access Name explained - http://www.oracle.com/technetwork/products/clustering/overview/scan-129069.pdf

        Neil Fahey added a comment -

        The following two projects, in combination, can provide this type of capability and might serve as reference points:

        http://www.spread.org/apps.html
        http://www.backhand.org/wackamole/

        Alexander Shraer added a comment -

        A virtual IP might not be easy to implement if the real IP can switch subnets. Suppose a new leader is in a different subnet than the previous one (and also different from the subnet of the virtual IP). There seem to be types of virtual IP that support this, but I'm not sure how general this is.

        Vishal Kher added a comment -

        Yes, but which follower should start the IP? There needs to be some level of agreement (like an election), so it is easier to run it on the leader. Also, the node running this IP should not be behind the leader; otherwise, fetching administrative info (e.g., configuration info) may return stale results.

        Benjamin Reed added a comment -

        The virtual IP does not need to run on the leader. It just needs to be on a machine with an accurate view of the system. Obviously the leader has such a view, but so do the followers of the leader.

        Vishal Kher created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Vishal Kher
          • Votes:
            2
            Watchers:
            7
