  1. ZooKeeper
  2. ZOOKEEPER-4022

ZooKeeper client session establishment deficiency


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.11, 3.4.13, 3.4.14
    • Fix Version/s: None
    • Component/s: java client
    • Labels: None

    Description

      Here I want to share some deficiencies in ZooKeeper client connection handling that we encountered and debugged while operating at large scale.

      • Dead IP. Suppose one ZooKeeper server is dead and the connection string contains a single DNS name that resolves to N IPs. For ZooKeeper clients >= 3.4.13, HostProvider.size() is 1, and next() re-resolves the single DNS name, whose N IPs include the one bad IP. There is a 1/N chance of picking the dead host and failing to establish a TCP connection. On the next try there is still a 1/N chance of hitting the same IP, and so on until the application-level timeout. With a large number of clients, some are bound to see application-level session establishment failures. We probably need to make sure that the second round of tries excludes the previously tried IP address.
      • TCP connection timeout. Suppose the number of observers, M, is very large. The TCP connection timeout is set to the initial session timeout divided by HostProvider.size(). With a hundred observers, the resulting timeout can be too short for a cross-data-center TCP connection to be established. This is especially a problem for ZooKeeper versions <= 3.4.11, because the ZooKeeper (client) resolves DNS first, and one connection string (DNS name) can map to 100 IP addresses.
      • Changes to the IP addresses of ZooKeeper servers (observers) are not picked up by the client in a timely manner. This issue mostly affects older versions of ZooKeeper, because the ZooKeeper (client) object resolves the DNS name only once, upon construction. Say that after the client has been running for a month, IT gradually adds more servers to meet traffic growth. The IPs newly added to the DNS name won't be seen. If IT retires some servers, the client will still try to connect to them, which may cause session timeouts, etc.
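
      To make the first two points concrete, here is a minimal sketch of one possible mitigation. It is not ZooKeeper code: RetryAwareAddressPicker, pick(), onConnected() and perHostConnectTimeoutMs() are hypothetical names invented for illustration. The picker remembers which resolved IPs were already tried in the current round and excludes them, so a single dead IP can be hit at most once per round instead of with probability 1/N on every attempt. The static helper only restates the timeout division described above.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the ZooKeeper client API. */
final class RetryAwareAddressPicker {
    // Addresses already attempted in the current round of retries.
    private final Set<InetAddress> tried = new HashSet<>();

    /**
     * Picks a random address from the freshly resolved list, skipping any
     * address already tried in this round. Once every address has been
     * tried, the round resets so the client can keep retrying.
     */
    InetAddress pick(List<InetAddress> resolved) {
        if (resolved.isEmpty()) {
            throw new IllegalArgumentException("no addresses resolved");
        }
        List<InetAddress> candidates = new ArrayList<>(resolved);
        candidates.removeAll(tried);
        if (candidates.isEmpty()) {
            // All resolved addresses were tried: start a new round.
            tried.clear();
            candidates = new ArrayList<>(resolved);
        }
        Collections.shuffle(candidates);
        InetAddress choice = candidates.get(0);
        tried.add(choice);
        return choice;
    }

    /** Call once a session is established so the next outage starts a fresh round. */
    void onConnected() {
        tried.clear();
    }

    /** Connect timeout as described in the report: session timeout divided by host count. */
    static int perHostConnectTimeoutMs(int sessionTimeoutMs, int hostCount) {
        return sessionTimeoutMs / hostCount;
    }
}
```

      Because the list passed to pick() is the result of a fresh resolution on each attempt, this approach would also pick up IPs that were added to or removed from the DNS name. As an arithmetic illustration with an assumed 30000 ms session timeout, perHostConnectTimeoutMs(30000, 100) yields 300 ms per connection attempt, which may be too short across data centers.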

          People

            Assignee: Unassigned
            Reporter: Kai Sun (kaisun2000)
            Votes: 0
            Watchers: 4
