ZooKeeper / ZOOKEEPER-1856

ZooKeeper C client can fail to switch away from a dead server in a 3+ server ensemble if the client only has a 2-server list.


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: c client
    • Labels: None

    Description

      If a client has a 2-server list, is currently connected to the last server in that list, and that server then goes offline, the addrvec_next() call in handle_error() will move the client to the start of the list and terminate the connection.

      Then zoo_cycle_next_server() is called in zookeeper_interest() in response to the connection failure, and the client cycles back to the failed server.

      In this way, a client that has a list of only 2 servers can get stuck on the one failed server. This is only an issue in an ensemble of more than 2 servers, of course, because losing 1 out of 2 would lead to quorum loss anyway.
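
      The double advance described above can be illustrated with a small model. This is only a sketch: server_cursor, cursor_advance() and reconnect_target_after_failure() are hypothetical names invented for illustration, not the C client's actual data structures, and the model assumes each of the two calls moves the position forward by exactly one slot with wrap-around.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the client's position in its server list.
 * This is NOT the real addrvec_t from the C client. */
typedef struct {
    size_t count;    /* number of servers the client knows about */
    size_t current;  /* index of the server the client is using */
} server_cursor;

/* Move to the next server, wrapping at the end of the list. */
static void cursor_advance(server_cursor *c) {
    assert(c->count > 0);
    c->current = (c->current + 1) % c->count;
}

/* Model one failure as described in the report: one advance in the
 * error path and a second advance at the top of the interest call.
 * Returns the index the client will try to connect to next. */
static size_t reconnect_target_after_failure(server_cursor *c) {
    cursor_advance(c);  /* models addrvec_next() in handle_error() */
    cursor_advance(c);  /* models zoo_cycle_next_server() in zookeeper_interest() */
    return c->current;
}
```

      Under these assumptions, a cursor with count 2 and current 1 ends up at index 1 again, which is the failed server, while a cursor with count 3 and current 2 ends up at index 1, which is a different server. With an even-length list the client only ever visits every other entry, which matches the case where every other server has failed.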

      Other harmonics are possible if every other server in the list has failed, but the problem is simplest to reproduce in a 3-server ensemble where the client knows about only 2 servers, one of which then fails. There are probably more elegant fixes, but I think the simplest is to add a flag that tracks whether a server has been accessed before and, if it hasn't, to skip the zoo_cycle_next_server() call at the top of zookeeper_interest().
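
      A minimal sketch of the flag idea, using the same kind of simplified model. It is not the attached patch: guarded_cursor, its current_tried field, and the three functions are hypothetical names, and the real client state is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the server list position plus the proposed flag. */
typedef struct {
    size_t count;        /* number of servers the client knows about */
    size_t current;      /* index of the server currently selected */
    bool current_tried;  /* has a connection to `current` been attempted? */
} guarded_cursor;

/* Move to the next server and mark it as not yet tried. */
static void guarded_advance(guarded_cursor *c) {
    assert(c->count > 0);
    c->current = (c->current + 1) % c->count;
    c->current_tried = false;
}

/* Error path: always move off the server that just failed. */
static void on_connection_error(guarded_cursor *c) {
    guarded_advance(c);
}

/* Interest path: cycle only if the selected server was already tried,
 * so a freshly selected server is not skipped. */
static size_t pick_reconnect_target(guarded_cursor *c) {
    if (c->current_tried) {
        guarded_advance(c);
    }
    c->current_tried = true;
    return c->current;
}
```

      In this model, a client with 2 servers that loses its connection to index 1 selects index 0 in the error path, and the interest path then keeps index 0 because it has not been tried yet. Only when an attempt has already been made on the selected server does the interest path cycle again.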

      Attachments

        1. ZOOKEEPER-1856.patch
          1 kB
          Michi Mutsuzaki

            People

              Assignee: Michi Mutsuzaki (michim)
              Reporter: Dutch T. Meyer (dutch)
              Votes: 0
              Watchers: 5
