ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1676

C client zookeeper_interest returning ZOK on Connection Loss

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Not A Problem
    • Affects Version/s: 3.4.3
    • Fix Version/s: 3.4.9, 3.5.2
    • Component/s: c client
    • Labels:
      None
    • Environment:

      All

    • Release Note:
      Ok, let's resolve this one and document it.

      Description

      I have a fairly simple single-threaded C client set up – single-threaded
      because we are embedding zk in the node.js/libuv runtime – which consists of
      the following algorithm:

      zookeeper_interest(); select();
      // perform zookeeper api calls
      zookeeper_process();

      I've noticed that zookeeper_interest in the C client never returns error if it
      is unable to connect to the zk server.

      From the spec of the zookeeper_interest API, I see that zookeeper_interest is
      supposed to return ZCONNECTIONLOSS when disconnected from the client. However,
      digging into the code, I see that the client is making a non-blocking connect
      call
      https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613
      , and returning ZOK
      https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684

      If we assume that the server is not up, this will mean that the subsequent
      select() call would return 0, since the fd is not ready, and future calls to
      zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS.
      Thus an upstream client will never be aware that the connection is lost.

      I don't think this is the expected behavior. I have temporarily patched the zk
      C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still
      unable to connect after session_timeout has been exceeded.

      I have included a patch for the client which fixes this for release 3.4.3 6b35e96 in this branch: https://github.com/yunong/zookeeper/tree/release-3.4.3-patched Here's the patch https://gist.github.com/yunong/efe869a0345867d54adf

      For more information, please see this email thread. http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201211.mbox/%3C11A8E7C3-4DDE-45D8-ABEC-A8A4D32CF647@gmail.com%3E

        Issue Links

          Activity

          Hide
          Michi Mutsuzaki added a comment -

          Hi Yunong,

          Thank you for the patch. Could you please upload your patch here? This page describes how to submit patches:

          http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute

          Thanks!
          --Michi

          Show
          Michi Mutsuzaki added a comment - Hi Yunong, Thank you for the patch. Could you please upload your patch here? This page describes how to submit patches: http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute Thanks! --Michi
          Hide
          Patrick Hunt added a comment -

          Yunong Xiao this sounds like an important one to fix, can you submit a patch per Michi's comment?

          Show
          Patrick Hunt added a comment - Yunong Xiao this sounds like an important one to fix, can you submit a patch per Michi's comment?
          Hide
          Flavio Junqueira added a comment -

          I missed initially that the bug reports the use of the single-thread C client. I have checked the multi-thread one and the transition of states CONNECTED->CONNECTING->CONNECTED in my client code as I stop and restart the server works fine. I'll try to reproduce the problem with the single-thread client.

          Show
          Flavio Junqueira added a comment - I missed initially that the bug reports the use of the single-thread C client. I have checked the multi-thread one and the transition of states CONNECTED->CONNECTING->CONNECTED in my client code as I stop and restart the server works fine. I'll try to reproduce the problem with the single-thread client.
          Hide
          Flavio Junqueira added a comment -

          I'm not convinced that this is an issue. I have tested the single-threaded client and I get the state changes correctly via the default watcher. That's the right way to observe the state changes rather than relying on the return value of zookeeper_interest. A call to zookeeper_process will make sure to deliver the events. See this for an example:

          https://github.com/apache/zookeeper/blob/trunk/src/c/src/cli.c

          I could use a second pair of eyes to confirm this, perhaps Yunong Xiao if still around.

          Show
          Flavio Junqueira added a comment - I'm not convinced that this is an issue. I have tested the single-threaded client and I get the state changes correctly via the default watcher. That's the right way to observe the state changes rather than relying on the return value of zookeeper_interest . A call to zookeeper_process will make sure to deliver the events. See this for an example: https://github.com/apache/zookeeper/blob/trunk/src/c/src/cli.c I could use a second pair of eyes to confirm this, perhaps Yunong Xiao if still around.
          Hide
          Michael Han added a comment -

          Flavio Junqueira I can confirm that using watcher can get expected connection events. Here is my test case using single thread ZK library https://goo.gl/hql4B1. I've tried to start / stop server before / after my tests run and the state changes from watcher are expected. The return code of zookeeper_interest though is not reliable (for example, zookeeper_interest will return OK when the server is dead). Maybe we should add some documentations on the zookeeper_interest to indicate the return code of it should NOT be used for deciding connection status between client and server.

          Show
          Michael Han added a comment - Flavio Junqueira I can confirm that using watcher can get expected connection events. Here is my test case using single thread ZK library https://goo.gl/hql4B1 . I've tried to start / stop server before / after my tests run and the state changes from watcher are expected. The return code of zookeeper_interest though is not reliable (for example, zookeeper_interest will return OK when the server is dead). Maybe we should add some documentations on the zookeeper_interest to indicate the return code of it should NOT be used for deciding connection status between client and server.
          Hide
          Flavio Junqueira added a comment -

          Michael Han You're right, unfortunately we don't have good online documentation for the C client, and I'm totally for documenting it. I'm thinking that perhaps we should have a documentation jira for the C client and have one of the tasks be to document the use of the single-thread C client. I'd say that we resolve this issue so that we don't have it as a blocker anymore for 3.5.2. We can alternatively keep it open, link it to the documentation jira, but drop the priority. What do you think?

          Btw, I used this code to test the behavior of the single-thread C client:

          https://github.com/fpj/zookeeper-book-example/blob/master/src/main/c/master.c

          Show
          Flavio Junqueira added a comment - Michael Han You're right, unfortunately we don't have good online documentation for the C client, and I'm totally for documenting it. I'm thinking that perhaps we should have a documentation jira for the C client and have one of the tasks be to document the use of the single-thread C client. I'd say that we resolve this issue so that we don't have it as a blocker anymore for 3.5.2. We can alternatively keep it open, link it to the documentation jira, but drop the priority. What do you think? Btw, I used this code to test the behavior of the single-thread C client: https://github.com/fpj/zookeeper-book-example/blob/master/src/main/c/master.c
          Hide
          Michael Han added a comment -

          Link to documentation JIRA sub-task.

          Show
          Michael Han added a comment - Link to documentation JIRA sub-task.
          Hide
          Michael Han added a comment -

          Flavio Junqueira I think we can resolve this. I also created ZOOKEEPER-2431 and ZOOKEEPER-2432 as a follow up to improve documentation.

          Show
          Michael Han added a comment - Flavio Junqueira I think we can resolve this. I also created ZOOKEEPER-2431 and ZOOKEEPER-2432 as a follow up to improve documentation.

            People

            • Assignee:
              Yunong Xiao
              Reporter:
              Yunong Xiao
            • Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development