Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-3650

zero or overflown xid disrupts the session

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.14, 3.5.6
    • Fix Version/s: None
    • Component/s: c client, java client, server
    • Labels:
      None

      Description

      This is a follow-up ticket forĀ https://github.com/alexguan/node-zookeeper-client/issues/100

      I found that above nodejs ZK client (it's 100% pure JS implementation) starts XID counter from 0 in requests which leads to really strange behaviour when throttling happens on ZK server side - please check it out for more details - that's interesting.

      Above client will be fixed I hope, but actually, problem is still valid for native Java (partially) and C clients' implementation:

      Java client: Actually here I see change was made to avoid xid overflow (ZOOKEEPER-3253), but I don't see it's merged into latest 3.4.14 release - is there any plans for this change to make into 3.4.x release?

      C client: overflow is not checked and it has another problem with starting value, here is the code for single-threaded implementation (MT variant uses same logic):

      // make sure the static xid is initialized before any threads started
      __attribute__((constructor)) int32_t get_xid()
      {
          static int32_t xid = -1;
          if (xid == -1) {
              xid = time(0);
          }
          return fetch_and_add(&xid,1);
      }
      

      starting value is chosen to be time(0) which is current Unix epoch time. It will overflow in the future on its own, making C client (and all implementations using this library as a dependency) completely out of order some day (and I can even tell you exact date ). And as the time passes, this window (time() .. overflow) shrinks every day, making range available for xid values smaller and smaller... so problems will start happen earlier for clients making large number of requests without session reestablishment.

      One more thing to note here:

      Why XID=0 is considered as invalid value (check above ticket for details)? It's not stated anywhere, except ZK server code itself which decrements queued requests only for positive XID (below excerpt is from tip of the master branch):

          // will be called from zkServer.processPacket
          public void decrOutstandingAndCheckThrottle(ReplyHeader h) {
              if (h.getXid() <= 0) {
                  return;
              }
              if (!zkServer.shouldThrottle(outstandingCount.decrementAndGet())) {
                  enableRecv();
              }
          }
      

      Apart from that, requests with xid=0 are getting through fine. Can we change condition to h.getXid() < 0 ?

      And last one: is there some documentation fo ZK wire protocol, explaining fields' meaning and allowed values? I did not find one and had to reverse engineer the logic...
      Maybe it's a good idea to create it, so there will not be such misunderstandings and discrepancies in clients' implementations?

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                lobachpavel Pavel Lobach
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: