ZooKeeper / ZOOKEEPER-1561

Zookeeper client may hang on a server restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: java client
    • Labels:
      None

      Description

      In the doIO method of ClientCnxnSocketNIO:

          if (p != null) {
              // p leaves outgoingQueue before the write is attempted
              outgoingQueue.removeFirstOccurrence(p);
              updateLastSend();
              if ((p.requestHeader != null) &&
                      (p.requestHeader.getType() != OpCode.ping) &&
                      (p.requestHeader.getType() != OpCode.auth)) {
                  p.requestHeader.setXid(cnxn.getXid());
              }
              p.createBB();
              ByteBuffer pbb = p.bb;
              sock.write(pbb); // if this throws, p is in neither queue
              if (!pbb.hasRemaining()) {
                  sentCount++;
                  if (p.requestHeader != null
                          && p.requestHeader.getType() != OpCode.ping
                          && p.requestHeader.getType() != OpCode.auth) {
                      pending.add(p);
                  }
              }
          }

      If sock.write(pbb) throws an exception, the packet is never cleaned up: it has already been removed from outgoingQueue but has not yet been added to pendingQueue. A client waiting on that packet will wait forever.
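The failure mode can be reproduced outside ZooKeeper with a minimal sketch. Everything below is a hypothetical stand-in (the `Packet` and `Channel` types, `sendAsReported`, `sendGuarded`); it only mirrors the order of operations in the snippet above. The guarded variant shows one possible way to keep the packet visible to cleanup code and is not claimed to be the change made in ZOOKEEPER-1560.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class LostPacketSketch {
    /** Hypothetical stand-in for the client's packet type. */
    static final class Packet {
        final String name;
        Packet(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the socket channel. */
    interface Channel {
        void write(Packet p) throws IOException;
    }

    final Deque<Packet> outgoingQueue = new ArrayDeque<>();
    final List<Packet> pendingQueue = new ArrayList<>();

    /** Same order of operations as the reported snippet. */
    void sendAsReported(Channel sock) throws IOException {
        Packet p = outgoingQueue.peekFirst();
        if (p != null) {
            outgoingQueue.removeFirstOccurrence(p); // removed before the write
            sock.write(p);                          // may throw
            pendingQueue.add(p);                    // never reached on failure
        }
    }

    /** Assumed guard: on failure, put the packet back at the head of the queue. */
    void sendGuarded(Channel sock) throws IOException {
        Packet p = outgoingQueue.peekFirst();
        if (p != null) {
            outgoingQueue.removeFirstOccurrence(p);
            try {
                sock.write(p);
            } catch (IOException e) {
                outgoingQueue.addFirst(p); // still reachable by cleanup code
                throw e;
            }
            pendingQueue.add(p);
        }
    }

    /** True if cleanup code scanning both queues would find the packet. */
    boolean isTracked(Packet p) {
        return outgoingQueue.contains(p) || pendingQueue.contains(p);
    }

    public static void main(String[] args) {
        // A channel whose write always fails, as after a server shutdown.
        Channel broken = pkt -> { throw new IOException("connection reset"); };

        LostPacketSketch reported = new LostPacketSketch();
        Packet p1 = new Packet("get /");
        reported.outgoingQueue.add(p1);
        try {
            reported.sendAsReported(broken);
        } catch (IOException expected) {
            // the write failed; the packet is now in neither queue
        }
        System.out.println("as reported, tracked: " + reported.isTracked(p1)); // false

        LostPacketSketch guarded = new LostPacketSketch();
        Packet p2 = new Packet("get /");
        guarded.outgoingQueue.add(p2);
        try {
            guarded.sendGuarded(broken);
        } catch (IOException expected) {
            // the write failed; the packet was put back
        }
        System.out.println("guarded, tracked: " + guarded.isTracked(p2)); // true
    }
}
```

In the first case nothing that walks the two queues on disconnect can find the packet, so nothing can notify the caller waiting on it.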

          Activity

          fanster.z Jacky007 added a comment -

          It was fixed in ZOOKEEPER-1560.

          ekoontz Eugene Koontz added a comment -

          It's a good time to revisit this now that ZOOKEEPER-1560 is fixed.

          ekoontz Eugene Koontz added a comment -

          Marshall McMullen (@marshall) mentions in ZOOKEEPER-107 ( https://issues.apache.org/jira/browse/ZOOKEEPER-107?focusedCommentId=13476346&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13476346 ) that this test failure: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1230//testReport/ might be due to ZOOKEEPER-1561. The console output might be useful for creating a test case.

          ekoontz Eugene Koontz added a comment -

          I should say in my last sentence above, "client will hang, if bug exists" - client should clearly not hang if client code is functioning correctly.

          ekoontz Eugene Koontz added a comment -

          I'd like to help with this if I can - first step would be a unit test that exposes it. If I understand from @Jacky007's description, I think that the test would be:

          1. Start a client and server
          2. Client waits till server comes up.
          3. Stop the server.
          4. Client sends a packet to the server (e.g. "get /").
          5. Restart the server.

          Client should hang at step 4. Test should detect the hang somehow and fail the test.
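One way the "detect the hang somehow" step could be sketched is to bound the wait on the request and treat a timeout as the failure signal. The example below is an assumption-laden illustration: a `CompletableFuture` stands in for the outstanding client request, `completesWithin` is a hypothetical helper, and no ZooKeeper client or server is actually started.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HangDetectionSketch {
    /**
     * Returns true if the request finished, with a result or an error,
     * within the timeout. A timeout means neither arrived, which is what
     * a hung client looks like from the caller's side.
     */
    static boolean completesWithin(CompletableFuture<?> request, long timeoutMillis) {
        try {
            request.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            return true; // failed fast (e.g. connection loss): not a hang
        } catch (TimeoutException e) {
            return false; // no result and no error: treat as a hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a request whose packet was dropped: never completed.
        CompletableFuture<String> lost = new CompletableFuture<>();

        // Stand-in for a request that fails promptly with an error.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IOException("connection loss"));

        System.out.println("lost completes:   " + completesWithin(lost, 200));   // false
        System.out.println("failed completes: " + completesWithin(failed, 200)); // true
    }
}
```

A real test would replace the futures with an actual request issued at step 4 and fail the test when `completesWithin` returns false.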

          shralex Alexander Shraer added a comment -

          Might be a duplicate of ZOOKEEPER-1560, which found a different problem in the same block of code.

          ekoontz Eugene Koontz added a comment -

          This was discovered in the process of working on ZOOKEEPER-107.

          ekoontz Eugene Koontz added a comment -

          The code mentioned in the description was added in ZOOKEEPER-1437.


            People

            • Assignee:
              Unassigned
              Reporter:
              fanster.z Jacky007
            • Votes:
              1
              Watchers:
              3
