ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1159

ClientCnxn does not propagate session expiration indication

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Blocker Blocker
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: java client
    • Labels:
      None

      Description

      ClientCnxn does not always propagate session expiration indication up to clients. If a reconnection attempt fails because the session has since expired, the KeeperCode is still Disconnected, but shouldn't it be set to Expired? Perhaps like so:

      --- a/src/java/main/org/apache/zookeeper/ClientCnxn.java
      +++ b/src/java/main/org/apache/zookeeper/ClientCnxn.java
      @@ -1160,6 +1160,7 @@ public class ClientCnxn {
                           clientCnxnSocket.doTransport(to, pendingQueue, outgoingQueue);
       
                       } catch (Exception e) {
      +                    Event.KeeperState eventState = Event.KeeperState.Disconnected;
                           if (closing) {
                               if (LOG.isDebugEnabled()) {
                                   // closing so this is expected
      @@ -1172,6 +1173,7 @@ public class ClientCnxn {
                               // this is ugly, you have a better way speak up
                               if (e instanceof SessionExpiredException) {
                                   LOG.info(e.getMessage() + ", closing socket connection");
      +                            eventState = Event.KeeperState.Expired;
                               } else if (e instanceof SessionTimeoutException) {
                                   LOG.info(e.getMessage() + RETRY_CONN_MSG);
                               } else if (e instanceof EndOfStreamException) {
      @@ -1191,7 +1193,7 @@ public class ClientCnxn {
                               if (state.isAlive()) {
                                   eventThread.queueEvent(new WatchedEvent(
                                           Event.EventType.None,
      -                                    Event.KeeperState.Disconnected,
      +                                    eventState,
                                           null));
                               }
                               clientCnxnSocket.updateNow();
      

      This affects HBase. HBase master and region server processes will shut down by design if their session has expired, but will attempt to reconnect if they think they have been disconnected. The above prevents proper termination.

        Issue Links

          Activity

          Andrew Purtell created issue -
          Andrew Purtell made changes -
          Field Original Value New Value
          Link This issue blocks HBASE-4235 [ HBASE-4235 ]
          Mahadev konar made changes -
          Fix Version/s 3.4.0 [ 12314469 ]
          Priority Major [ 3 ] Blocker [ 1 ]
          Hide
          Camille Fournier added a comment -

          Oh I had a user actually hit this bug I believe but I couldn't reproduce it for the life of me. I'll take a look at this for 3.4.

          Show
          Camille Fournier added a comment - Oh I had a user actually hit this bug I believe but I couldn't reproduce it for the life of me. I'll take a look at this for 3.4.
          Hide
          Camille Fournier added a comment -

          Now I remember my own analysis here. The only way to get a SessionExpired exception there so far as I can tell is if onConnected detects a SessionExpired. At which point it sets the state variable to CLOSED, so the "if(state.isAlive())" check will fail and it will never call that queueEvent.

          This is about where my digging into this issue ended. I couldn't reproduce this error and nothing in the code seems to me that it should be able to get that SessionExpired exception and not set the state to CLOSED, which will then always show a SESSIONEXPIRED exception to the client. If you can reproduce this error though, perhaps it is worth more investigation.

          Show
          Camille Fournier added a comment - Now I remember my own analysis here. The only way to get a SessionExpired exception there so far as I can tell is if onConnected detects a SessionExpired. At which point it sets the state variable to CLOSED, so the "if(state.isAlive())" check will fail and it will never call that queueEvent. This is about where my digging into this issue ended. I couldn't reproduce this error and nothing in the code seems to me that it should be able to get that SessionExpired exception and not set the state to CLOSED, which will then always show a SESSIONEXPIRED exception to the client. If you can reproduce this error though, perhaps it is worth more investigation.
          Hide
          Matthias Spycher added a comment -

          It's possible that the send thread dies, but the state of the connection remains alive (due to race conditions). In that case, we queue a Disconnected event:
          <code>
          ...
          cleanup();
          clientCnxnSocket.close();
          if (state.isAlive())

          { eventThread.queueEvent(new WatchedEvent(Event.EventType.None, Event.KeeperState.Disconnected, null)); }

          ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
          "SendThread exitedloop.");
          }
          </code>
          I would expect a Expired event here. But improved state management would be even better.

          Show
          Matthias Spycher added a comment - It's possible that the send thread dies, but the state of the connection remains alive (due to race conditions). In that case, we queue a Disconnected event: <code> ... cleanup(); clientCnxnSocket.close(); if (state.isAlive()) { eventThread.queueEvent(new WatchedEvent(Event.EventType.None, Event.KeeperState.Disconnected, null)); } ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(), "SendThread exitedloop."); } </code> I would expect a Expired event here. But improved state management would be even better.
          Hide
          Camille Fournier added a comment -

          Can you show the race condition that illustrates this?

          Show
          Camille Fournier added a comment - Can you show the race condition that illustrates this?
          Hide
          Andrew Purtell added a comment -

          The patch provided in the initial report fixes a problem we observed in our production.

          Show
          Andrew Purtell added a comment - The patch provided in the initial report fixes a problem we observed in our production.
          Hide
          Matthias Spycher added a comment -

          Looking at the current implementation, if a SessionExpiredException was thrown, then the expired event is enqueued, as is the event-of-death for the event thread; and isAlive() is false. So, as indicated by Camille, your patch doesn't appear to have the affect you expect.

          As for the race conditions. A call to close() will put the client in the closing mode, but the state may end up being CLOSED, CONNECTING or CONNECTED as the send thread doesn't enforce strict state transitions. The client will close, but state.isAlive() may still return true after the send thread has died. Note that this isn't easy to test, but you can step through it with a debugger to see how CLOSED can easily be stomped by CONNECTING (especially with the random sleep happening after a test for closing when isFirstConnect is false).
          Another, rather exotic case (hopefully doesn't ever happen), is when a Throwable isn't caught, e.g. an error that is not an Exception.

          In any case, I'm still not sure whether a dying send-thread should indicate the session has expired. I looked at the state diagram, but it seems to conflict with what I see in the code.

          Show
          Matthias Spycher added a comment - Looking at the current implementation, if a SessionExpiredException was thrown, then the expired event is enqueued, as is the event-of-death for the event thread; and isAlive() is false. So, as indicated by Camille, your patch doesn't appear to have the affect you expect. As for the race conditions. A call to close() will put the client in the closing mode, but the state may end up being CLOSED, CONNECTING or CONNECTED as the send thread doesn't enforce strict state transitions. The client will close, but state.isAlive() may still return true after the send thread has died. Note that this isn't easy to test, but you can step through it with a debugger to see how CLOSED can easily be stomped by CONNECTING (especially with the random sleep happening after a test for closing when isFirstConnect is false). Another, rather exotic case (hopefully doesn't ever happen), is when a Throwable isn't caught, e.g. an error that is not an Exception. In any case, I'm still not sure whether a dying send-thread should indicate the session has expired. I looked at the state diagram, but it seems to conflict with what I see in the code.
          Hide
          Matthias Spycher added a comment -

          As I understand it, the problem here may be that a disconnected client cannot discover that its session has expired. Only the server can declare a session expired which on the client side leads to the SessionExpiredException, but only when the client is connected.
          If this assumption is correct, I'm not sure how best to address it.

          Show
          Matthias Spycher added a comment - As I understand it, the problem here may be that a disconnected client cannot discover that its session has expired. Only the server can declare a session expired which on the client side leads to the SessionExpiredException, but only when the client is connected. If this assumption is correct, I'm not sure how best to address it.
          Camille Fournier made changes -
          Fix Version/s 3.5.0 [ 12316644 ]
          Fix Version/s 3.4.0 [ 12314469 ]
          Hide
          Jordan Zimmerman added a comment -

          FWIW - here's a little test that shows the problem:

          TestingServer           server = new TestingServer();   // uses Curator's TestingServer
          final CountDownLatch    connectedLatch = new CountDownLatch(1);
          final CountDownLatch    expiredLatch = new CountDownLatch(1);
          Watcher                 watcher = new Watcher()
          {
              @Override
              public void process(WatchedEvent event)
              {
                  System.out.println(event);
                  if ( event.getState() == Event.KeeperState.SyncConnected )
                  {
                      connectedLatch.countDown();
                  }
                  else if ( event.getState() == Event.KeeperState.Expired )
                  {
                      expiredLatch.countDown();
                  }
              }
          };
          ZooKeeper       zk = new ZooKeeper(connectString, 5000, watcher);
          connectedLatch.await();
          zk.exists("/", true);
          
          server.close();
          
          if ( !expiredLatch.await(10, TimeUnit.SECONDS) )
          {
              Assert.fail("Expired never happened");
          }
          
          Show
          Jordan Zimmerman added a comment - FWIW - here's a little test that shows the problem: TestingServer server = new TestingServer(); // uses Curator's TestingServer final CountDownLatch connectedLatch = new CountDownLatch(1); final CountDownLatch expiredLatch = new CountDownLatch(1); Watcher watcher = new Watcher() { @Override public void process(WatchedEvent event) { System .out.println(event); if ( event.getState() == Event.KeeperState.SyncConnected ) { connectedLatch.countDown(); } else if ( event.getState() == Event.KeeperState.Expired ) { expiredLatch.countDown(); } } }; ZooKeeper zk = new ZooKeeper(connectString, 5000, watcher); connectedLatch.await(); zk.exists( "/" , true ); server.close(); if ( !expiredLatch.await(10, TimeUnit.SECONDS) ) { Assert.fail( "Expired never happened" ); }
          Hide
          Jordan Zimmerman added a comment -

          ^^^^ Ignore my code above. I'm mistaken.

          Show
          Jordan Zimmerman added a comment - ^^^^ Ignore my code above. I'm mistaken.

            People

            • Assignee:
              Unassigned
              Reporter:
              Andrew Purtell
            • Votes:
              6 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:

                Development