ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-795

eventThread isn't shutdown after a connection "session expired" event coming

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.3.2, 3.4.0
    • Component/s: java client
    • Labels:
      None
    • Environment:

      ubuntu 10.04

      Description

      Hi,

      I notice a problem with the eventThread located in ClientCnxn.java file.
      The eventThread isn't shutdown after a connection "session expired" event coming (i.e. never receive EventOfDeath).

      When a session timeout occurs and the session is marked as expired, the connexion is fully closed (socket, SendThread...) expect for the eventThread.
      As a result, if i create a new zookeeper object and connect through it, I got a zombi thread which will never be kill (as for the previous zookeeper object, the state is already close, calling close again don't do anything).

      So everytime I will create a new zookeeper connection after a expired session, I will have a one more zombi EventThread.

      How to reproduce :

      • Start a zookeeper client connection in debug mode
      • Pause the jvm enough time to the expired event occur
      • Watch for example with jvisualvm the list of threads, the sendThread is succesfully killed, but the EventThread go to wait state for a infinity of time
      • if you reopen a new zookeeper connection, and do again the previous steps, another EventThread will be present in infinite wait state
      1. ZOOKEEPER-795.patch
        2 kB
        Sergey Doroshenko
      2. ZOOKEEPER-795.patch
        4 kB
        Benjamin Reed
      3. ExpiredSessionThreadLeak.java
        8 kB
        Daniel Lord

        Issue Links

          Activity

          Hide
          Patrick Hunt added a comment -

          A second user reported this issue to me.

          When we fix this we should also address the event thread naming. Where the user creates a new session from within the event thread context we end up with a very long thread name (we keep appending the thread name to the parent thread... so main-event-event-event-event, etc...)

          Show
          Patrick Hunt added a comment - A second user reported this issue to me. When we fix this we should also address the event thread naming. Where the user creates a new session from within the event thread context we end up with a very long thread name (we keep appending the thread name to the parent thread... so main-event-event-event-event, etc...)
          Hide
          Daniel Lord added a comment -

          Here is a test class that illustrates the problem using Thread#suspend and Thread#resume to simulate long GC times or other halted JVM problems. There is a simple wrapper around a ZooKeeper client handle that will connect another client in the event that its managed client receives an Expired event. The longer that the test program runs the more runaway EventThreads will be created and displayed.

          The only dependencies are on the ZooKeeper libraries.

          Show
          Daniel Lord added a comment - Here is a test class that illustrates the problem using Thread#suspend and Thread#resume to simulate long GC times or other halted JVM problems. There is a simple wrapper around a ZooKeeper client handle that will connect another client in the event that its managed client receives an Expired event. The longer that the test program runs the more runaway EventThreads will be created and displayed. The only dependencies are on the ZooKeeper libraries.
          Hide
          Sergey Doroshenko added a comment -

          This simple patch fixes both issues.
          No tests included, verified with Daniel's class. Not sure if we need yet another 20-seconds test for this.

          Show
          Sergey Doroshenko added a comment - This simple patch fixes both issues. No tests included, verified with Daniel's class. Not sure if we need yet another 20-seconds test for this.
          Hide
          Mahadev konar added a comment -

          ben, can you take a look at this?

          Show
          Mahadev konar added a comment - ben, can you take a look at this?
          Hide
          Sergey Doroshenko added a comment -

          Has anybody had a chance to look at the patch?

          Show
          Sergey Doroshenko added a comment - Has anybody had a chance to look at the patch?
          Hide
          Benjamin Reed added a comment -

          i've added a test. (added to the existing session expiration test, so it shouldn't add any running time to the tests)

          Show
          Benjamin Reed added a comment - i've added a test. (added to the existing session expiration test, so it shouldn't add any running time to the tests)
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12450795/ZOOKEEPER-795.patch
          against trunk revision 980576.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/101/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/101/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/101/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12450795/ZOOKEEPER-795.patch against trunk revision 980576. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/101/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/101/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/101/console This message is automatically generated.
          Hide
          Mahadev konar added a comment -

          I have committed this to trunk. but the patch does not apply 3.3.2.

          ben can you provide a patch for 3.3.2?

          thanks

          Show
          Mahadev konar added a comment - I have committed this to trunk. but the patch does not apply 3.3.2. ben can you provide a patch for 3.3.2? thanks
          Hide
          Mahadev konar added a comment -

          ben, pinging again, can you provide a patch for 3.3.2?

          Show
          Mahadev konar added a comment - ben, pinging again, can you provide a patch for 3.3.2?
          Hide
          Patrick Hunt added a comment -

          This is blocking a release candidate for 3.3.2, if we can get this in soon I'll start running through the release process.

          Show
          Patrick Hunt added a comment - This is blocking a release candidate for 3.3.2, if we can get this in soon I'll start running through the release process.
          Hide
          Benjamin Reed added a comment -

          Committed revision 986470. in branch 3.3

          Show
          Benjamin Reed added a comment - Committed revision 986470. in branch 3.3

            People

            • Assignee:
              Sergey Doroshenko
              Reporter:
              mathieu barcikowski
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development