ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-820

update c unit tests to ensure "zombie" java server processes don't cause failure

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.3.2, 3.4.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the c unit tests are run sometimes the server doesn't shutdown at the end of the test, this causes subsequent tests (hudson esp) to fail.

      1) we should try harder to make the server shut down at the end of the test, I suspect this is related to test failing/cleanup
      2) before the tests are run we should see if the old server is still running and try to shut it down

      1. ZOOKEEPER-820.patch
        2 kB
        Michi Mutsuzaki
      2. ZOOKEEPER-820.patch
        2 kB
        Michi Mutsuzaki
      3. ZOOKEEPER-820-1.patch
        2 kB
        Michi Mutsuzaki
      4. ZOOKEEPER-820.patch
        2 kB
        Michi Mutsuzaki

        Activity

        Patrick Hunt created issue -
        Hide
        Patrick Hunt added a comment -

        Mahadev suggested this for item 2 from the description:

        > I think we could some more standard tools like netstat to getthe process
        > using that port and try killing it. This would be less error prone nd more
        > reliable than other tools.

        Show
        Patrick Hunt added a comment - Mahadev suggested this for item 2 from the description: > I think we could some more standard tools like netstat to getthe process > using that port and try killing it. This would be less error prone nd more > reliable than other tools.
        Michi Mutsuzaki made changes -
        Field Original Value New Value
        Assignee Michi Mutsuzaki [ michim ]
        Hide
        Michi Mutsuzaki added a comment -

        What's the 'correct' way to stop zookeeper server process? Right now the c unit test does kill -9. Is there a way to do clean shutdown?

        --Michi

        Show
        Michi Mutsuzaki added a comment - What's the 'correct' way to stop zookeeper server process? Right now the c unit test does kill -9. Is there a way to do clean shutdown? --Michi
        Hide
        Michi Mutsuzaki added a comment -

        This patch is for branch-3.3.

        I changed 2 things.

        1. Instead of using a file to keep track of pid, use lsof to check if a process is using port 22181.
        2. Fixed invalid return code checking. This was causing the unit test to go into a loop for 2 minutes in certain cases.

        --Michi

        Show
        Michi Mutsuzaki added a comment - This patch is for branch-3.3. I changed 2 things. 1. Instead of using a file to keep track of pid, use lsof to check if a process is using port 22181. 2. Fixed invalid return code checking. This was causing the unit test to go into a loop for 2 minutes in certain cases. --Michi
        Michi Mutsuzaki made changes -
        Attachment ZOOKEEPER-820.patch [ 12452987 ]
        Michi Mutsuzaki made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Michi Mutsuzaki added a comment -

        One more change. I like to change the sleep time to from 5 seconds 1 second. Most of the time, 1 second is enough for the server to start up (at least on my box).

        --Michi

        Show
        Michi Mutsuzaki added a comment - One more change. I like to change the sleep time to from 5 seconds 1 second. Most of the time, 1 second is enough for the server to start up (at least on my box). --Michi
        Michi Mutsuzaki made changes -
        Attachment ZOOKEEPER-820-1.patch [ 12452989 ]
        Hide
        Mahadev konar added a comment -

        The only comment I have is that these scripts might not work on cygwin. Let me try and check how lsof works on cygwin windows.

        Show
        Mahadev konar added a comment - The only comment I have is that these scripts might not work on cygwin. Let me try and check how lsof works on cygwin windows.
        Patrick Hunt made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Patrick Hunt made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Benjamin Reed added a comment -

        +1 this looks good to me. did you try it on cygwin?

        Show
        Benjamin Reed added a comment - +1 this looks good to me. did you try it on cygwin?
        Hide
        Michi Mutsuzaki added a comment -

        Cygwin doesn't have lsof.

        Most of the time, 'zombie' process is caused by the invalid return code checking (it keeps starting the server process in a loop for 2 minutes). I think it's ok to use pid file to keep track of the server process.

        I'll submit a new patch.

        --Michi

        Show
        Michi Mutsuzaki added a comment - Cygwin doesn't have lsof. Most of the time, 'zombie' process is caused by the invalid return code checking (it keeps starting the server process in a loop for 2 minutes). I think it's ok to use pid file to keep track of the server process. I'll submit a new patch. --Michi
        Hide
        Michi Mutsuzaki added a comment -

        Reverted back to use pid file instead of lsof.

        Show
        Michi Mutsuzaki added a comment - Reverted back to use pid file instead of lsof.
        Michi Mutsuzaki made changes -
        Attachment ZOOKEEPER-820.patch [ 12455878 ]
        Hide
        Mahadev konar added a comment -

        thsi does fix the if problem in the script (which took me sometime to realise was backwards ). Michi, can you also add the lsof check as follows:

        if which lsof > /dev/null 2>&1; then "run the command to kill the process using lsof"; else "we dont do anything"; fi
        

        Note that this is in addition to using pid checks. We can do the following -

        on stop server:

        • kill using pid file first
        • kill using lsof, if any such process if present.
        Show
        Mahadev konar added a comment - thsi does fix the if problem in the script (which took me sometime to realise was backwards ). Michi, can you also add the lsof check as follows: if which lsof > /dev/ null 2>&1; then "run the command to kill the process using lsof" ; else "we dont do anything" ; fi Note that this is in addition to using pid checks. We can do the following - on stop server: kill using pid file first kill using lsof, if any such process if present.
        Hide
        Patrick Hunt added a comment -

        Cancelling patch - needs to be updated for Mahadev's most recent comment.

        Show
        Patrick Hunt added a comment - Cancelling patch - needs to be updated for Mahadev's most recent comment.
        Patrick Hunt made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Michi Mutsuzaki added a comment -

        Uses which to check if lsof command is present. If it is, use it to see if there is a process listening on port 22181 and kill it.

        --Michi

        Show
        Michi Mutsuzaki added a comment - Uses which to check if lsof command is present. If it is, use it to see if there is a process listening on port 22181 and kill it. --Michi
        Michi Mutsuzaki made changes -
        Attachment ZOOKEEPER-820.patch [ 12457400 ]
        Michi Mutsuzaki made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Patrick Hunt added a comment -

        +1, committed to trunk/branch33. Thanks Michi!

        Show
        Patrick Hunt added a comment - +1, committed to trunk/branch33. Thanks Michi!
        Patrick Hunt made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #973 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/973/)
        ZOOKEEPER-820. update c unit tests to ensure "zombie" java server processes dont cause failure

        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #973 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/973/ ) ZOOKEEPER-820 . update c unit tests to ensure "zombie" java server processes dont cause failure
        Mahadev konar made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Patch Available Patch Available Open Open
        42d 17h 42m 2 Patrick Hunt 06/Oct/10 18:03
        Open Open Patch Available Patch Available
        50d 14h 9m 3 Michi Mutsuzaki 18/Oct/10 00:52
        Patch Available Patch Available Resolved Resolved
        2d 18h 53m 1 Patrick Hunt 20/Oct/10 19:46
        Resolved Resolved Closed Closed
        399d 36m 1 Mahadev konar 23/Nov/11 19:22

          People

          • Assignee:
            Michi Mutsuzaki
            Reporter:
            Patrick Hunt
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development