ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-820

update c unit tests to ensure "zombie" java server processes don't cause failure

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.3.2, 3.4.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the c unit tests are run sometimes the server doesn't shutdown at the end of the test, this causes subsequent tests (hudson esp) to fail.

      1) we should try harder to make the server shut down at the end of the test, I suspect this is related to test failing/cleanup
      2) before the tests are run we should see if the old server is still running and try to shut it down

      1. ZOOKEEPER-820-1.patch
        2 kB
        Michi Mutsuzaki
      2. ZOOKEEPER-820.patch
        2 kB
        Michi Mutsuzaki
      3. ZOOKEEPER-820.patch
        2 kB
        Michi Mutsuzaki
      4. ZOOKEEPER-820.patch
        2 kB
        Michi Mutsuzaki

        Activity

        Hide
        Patrick Hunt added a comment -

        Mahadev suggested this for item 2 from the description:

        > I think we could some more standard tools like netstat to getthe process
        > using that port and try killing it. This would be less error prone nd more
        > reliable than other tools.

        Show
        Patrick Hunt added a comment - Mahadev suggested this for item 2 from the description: > I think we could some more standard tools like netstat to getthe process > using that port and try killing it. This would be less error prone nd more > reliable than other tools.
        Hide
        Michi Mutsuzaki added a comment -

        What's the 'correct' way to stop zookeeper server process? Right now the c unit test does kill -9. Is there a way to do clean shutdown?

        --Michi

        Show
        Michi Mutsuzaki added a comment - What's the 'correct' way to stop zookeeper server process? Right now the c unit test does kill -9. Is there a way to do clean shutdown? --Michi
        Hide
        Michi Mutsuzaki added a comment -

        This patch is for branch-3.3.

        I changed 2 things.

        1. Instead of using a file to keep track of pid, use lsof to check if a process is using port 22181.
        2. Fixed invalid return code checking. This was causing the unit test to go into a loop for 2 minutes in certain cases.

        --Michi

        Show
        Michi Mutsuzaki added a comment - This patch is for branch-3.3. I changed 2 things. 1. Instead of using a file to keep track of pid, use lsof to check if a process is using port 22181. 2. Fixed invalid return code checking. This was causing the unit test to go into a loop for 2 minutes in certain cases. --Michi
        Hide
        Michi Mutsuzaki added a comment -

        One more change. I like to change the sleep time to from 5 seconds 1 second. Most of the time, 1 second is enough for the server to start up (at least on my box).

        --Michi

        Show
        Michi Mutsuzaki added a comment - One more change. I like to change the sleep time to from 5 seconds 1 second. Most of the time, 1 second is enough for the server to start up (at least on my box). --Michi
        Hide
        Mahadev konar added a comment -

        The only comment I have is that these scripts might not work on cygwin. Let me try and check how lsof works on cygwin windows.

        Show
        Mahadev konar added a comment - The only comment I have is that these scripts might not work on cygwin. Let me try and check how lsof works on cygwin windows.
        Hide
        Benjamin Reed added a comment -

        +1 this looks good to me. did you try it on cygwin?

        Show
        Benjamin Reed added a comment - +1 this looks good to me. did you try it on cygwin?
        Hide
        Michi Mutsuzaki added a comment -

        Cygwin doesn't have lsof.

        Most of the time, 'zombie' process is caused by the invalid return code checking (it keeps starting the server process in a loop for 2 minutes). I think it's ok to use pid file to keep track of the server process.

        I'll submit a new patch.

        --Michi

        Show
        Michi Mutsuzaki added a comment - Cygwin doesn't have lsof. Most of the time, 'zombie' process is caused by the invalid return code checking (it keeps starting the server process in a loop for 2 minutes). I think it's ok to use pid file to keep track of the server process. I'll submit a new patch. --Michi
        Hide
        Michi Mutsuzaki added a comment -

        Reverted back to use pid file instead of lsof.

        Show
        Michi Mutsuzaki added a comment - Reverted back to use pid file instead of lsof.
        Hide
        Mahadev konar added a comment -

        thsi does fix the if problem in the script (which took me sometime to realise was backwards ). Michi, can you also add the lsof check as follows:

        if which lsof > /dev/null 2>&1; then "run the command to kill the process using lsof"; else "we dont do anything"; fi
        

        Note that this is in addition to using pid checks. We can do the following -

        on stop server:

        • kill using pid file first
        • kill using lsof, if any such process if present.
        Show
        Mahadev konar added a comment - thsi does fix the if problem in the script (which took me sometime to realise was backwards ). Michi, can you also add the lsof check as follows: if which lsof > /dev/ null 2>&1; then "run the command to kill the process using lsof" ; else "we dont do anything" ; fi Note that this is in addition to using pid checks. We can do the following - on stop server: kill using pid file first kill using lsof, if any such process if present.
        Hide
        Patrick Hunt added a comment -

        Cancelling patch - needs to be updated for Mahadev's most recent comment.

        Show
        Patrick Hunt added a comment - Cancelling patch - needs to be updated for Mahadev's most recent comment.
        Hide
        Michi Mutsuzaki added a comment -

        Uses which to check if lsof command is present. If it is, use it to see if there is a process listening on port 22181 and kill it.

        --Michi

        Show
        Michi Mutsuzaki added a comment - Uses which to check if lsof command is present. If it is, use it to see if there is a process listening on port 22181 and kill it. --Michi
        Hide
        Patrick Hunt added a comment -

        +1, committed to trunk/branch33. Thanks Michi!

        Show
        Patrick Hunt added a comment - +1, committed to trunk/branch33. Thanks Michi!
        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #973 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/973/)
        ZOOKEEPER-820. update c unit tests to ensure "zombie" java server processes dont cause failure

        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #973 (See https://hudson.apache.org/hudson/job/ZooKeeper-trunk/973/ ) ZOOKEEPER-820 . update c unit tests to ensure "zombie" java server processes dont cause failure

          People

          • Assignee:
            Michi Mutsuzaki
            Reporter:
            Patrick Hunt
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development