Hadoop Common
  1. Hadoop Common
  2. HADOOP-8326

test-patch can leak processes in some cases

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: test
    • Labels:
      None
    • Target Version/s:

      Description

      test-patch.sh can leak processes in some cases. These leaked processes can cause subsequent tests to fail because they are holding resources, like ports that the others may need to execute correctly.

      1. HADOOP-8326.txt
        3 kB
        Robert Joseph Evans
      2. HADOOP-8326.txt
        3 kB
        Robert Joseph Evans

        Activity

        Hide
        Robert Joseph Evans added a comment -

        Sadly there is no easy way to detect these processes. This is because when we do a fork/exec through the default container executor most of the time we will also do a setsid on them, so that the session id is different from the original. This also makes it so that the process group id is different. We have effectively made these processes daemons. They even end up having init as their PPID when we are done.

        I think the best way to handle this is to be conservative about what we kill off so that we kill off everything in our own process group, and then also any daemon java processes owned by the current user, and that in their arguments they have a reference to the current working directory in some way. This matches all of the errant processes that I have seen on the build machines.

        Show
        Robert Joseph Evans added a comment - Sadly there is no easy way to detect these processes. This is because when we do a fork/exec through the default container executor most of the time we will also do a setsid on them, so that the session id is different from the original. This also makes it so that the process group id is different. We have effectively made these processes daemons. They even end up having init as their PPID when we are done. I think the best way to handle this is to be conservative about what we kill off so that we kill off everything in our own process group, and then also any daemon java processes owned by the current user, and that in their arguments they have a reference to the current working directory in some way. This matches all of the errant processes that I have seen on the build machines.
        Hide
        Robert Joseph Evans added a comment -

        This patch will try to kill off errant processes. It also installs a trap before running any maven commands to try and kill them off even when hudson aborts the build.

        Show
        Robert Joseph Evans added a comment - This patch will try to kill off errant processes. It also installs a trap before running any maven commands to try and kill them off even when hudson aborts the build.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12524926/HADOOP-8326.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/897//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12524926/HADOOP-8326.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/897//console This message is automatically generated.
        Hide
        Tom White added a comment -

        +1 for the direction.

        Should we take this opportunity to remove the lines that try to kill left-over processes from previous runs? That doesn't work well on Jenkins where several builds may be running on the same machine at any one time.

        Is it possible to use ps instead of pgrep, since it may not be installed on developer machines.

        Show
        Tom White added a comment - +1 for the direction. Should we take this opportunity to remove the lines that try to kill left-over processes from previous runs? That doesn't work well on Jenkins where several builds may be running on the same machine at any one time. Is it possible to use ps instead of pgrep, since it may not be installed on developer machines.
        Hide
        Robert Joseph Evans added a comment -

        Canceling patch to address Tom's comments.

        Show
        Robert Joseph Evans added a comment - Canceling patch to address Tom's comments.
        Hide
        Robert Joseph Evans added a comment -

        This patch no longer uses pgrep, just ps, grep and awk. I have tested the code manually and verified that it works as expected. I have not tested it with jenkins nor am I able to unless I get more permissions to edit the jenkins setup.

        Show
        Robert Joseph Evans added a comment - This patch no longer uses pgrep, just ps, grep and awk. I have tested the code manually and verified that it works as expected. I have not tested it with jenkins nor am I able to unless I get more permissions to edit the jenkins setup.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525175/HADOOP-8326.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javadoc. The javadoc tool appears to have generated 16 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/911//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/911//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12525175/HADOOP-8326.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. -1 javadoc. The javadoc tool appears to have generated 16 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/911//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/911//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        This is still happening and on hadoop3 there are currently 8+ leaked processes. I don't think we are seeing the test failures we saw before because all of the mini clusters use ephemeral ports now.

        I would like to check in this change but I am a bit nervous about doing so because I may be killing off more then I should. I am killing off everything in the same process group. So if I run

        test-patch.sh ... | less

        less is getting killed too. I don't really know for sure if this will cause problems for Jenkins or not, it depends on how Jenkins is launching test-patch.

        Show
        Robert Joseph Evans added a comment - This is still happening and on hadoop3 there are currently 8+ leaked processes. I don't think we are seeing the test failures we saw before because all of the mini clusters use ephemeral ports now. I would like to check in this change but I am a bit nervous about doing so because I may be killing off more then I should. I am killing off everything in the same process group. So if I run test-patch.sh ... | less less is getting killed too. I don't really know for sure if this will cause problems for Jenkins or not, it depends on how Jenkins is launching test-patch.
        Hide
        Robert Joseph Evans added a comment -

        I don't really see this happening. It is not a simple problem to solve, and I don't want to risk breaking Jenkins, so I will just leave it for now.

        Show
        Robert Joseph Evans added a comment - I don't really see this happening. It is not a simple problem to solve, and I don't want to risk breaking Jenkins, so I will just leave it for now.

          People

          • Assignee:
            Robert Joseph Evans
            Reporter:
            Robert Joseph Evans
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development