  1. Hadoop YARN
  2. YARN-4834

ProcfsBasedProcessTree doesn't track daemonized processes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.2, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: nodemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Currently the algorithm uses the ppid from /proc/<pid>/stat, which can be 1 if a child process has daemonized itself. As a result, potentially large processes can go unmonitored.

      The session ID might be a better choice, since that's what we use to signal the container during teardown.
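
      To make the two fields concrete, here is a minimal sketch of pulling the ppid and session ID out of one /proc/<pid>/stat line. It relies on the field order documented in proc(5): pid, (comm), state, ppid, pgrp, session. The class and method names (ProcStatSketch, StatInfo, parse) are hypothetical and are not taken from ProcfsBasedProcessTree.

```java
class ProcStatSketch {
    /** Hypothetical holder for the few stat fields discussed here. */
    static final class StatInfo {
        final int pid;
        final String comm;
        final int ppid;
        final int pgrp;
        final int sessionId;

        StatInfo(int pid, String comm, int ppid, int pgrp, int sessionId) {
            this.pid = pid;
            this.comm = comm;
            this.ppid = ppid;
            this.pgrp = pgrp;
            this.sessionId = sessionId;
        }
    }

    /** Parses one line in /proc/<pid>/stat format. */
    static StatInfo parse(String line) {
        // comm may itself contain spaces or parentheses, so split on the
        // first '(' and the last ')' rather than on whitespace alone.
        int open = line.indexOf('(');
        int close = line.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("not a stat line: " + line);
        }
        int pid = Integer.parseInt(line.substring(0, open).trim());
        String comm = line.substring(open + 1, close);
        // After comm: rest[0]=state, rest[1]=ppid, rest[2]=pgrp, rest[3]=session
        String[] rest = line.substring(close + 1).trim().split("\\s+");
        if (rest.length < 4) {
            throw new IllegalArgumentException("truncated stat line: " + line);
        }
        return new StatInfo(pid, comm,
            Integer.parseInt(rest[1]),
            Integer.parseInt(rest[2]),
            Integer.parseInt(rest[3]));
    }

    public static void main(String[] args) {
        // Made-up sample: a daemonized process reparented to init (ppid 1)
        // that still carries the session ID of the session it was started in.
        StatInfo info = parse("4321 (my daemon) S 1 4000 4000 0 -1 4194560");
        System.out.println("ppid=" + info.ppid + " session=" + info.sessionId);
    }
}
```

      In the made-up sample line, the ppid says nothing about where the process came from, while the session ID still does.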

          Activity

          gsohn Grant Sohn added a comment -

          Regarding the 'kill -- ...':

          Ubuntu kill doesn't support "--" but recommends killall instead.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          jlowe Jason Lowe added a comment -

          Thanks, Nathan! I committed this to trunk, branch-2, branch-2.8, and branch-2.7.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9707 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9707/)
          YARN-4834. ProcfsBasedProcessTree doesn't track daemonized processes. (jlowe: rev c6b48391680c1b81a86aabc3ad4c725bfade6d2e)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
          jlowe Jason Lowe added a comment -

          +1 to using the session ID to track the processes for a container. Ideally if we're using cgroups we should use that instead, but in the interim this would be a significant improvement.

          I'm not thrilled with keeping the double-pass logic left over from the parent-child tree calculations. The session ID approach would only require a single pass. However, that's a significant rewrite of the code, and I can appreciate keeping the changes to a minimum to fix this bug. We can file a followup JIRA to simplify the logic.

          +1 lgtm. Will commit this tomorrow if there are no objections.
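
          For reference, the single-pass idea mentioned above could look roughly like the sketch below. It is not the committed change. It assumes the container's root process is a session leader, so the container's session ID is known up front, and it uses hypothetical names (SessionScanSketch, Proc, membersOfSession).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SessionScanSketch {
    /** Hypothetical record of the fields read from /proc/<pid>/stat. */
    static final class Proc {
        final int pid;
        final int ppid;
        final int sessionId;

        Proc(int pid, int ppid, int sessionId) {
            this.pid = pid;
            this.ppid = ppid;
            this.sessionId = sessionId;
        }
    }

    /**
     * One pass over every process on the node: keep the pids whose session ID
     * matches the container's. No parent/child links are needed, so a
     * process reparented to init is still picked up.
     */
    static Set<Integer> membersOfSession(Collection<Proc> all, int sessionId) {
        Set<Integer> members = new TreeSet<>();
        for (Proc p : all) {
            if (p.sessionId == sessionId) {
                members.add(p.pid);
            }
        }
        return members;
    }

    public static void main(String[] args) {
        // Made-up process table: 102 has daemonized (ppid 1) but kept session 100.
        List<Proc> all = Arrays.asList(
            new Proc(100, 50, 100),
            new Proc(101, 100, 100),
            new Proc(102, 1, 100),
            new Proc(200, 1, 200));
        System.out.println(membersOfSession(all, 100));
    }
}
```

          A process that calls setsid() itself would start a new session and drop out of this set, which is one reason cgroups remain the more robust option when available.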

          hadoopqa Hadoop QA added a comment -
          -1 overall

          Vote | Subsystem | Runtime | Comment
          0 | reexec | 0m 10s | Docker mode activated.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files.
          +1 | mvninstall | 6m 38s | trunk passed
          +1 | compile | 0m 22s | trunk passed with JDK v1.8.0_77
          +1 | compile | 0m 26s | trunk passed with JDK v1.7.0_95
          +1 | checkstyle | 0m 19s | trunk passed
          +1 | mvnsite | 0m 31s | trunk passed
          +1 | mvneclipse | 0m 12s | trunk passed
          +1 | findbugs | 1m 8s | trunk passed
          +1 | javadoc | 0m 26s | trunk passed with JDK v1.8.0_77
          +1 | javadoc | 0m 34s | trunk passed with JDK v1.7.0_95
          +1 | mvninstall | 0m 26s | the patch passed
          +1 | compile | 0m 21s | the patch passed with JDK v1.8.0_77
          +1 | javac | 0m 21s | the patch passed
          +1 | compile | 0m 24s | the patch passed with JDK v1.7.0_95
          +1 | javac | 0m 24s | the patch passed
          -1 | checkstyle | 0m 18s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: patch generated 6 new + 35 unchanged - 0 fixed = 41 total (was 35)
          +1 | mvnsite | 0m 28s | the patch passed
          +1 | mvneclipse | 0m 10s | the patch passed
          +1 | whitespace | 0m 0s | Patch has no whitespace issues.
          +1 | findbugs | 1m 18s | the patch passed
          +1 | javadoc | 0m 25s | the patch passed with JDK v1.8.0_77
          +1 | javadoc | 0m 32s | the patch passed with JDK v1.7.0_95
          +1 | unit | 2m 1s | hadoop-yarn-common in the patch passed with JDK v1.8.0_77.
          +1 | unit | 2m 17s | hadoop-yarn-common in the patch passed with JDK v1.7.0_95.
          +1 | asflicense | 0m 17s | Patch does not generate ASF License warnings.
           |  | 20m 39s |



          Subsystem | Report/Notes
          Docker | Image:yetus/hadoop:fbe3e86
          JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12797089/YARN-4834.001.patch
          JIRA Issue | YARN-4834
          Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname | Linux bcba00a3970f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool | maven
          Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision | trunk / 9174645
          Default Java | 1.7.0_95
          Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs | v3.0.0
          checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/10966/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt
          JDK v1.7.0_95 Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/10966/testReport/
          modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
          Console output | https://builds.apache.org/job/PreCommit-YARN-Build/10966/console
          Powered by | Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          nroberts Nathan Roberts added a comment -

          As a note, we were seeing this with Slider applications. I didn't investigate far enough to know whether all Slider applications escape or whether this was just a characteristic of this particular application.

          nroberts Nathan Roberts added a comment -

          Simple fix that falls back to the session ID if a process has become owned by init. This seemed the safest, lowest-risk change.

          Other options might be:

          • Only use sessionID to build process tree
          • Use container cgroup (cgroup.procs) if available/configured.
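
          The fallback described above can be sketched as follows. This is an illustration of the idea, not the attached patch. The names (SessionFallbackSketch, Proc, effectiveParent, tree) are hypothetical, and it assumes a process owned by init that is not itself a session leader should be attached to the process whose pid equals its session ID.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SessionFallbackSketch {
    static final int INIT_PID = 1;

    /** Hypothetical record of the fields read from /proc/<pid>/stat. */
    static final class Proc {
        final int pid;
        final int ppid;
        final int sessionId;

        Proc(int pid, int ppid, int sessionId) {
            this.pid = pid;
            this.ppid = ppid;
            this.sessionId = sessionId;
        }
    }

    /** Uses ppid normally; falls back to the session leader when init owns the process. */
    static int effectiveParent(Proc p) {
        if (p.ppid == INIT_PID && p.pid != p.sessionId) {
            return p.sessionId;
        }
        return p.ppid;
    }

    /** Pass 1 links each process to its effective parent; pass 2 walks down from rootPid. */
    static Set<Integer> tree(Collection<Proc> all, int rootPid) {
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (Proc p : all) {
            if (p.pid == INIT_PID) {
                continue; // init itself has no parent worth tracking
            }
            children.computeIfAbsent(effectiveParent(p), k -> new ArrayList<>()).add(p.pid);
        }
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(rootPid);
        while (!queue.isEmpty()) {
            int pid = queue.poll();
            if (!seen.add(pid)) {
                continue; // already visited
            }
            for (int child : children.getOrDefault(pid, Collections.<Integer>emptyList())) {
                queue.add(child);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up table: 102 daemonized out of container root 100, then forked 103.
        List<Proc> all = Arrays.asList(
            new Proc(100, 50, 100),
            new Proc(101, 100, 100),
            new Proc(102, 1, 100),
            new Proc(103, 102, 100),
            new Proc(200, 1, 200));
        System.out.println(tree(all, 100));
    }
}
```

          Keeping the ppid links for everything except init-owned processes leaves the existing tree logic mostly untouched, which fits the low-risk goal stated above.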

            People

            • Assignee:
              nroberts Nathan Roberts
              Reporter:
              nroberts Nathan Roberts
            • Votes:
              0
            • Watchers:
              12
