  1. Hadoop YARN
  2. YARN-5197

RM leaks containers if running container disappears from node update

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.7.2, 2.6.4
    • Fix Version/s: 2.8.0, 2.6.5, 2.7.4
    • Component/s: resourcemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Once a node reports a container running in a status update, the corresponding RMNodeImpl will track the container in its launchedContainers map. If the node somehow misses sending the completed container status to the RM and the container simply disappears from subsequent heartbeats, the container will leak in launchedContainers forever and the container completion event will not be sent to the scheduler.
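The leak and the general shape of a fix can be illustrated with a minimal sketch. This is not the actual RMNodeImpl code: the class `NodeTrackerSketch`, the `State` enum, and `processHeartbeat` are hypothetical names, and container ids are plain strings. Only `launchedContainers`, `findLostContainers`, and `nodeContainers` are names taken from this issue, and their bodies here are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the RM's per-node tracking; not the real RMNodeImpl.
class NodeTrackerSketch {
  enum State { RUNNING, COMPLETE }

  // Containers the node has reported as running and not yet completed.
  final Set<String> launchedContainers = new HashSet<>();

  /**
   * Processes one heartbeat (container id -> reported state) and returns the
   * ids whose completion should be forwarded to the scheduler.
   */
  List<String> processHeartbeat(Map<String, State> reported) {
    List<String> completed = new ArrayList<>();
    for (Map.Entry<String, State> e : reported.entrySet()) {
      if (e.getValue() == State.RUNNING) {
        launchedContainers.add(e.getKey());
      } else {
        launchedContainers.remove(e.getKey());
        completed.add(e.getKey());
      }
    }
    // Without this step, a container that silently vanishes from the
    // heartbeat would stay in launchedContainers forever.
    completed.addAll(findLostContainers(reported.keySet()));
    return completed;
  }

  // Returns tracked containers the node no longer mentions and stops tracking them.
  private List<String> findLostContainers(Set<String> nodeContainers) {
    List<String> lost = new ArrayList<>();
    for (String id : launchedContainers) {
      if (!nodeContainers.contains(id)) {
        lost.add(id);
      }
    }
    launchedContainers.removeAll(lost);
    return lost;
  }
}
```

In this sketch a container that was reported as running in one heartbeat and is simply absent from the next is treated as completed, so the map entry is released and the scheduler still hears about it.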

      1. YARN-5197.001.patch
        14 kB
        Jason Lowe
      2. YARN-5197.002.patch
        14 kB
        Jason Lowe
      3. YARN-5197.003.patch
        14 kB
        Jason Lowe
      4. YARN-5197-branch-2.8.003.patch
        14 kB
        Jason Lowe
      5. YARN-5197-branch-2.7.003.patch
        13 kB
        Jason Lowe

          Activity

          djp Junping Du added a comment -

          I think the patch goes to branch-2.8 as well. Adding 2.8 to the fix version.

          jlowe Jason Lowe added a comment -

          is this possible that container info disappear from node update?

          Yes, it is definitely possible, since we've seen it in practice. If the application is tearing down, the RM will tell the NM to clean up the application. There are scenarios where the NM can fail to report a completed container for an application that is being cleaned up, since it's removing all the app state and the containers that go with it. Since the app is cleaning up, there's no AM around to ack. And if the NM never reports a completion event for a container, then without this patch RMNodeImpl clearly leaks that container in its launchedContainers map.

          The patch also covers the corner case where the NM failed to record state for a container somehow (I/O error or other state store failure) and reconnected with partial state. In that scenario the RM will properly detect that the container is no longer being tracked by the NM and report the completion to the application (as well as preventing the leak in launchedContainers).
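The reconnect case can be sketched in the same spirit. This is a hypothetical illustration, not the patch itself: `ReconnectSketch`, `LostContainerStatus`, `reconcile`, and the diagnostics string are invented for this example, and the real RM uses its own status and event types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical completion record; the real RM uses its own status types.
class LostContainerStatus {
  final String containerId;
  final String diagnostics;

  LostContainerStatus(String containerId, String diagnostics) {
    this.containerId = containerId;
    this.diagnostics = diagnostics;
  }
}

class ReconnectSketch {
  /**
   * Compares what the RM tracked before the reconnect with what the NM
   * reports afterwards, and builds a completion record for every container
   * the NM has forgotten. The tracked set is pruned in place.
   */
  static List<LostContainerStatus> reconcile(Set<String> trackedByRm,
                                             Set<String> reportedByNm) {
    List<LostContainerStatus> completions = new ArrayList<>();
    // Iterate over a sorted copy so removal is safe and the order is stable.
    for (String id : new TreeSet<>(trackedByRm)) {
      if (!reportedByNm.contains(id)) {
        trackedByRm.remove(id);
        completions.add(new LostContainerStatus(id,
            "Container no longer reported by node"));
      }
    }
    return completions;
  }
}
```

The point of returning records rather than just pruning the set is that the application can be told about the completion, which matches the behaviour described in the comment above.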

          rohithsharma Rohith Sharma K S added a comment -

          Committed to branch-2.6 and branch-2.7. Thanks Jason Lowe for the patch!

          rohithsharma Rohith Sharma K S added a comment -

          Thanks Jason Lowe for providing the branch patches. The Hadoop QA test failures are unrelated to the patch. I will go ahead and commit the patch to the branches.

          sandflee sandflee added a comment -

          Hi Jason Lowe, is it possible for container info to disappear from a node update? The NM only removes containers when the AM acks the container-complete message. Correct me if I missed something, thanks!

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 29s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 54s branch-2.7 passed
          +1 compile 0m 34s branch-2.7 passed with JDK v1.8.0_91
          +1 compile 0m 32s branch-2.7 passed with JDK v1.7.0_101
          +1 checkstyle 0m 19s branch-2.7 passed
          +1 mvnsite 0m 43s branch-2.7 passed
          +1 mvneclipse 0m 16s branch-2.7 passed
          -1 findbugs 1m 11s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in branch-2.7 has 1 extant Findbugs warnings.
          +1 javadoc 0m 22s branch-2.7 passed with JDK v1.8.0_91
          +1 javadoc 0m 24s branch-2.7 passed with JDK v1.7.0_101
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 28s the patch passed with JDK v1.8.0_91
          +1 javac 0m 28s the patch passed
          +1 compile 0m 29s the patch passed with JDK v1.7.0_101
          +1 javac 0m 29s the patch passed
          -1 checkstyle 0m 20s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 3 new + 209 unchanged - 0 fixed = 212 total (was 209)
          +1 mvnsite 0m 38s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          -1 whitespace 0m 0s The patch has 2138 line(s) that end in whitespace. Use git apply --whitespace=fix.
          -1 whitespace 0m 42s The patch 76 line(s) with tabs.
          +1 findbugs 1m 20s the patch passed
          +1 javadoc 0m 18s the patch passed with JDK v1.8.0_91
          +1 javadoc 0m 23s the patch passed with JDK v1.7.0_101
          -1 unit 50m 38s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_91.
          -1 unit 50m 49s hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_101.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          120m 20s



          Reason Tests
          JDK v1.8.0_91 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
            hadoop.yarn.server.resourcemanager.TestClientRMTokens
          JDK v1.7.0_101 Failed junit tests hadoop.yarn.server.resourcemanager.TestAMAuthorization
            hadoop.yarn.server.resourcemanager.TestClientRMTokens



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c420dfe
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12811983/YARN-5197-branch-2.7.003.patch
          JIRA Issue YARN-5197
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 17c85863ad3c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.7 / e19cd05
          Default Java 1.7.0_101
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101
          findbugs v3.0.0
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/whitespace-eol.txt
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/whitespace-tabs.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_91.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_101.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_91.txt https://builds.apache.org/job/PreCommit-YARN-Build/12083/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_101.txt
          JDK v1.7.0_101 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/12083/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/12083/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          Thanks for the review and commit, Rohith! Here are patches for branch-2.8 and branch-2.7. I believe the 2.7 patch will work on 2.6 as well.

          rohithsharma Rohith Sharma K S added a comment -

          Committed to trunk/branch-2! Thanks Jason Lowe for your contributions.
          The patch needs to be rebased for branch-2.8, branch-2.7 and branch-2.6. Would you rebase it, please?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #9948 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9948/)
          YARN-5197. RM leaks containers if running container disappears from node (rohithsharmaks: rev e0f4620cc7db3db4b781e6042ab7dd754af28f18)

          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
          • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 21s trunk passed
          +1 compile 0m 34s trunk passed
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 39s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 1s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 31s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 18s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 150 unchanged - 1 fixed = 150 total (was 151)
          +1 mvnsite 0m 35s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          -1 whitespace 0m 0s The patch has 20 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 findbugs 1m 10s the patch passed
          +1 javadoc 0m 17s the patch passed
          -1 unit 32m 16s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          47m 27s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:2c91fd8
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12809463/YARN-5197.003.patch
          JIRA Issue YARN-5197
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 414e02c617a6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0bbb4dd
          Default Java 1.8.0_91
          findbugs v3.0.0
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/11975/artifact/patchprocess/whitespace-eol.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/11975/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/11975/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11975/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/11975/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 docker 0m 7s Docker failed to build yetus/hadoop:2c91fd8.



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12809463/YARN-5197.003.patch
          JIRA Issue YARN-5197
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/11973/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          Thanks for the review, Rohith! I updated the patch to add the GUARANTEED check in findLostContainers.

          rohithsharma Rohith Sharma K S added a comment -

          Overall patch looks good to me.
          One nit: in method findLostContainers, before adding to nodeContainers, can it be guarded with a check that the execution type is GUARANTEED?
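A guard of the kind requested here might look roughly like the following. This is a sketch under assumptions: `GuardSketch`, `Report`, and `guaranteedContainers` are hypothetical, and the local `ExecutionType` enum is only a stand-in for YARN's execution type so that the example compiles on its own. Only `nodeContainers` and the GUARANTEED value come from this thread.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class GuardSketch {
  // Local stand-in for YARN's execution types, so the sketch is self-contained.
  enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  // Hypothetical per-container entry from a node status update.
  static final class Report {
    final String containerId;
    final ExecutionType type;

    Report(String containerId, ExecutionType type) {
      this.containerId = containerId;
      this.type = type;
    }
  }

  // Collects only the GUARANTEED containers from the node's report.
  static Set<String> guaranteedContainers(List<Report> reports) {
    Set<String> nodeContainers = new HashSet<>();
    for (Report r : reports) {
      if (r.type == ExecutionType.GUARANTEED) {
        nodeContainers.add(r.containerId);
      }
    }
    return nodeContainers;
  }
}
```

The resulting set would then be the one compared against the tracked containers when looking for lost ones.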

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 26s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 5m 59s trunk passed
          +1 compile 0m 28s trunk passed
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 32s trunk passed
          +1 mvneclipse 0m 12s trunk passed
          +1 findbugs 0m 51s trunk passed
          +1 javadoc 0m 19s trunk passed
          +1 mvninstall 0m 27s the patch passed
          +1 compile 0m 26s the patch passed
          +1 javac 0m 26s the patch passed
          +1 checkstyle 0m 18s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 150 unchanged - 1 fixed = 150 total (was 151)
          +1 mvnsite 0m 33s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 9s the patch passed
          +1 javadoc 0m 19s the patch passed
          -1 unit 36m 53s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          50m 20s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:2c91fd8
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12808391/YARN-5197.002.patch
          JIRA Issue YARN-5197
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 62f993b67157 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 35f255b
          Default Java 1.8.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-YARN-Build/11852/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/11852/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11852/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/11852/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          Updated the patch for the checkstyle issue. The test failures are tracked by HADOOP-12687.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 22s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 22s trunk passed
          +1 compile 0m 28s trunk passed
          +1 checkstyle 0m 20s trunk passed
          +1 mvnsite 0m 32s trunk passed
          +1 mvneclipse 0m 11s trunk passed
          +1 findbugs 0m 52s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 28s the patch passed
          +1 compile 0m 26s the patch passed
          +1 javac 0m 26s the patch passed
          -1 checkstyle 0m 19s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 150 unchanged - 1 fixed = 151 total (was 151)
          +1 mvnsite 0m 32s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 56s the patch passed
          +1 javadoc 0m 17s the patch passed
          -1 unit 35m 26s hadoop-yarn-server-resourcemanager in the patch failed.
          +1 asflicense 0m 14s The patch does not generate ASF License warnings.
          48m 51s



          Reason Tests
          Failed junit tests hadoop.yarn.server.resourcemanager.TestClientRMTokens
            hadoop.yarn.server.resourcemanager.TestAMAuthorization



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:2c91fd8
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12807836/YARN-5197.001.patch
          JIRA Issue YARN-5197
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 27076c539294 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 97e2449
          Default Java 1.8.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/11829/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit https://builds.apache.org/job/PreCommit-YARN-Build/11829/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          unit test logs https://builds.apache.org/job/PreCommit-YARN-Build/11829/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/11829/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/11829/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          jlowe Jason Lowe added a comment -

          RMNodeImpl checks the list of running containers on the node against launchedContainers but not vice versa, so containers that disappear from the node are not detected. Here's a patch that detects when the RM thinks more containers are running on the node than the node reported, and then finds the containers that are lost. Each lost container generates a corresponding aborted completion event for the scheduler. The search for lost containers is performed only when at least one should be found, so it is low cost in the normal case.
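
          The detection described above can be sketched roughly as follows. This is a simplified illustration, not the code from the attached patch: the class LostContainerSketch, the method onHeartbeat, and the use of plain strings as container IDs are all assumptions made for the example. The point it shows is the cheap size comparison that guards the more expensive scan.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the RM-side node tracker; NOT the real RMNodeImpl.
final class LostContainerSketch {
  // Container IDs the RM currently believes are running on the node.
  private final Set<String> launchedContainers = new LinkedHashSet<>();

  /**
   * Processes one heartbeat and returns the IDs of containers that the RM
   * was tracking but that the node no longer reports. A caller would turn
   * each returned ID into an aborted completion event for the scheduler.
   */
  List<String> onHeartbeat(Set<String> reportedRunning,
                           Set<String> reportedCompleted) {
    launchedContainers.addAll(reportedRunning);
    launchedContainers.removeAll(reportedCompleted);

    List<String> lost = new ArrayList<>();
    // Cheap guard: only scan when the RM tracks more containers than the
    // node reported as running, i.e. when at least one must be missing.
    if (launchedContainers.size() > reportedRunning.size()) {
      Iterator<String> it = launchedContainers.iterator();
      while (it.hasNext()) {
        String id = it.next();
        if (!reportedRunning.contains(id)) {
          it.remove();   // stop tracking the vanished container
          lost.add(id);  // caller reports it as aborted
        }
      }
    }
    return lost;
  }
}
```

          In the normal case every tracked container is still reported, the size check fails, and no per-container scan happens.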

          I updated MockNM as part of this patch because many tests were getting away with lazy mocking of a real NM: they specified only container state deltas in a heartbeat and sent empty heartbeats between those state changes. With this patch, the RM interprets those empty heartbeats as the loss of all actively running containers, which broke those tests. The patch therefore also updates MockNM to track containers and keep reporting them until they have been marked completed, just as a real node would. That was simpler than updating every user of MockNM to maintain its own list of active container statuses explicitly.
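
          As a rough illustration of the MockNM behavior described above, here is a minimal sketch of a mock node that remembers container state between heartbeats. It is not the actual MockNM code: the class TrackingMockNodeSketch, its State enum, and the string-based report format are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mock node; NOT the real MockNM from the Hadoop test tree.
final class TrackingMockNodeSketch {
  enum State { RUNNING, COMPLETE }

  // Last known state of every container the mock still has to report.
  private final Map<String, State> tracked = new LinkedHashMap<>();

  /**
   * Applies the state deltas supplied by the test, then builds the full
   * heartbeat report: every tracked container, not just the deltas.
   * Completed containers are reported once and then forgotten.
   */
  List<String> heartbeat(Map<String, State> deltas) {
    tracked.putAll(deltas);
    List<String> report = new ArrayList<>();
    Iterator<Map.Entry<String, State>> it = tracked.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, State> entry = it.next();
      report.add(entry.getKey() + "=" + entry.getValue());
      if (entry.getValue() == State.COMPLETE) {
        it.remove(); // completion has been reported; stop tracking
      }
    }
    return report;
  }
}
```

          With this shape, a test can keep passing only deltas, and an empty delta map still yields a heartbeat that lists the containers that are running.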


            People

            • Assignee:
              jlowe Jason Lowe
              Reporter:
              jlowe Jason Lowe
            • Votes:
              0
              Watchers:
              11
