  Hadoop YARN / YARN-3987

AM container-complete message should be acked to the NM once the RM receives it

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: resourcemanager
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      In our cluster we set max-am-attempts to a very large number, and unfortunately our AM crashed right after launch, leaving too many completed containers (AM containers) in the NM. A completed container is removed from the NM and the NMStateStore only after its completion has been passed to the AM. If the AM cannot be launched, the completed AM containers are never cleaned up and may eat up NM heap memory.
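The retention behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names (CompletedContainerModel, containerCompleted, ackReceived) are hypothetical and are not NodeManager APIs, and the count of 1000 attempts is an arbitrary illustration. It shows how entries accumulate when no ack ever arrives.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: class and method names are hypothetical, not NodeManager APIs.
class CompletedContainerModel {
    // Completed containers the node still tracks while waiting for an ack.
    private final Set<String> retained = new LinkedHashSet<>();

    // Called when a container on this node finishes.
    void containerCompleted(String containerId) {
        retained.add(containerId);
    }

    // Called when the node is told these completions have been delivered.
    void ackReceived(List<String> containerIds) {
        retained.removeAll(containerIds);
    }

    int retainedCount() {
        return retained.size();
    }

    public static void main(String[] args) {
        CompletedContainerModel nm = new CompletedContainerModel();
        // Each failed attempt leaves one completed AM container behind.
        for (int attempt = 1; attempt <= 1000; attempt++) {
            nm.containerCompleted("am_container_attempt_" + attempt);
        }
        System.out.println("retained with no acks: " + nm.retainedCount());

        // A single ack releases the corresponding entry.
        nm.ackReceived(Arrays.asList("am_container_attempt_1"));
        System.out.println("retained after one ack: " + nm.retainedCount());
    }
}
```

In this model the retained set grows by one entry per failed attempt and shrinks only when an ack is delivered, which is the growth pattern the description reports.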

      1. YARN-3987.001.patch
        2 kB
        sandflee
      2. YARN-3987.002.patch
        2 kB
        sandflee

        Activity

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 16m 34s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 javac 7m 45s There were no new javac warning messages.
        +1 javadoc 9m 40s There were no new javadoc warning messages.
        +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
        -1 checkstyle 0m 46s The applied patch generated 4 new checkstyle issues (total was 123, now 127).
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 22s mvn install still works.
        +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
        +1 findbugs 1m 25s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 yarn tests 52m 20s Tests passed in hadoop-yarn-server-resourcemanager.
            90m 51s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12747573/YARN-3987.001.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / f170934
        checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/8692/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
        hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8692/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8692/testReport/
        Java 1.7.0_55
        uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/8692/console

        This message was automatically generated.

        jianhe Jian He added a comment -

        Hi, sandflee, the code below in the sendAMContainerToNM method should send the container-complete message over. Is getKeepContainersAcrossApplicationAttempts true in your case?

            // Finished containers are sent to the NM only when the application
            // does not keep containers across attempts.
            if (!appAttempt.getSubmissionContext()
                .getKeepContainersAcrossApplicationAttempts()) {
              appAttempt.sendFinishedContainersToNM();
            }
        
        sandflee sandflee added a comment -

        Yes, we set getKeepContainersAcrossApplicationAttempts to true. Thanks for your review.

        jianhe Jian He added a comment -

        "leaving too many completed container(AM container) in NM."

        At any single point in time, there should be only one AM instance in the NM. Do you mean the old AM containers are not cleaned up?

        If the AM cannot be launched, it will expire in 10 minutes, in which case the containers should also be cleaned up.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote Subsystem Runtime Comment
        0 pre-patch 16m 5s Pre-patch trunk compilation is healthy.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 tests included 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 javac 7m 43s There were no new javac warning messages.
        +1 javadoc 9m 38s There were no new javadoc warning messages.
        +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
        +1 checkstyle 0m 55s There were no new checkstyle issues.
        +1 whitespace 0m 0s The patch has no lines that end in whitespace.
        +1 install 1m 23s mvn install still works.
        +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
        +1 findbugs 1m 25s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1 yarn tests 52m 26s Tests passed in hadoop-yarn-server-resourcemanager.
            90m 35s  



        Subsystem Report/Notes
        Patch URL http://issues.apache.org/jira/secure/attachment/12747672/YARN-3987.002.patch
        Optional Tests javadoc javac unit findbugs checkstyle
        git revision trunk / 69b0957
        hadoop-yarn-server-resourcemanager test log https://builds.apache.org/job/PreCommit-YARN-Build/8699/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/8699/testReport/
        Java 1.7.0_55
        uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/8699/console

        This message was automatically generated.

        sandflee sandflee added a comment -

        Yes, the old AM containers in the NM aren't cleaned up. In our case, the AM crashes after it starts; the RM creates a new appAttempt and launches a new AM, and the attempt does not expire, which leaves the completed container in NM memory and the NM state store. We set max-am-attempt to a very large number, so the number of completed AM containers in the NM blows up.
        For a completed AM container, the RM could send the ack message to the NM itself; there seems to be no need to wait for a new AM to pull the completion message. What do you think, Jian He?
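The proposal in this comment can be sketched as a small decision function. This is a minimal illustration, not the code in RMAppAttemptImpl: the class AmContainerAckSketch, the method containersToAck, and the ackAmDirectly switch are all hypothetical, and the model assumes that non-AM finished containers keep following the existing keep-containers rule quoted earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: names are hypothetical and do not mirror the ResourceManager API.
class AmContainerAckSketch {

    // Returns the container IDs the RM would ack to the NM when an
    // attempt's AM container finishes.
    static List<String> containersToAck(String amContainerId,
                                        List<String> otherFinished,
                                        boolean keepContainers,
                                        boolean ackAmDirectly) {
        List<String> toAck = new ArrayList<>();
        // Proposed: ack the AM container without waiting for a new AM.
        // Existing: it is acked only when containers are not kept.
        if (ackAmDirectly || !keepContainers) {
            toAck.add(amContainerId);
        }
        // Assumption: other finished containers still follow the existing rule.
        if (!keepContainers) {
            toAck.addAll(otherFinished);
        }
        return toAck;
    }

    public static void main(String[] args) {
        List<String> others = Arrays.asList("container_2", "container_3");
        // keepContainers = true is the configuration reported in this issue.
        System.out.println("existing: "
            + containersToAck("am_container_1", others, true, false));
        System.out.println("proposed: "
            + containersToAck("am_container_1", others, true, true));
    }
}
```

With keepContainers set to true, the existing rule acks nothing in this model, while the proposed rule acks the AM container alone, so the NM can release it even if no later attempt ever registers.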

        sandflee sandflee added a comment -

        The AM crashes before it registers with the RM.

        jianhe Jian He added a comment -

        Committed to trunk and branch-2. Thanks, sandflee!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8296 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8296/)
        YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
        • hadoop-yarn-project/CHANGES.txt
        sandflee sandflee added a comment -

        Thanks Jian He!

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #287 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/287/)
        YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #1017 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1017/)
        YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #284 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/284/)
        YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2233 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2233/)
        YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2214 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2214/)
        YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #276 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/276/)
        YARN-3987. Send AM container completed msg to NM once AM finishes. Contributed by sandflee (jianhe: rev 0a030546e24c55662a603bb63c9029ad0ccf43fc)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
        • hadoop-yarn-project/CHANGES.txt

          People

          • Assignee:
            sandflee sandflee
            Reporter:
            sandflee sandflee
          • Votes:
            1
            Watchers:
            10
