  1. Hadoop YARN
  2. YARN-3905

Application History Server UI NPEs when accessing apps run after RM restart

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0, 2.8.0, 2.7.1
    • Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
    • Component/s: timelineserver
    • Labels:
      None
    • Target Version/s:

      Description

      From the Application History URL (http://RmHostName:8188/applicationhistory), clicking the application ID of an app that ran after the RM daemon was restarted results in a 500 error:

      Sorry, got error 500
      Please consult RFC 2616 for meanings of the error code.
      

      The stack trace is as follows:

      2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading history information of all application attempts of application application_1436472584878_0001
      2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: Failed to read the AM container of the application attempt appattempt_1436472584878_0001_000001.
      java.lang.NullPointerException
              at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206)
              at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199)
              at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205)
              at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272)
              at org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267)
              at java.security.AccessController.doPrivileged(Native Method)
              at javax.security.auth.Subject.doAs(Subject.java:415)
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
              at org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266)
      ...
      
      1. YARN-3905.001.patch
        1 kB
        Eric Payne
      2. YARN-3905.002.patch
        2 kB
        Eric Payne

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2206 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2206/)
        YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #257 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/257/)
        YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #249 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/249/)
        YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2187 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2187/)
        YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #990 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/990/)
        YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
        • hadoop-yarn-project/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #260 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/260/)
        YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a)

        • hadoop-yarn-project/CHANGES.txt
        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #8180 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8180/)
        YARN-3905. Application History Server UI NPEs when accessing apps run after RM restart (Eric Payne via jeagles) (jeagles: rev 7faae0e6fe027a3886d9f4e290b6a488a2c55b3a)

        • hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
        • hadoop-yarn-project/CHANGES.txt
        jeagles Jonathan Eagles added a comment -

        +1. Committing this patch, Eric Payne.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote | Subsystem       | Runtime | Comment
        -1   | pre-patch       | 17m 16s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings.
        +1   | @author         | 0m 0s   | The patch does not contain any @author tags.
        -1   | tests included  | 0m 0s   | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1   | javac           | 8m 31s  | There were no new javac warning messages.
        +1   | javadoc         | 10m 23s | There were no new javadoc warning messages.
        +1   | release audit   | 0m 21s  | The applied patch does not increase the total number of release audit warnings.
        +1   | checkstyle      | 0m 35s  | There were no new checkstyle issues.
        +1   | whitespace      | 0m 0s   | The patch has no lines that end in whitespace.
        +1   | install         | 1m 28s  | mvn install still works.
        +1   | eclipse:eclipse | 0m 34s  | The patch built with eclipse:eclipse.
        +1   | findbugs        | 1m 7s   | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1   | yarn tests      | 0m 25s  | Tests passed in hadoop-yarn-server-common.
             |                 | 40m 44s |



        Subsystem | Report/Notes
        Patch URL | http://issues.apache.org/jira/secure/attachment/12745819/YARN-3905.002.patch
        Optional Tests | javadoc javac unit findbugs checkstyle
        git revision | trunk / 9b272cc
        Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8572/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
        hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8572/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
        Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8572/testReport/
        Java | 1.7.0_55
        uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8572/console

        This message was automatically generated.

        eepayne Eric Payne added a comment -

        Fixing the checkstyle issue. I forgot to remove the now-unused ContainerId import.

        hadoopqa Hadoop QA added a comment -



        -1 overall



        Vote | Subsystem       | Runtime | Comment
        -1   | pre-patch       | 17m 14s | Pre-patch trunk has 6 extant Findbugs (version 3.0.0) warnings.
        +1   | @author         | 0m 0s   | The patch does not contain any @author tags.
        -1   | tests included  | 0m 0s   | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1   | javac           | 8m 29s  | There were no new javac warning messages.
        +1   | javadoc         | 10m 23s | There were no new javadoc warning messages.
        +1   | release audit   | 0m 21s  | The applied patch does not increase the total number of release audit warnings.
        -1   | checkstyle      | 0m 37s  | The applied patch generated 1 new checkstyle issues (total was 39, now 40).
        +1   | whitespace      | 0m 0s   | The patch has no lines that end in whitespace.
        +1   | install         | 1m 23s  | mvn install still works.
        +1   | eclipse:eclipse | 0m 35s  | The patch built with eclipse:eclipse.
        +1   | findbugs        | 1m 9s   | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
        +1   | yarn tests      | 0m 25s  | Tests passed in hadoop-yarn-server-common.
             |                 | 40m 39s |



        Subsystem | Report/Notes
        Patch URL | http://issues.apache.org/jira/secure/attachment/12745708/YARN-3905.001.patch
        Optional Tests | javadoc javac unit findbugs checkstyle
        git revision | trunk / 0bda84f
        Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8562/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
        checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8562/artifact/patchprocess/diffcheckstylehadoop-yarn-server-common.txt
        hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8562/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
        Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8562/testReport/
        Java | 1.7.0_55
        uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8562/console

        This message was automatically generated.

        jeagles Jonathan Eagles added a comment -

        +1. Eric Payne, retargeting for 2.7.2 since 2.7.1 is already released.

        eepayne Eric Payne added a comment -

        Unit testing is a little challenging, so I have not added any unit tests. However, I have tested the patch successfully on a one-node cluster installation and on a 10-node secured cluster.

        Jonathan Eagles, would you like to take a look?

        eepayne Eric Payne added a comment -

        org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable constructs what it believes should be the AM container ID when creating a new GetContainerReportRequest.

                // AM container is always the first container of the attempt
                final GetContainerReportRequest request =
                    GetContainerReportRequest.newInstance(ContainerId.newContainerId(
                      appAttemptReport.getApplicationAttemptId(), 1));
        
        • After the RM is restarted, container IDs contain an e## string, which the above code doesn't take into account.
        • The AM container is not always _000001, because of the way reservations work. We have seen "non-first" AM containers in practice.
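
        To make the first bullet concrete, here is a minimal sketch of how a container ID string might be rendered. It assumes that the numeric container ID keeps a sequence number in its low 40 bits and the RM epoch in the bits above, and that a nonzero epoch is printed as an e## segment. The class and method names are hypothetical, and this is not the Hadoop implementation.

```java
import java.util.Locale;

// Hypothetical sketch: models the container ID string layout only.
final class ContainerIdFormatSketch {
    // Assumption: the low 40 bits hold the per-attempt sequence number.
    private static final long SEQUENCE_MASK = 0xffffffffffL;

    static String format(long clusterTimestamp, int appId, int attemptId,
                         long containerId) {
        long epoch = containerId >> 40;            // assumed epoch bits
        long sequence = containerId & SEQUENCE_MASK;
        StringBuilder sb = new StringBuilder("container_");
        if (epoch > 0) {
            // The e## segment only appears once the epoch is nonzero.
            sb.append(String.format(Locale.ROOT, "e%02d_", epoch));
        }
        sb.append(String.format(Locale.ROOT, "%d_%04d_%02d_%06d",
                clusterTimestamp, appId, attemptId, sequence));
        return sb.toString();
    }

    public static void main(String[] args) {
        // ID built from a bare sequence number of 1 (what the old code assumed).
        System.out.println(format(1436472584878L, 1, 1, 1L));
        // Same sequence number, but with epoch 1 folded into the numeric ID.
        System.out.println(format(1436472584878L, 1, 1, (1L << 40) | 1L));
    }
}
```

        Under these assumptions the two strings differ by the e## segment, so an ID built from the literal value 1 cannot equal an ID that was issued with a nonzero epoch.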

        As a result, the container ID in the GetContainerReportRequest may not match the actual AM container ID for jobs run before an RM restart, and will not match it for jobs run after the RM is restarted.

        So, when ApplicationHistoryManagerImpl compares the ID of the passed container with its cache from the history store, it can't find a match and throws the NPE.

        In AppBlock#generateApplicationTable, rather than constructing the AM's container ID, I suggest using appAttemptReport#getAMContainerId:

                final GetContainerReportRequest request =
                    GetContainerReportRequest.newInstance(
                            appAttemptReport.getAMContainerId());
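
        The effect of looking up a guessed ID versus the recorded AM container ID can be sketched with a toy history store. Everything below (the store, the record type, the host name, the ID strings) is hypothetical and only models the lookup; it does not use the Hadoop classes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a map-backed stand-in for the history store.
final class AmContainerLookupSketch {
    // Hypothetical stand-in for a stored container record.
    static final class ContainerRecord {
        final String nodeHttpAddress;
        ContainerRecord(String nodeHttpAddress) {
            this.nodeHttpAddress = nodeHttpAddress;
        }
    }

    private final Map<String, ContainerRecord> store = new HashMap<>();

    void record(String containerId, ContainerRecord r) {
        store.put(containerId, r);
    }

    // Models the failing path: the lookup result is used without a null check.
    String nodeAddressOf(String containerId) {
        ContainerRecord r = store.get(containerId); // null when the ID is unknown
        return r.nodeHttpAddress;                   // throws NPE on a miss
    }

    // Old approach: assume the AM container is container 1, with no epoch.
    static String guessAmContainerId(String attemptId) {
        String[] p = attemptId.split("_");          // appattempt, ts, app, attempt
        return String.format(Locale.ROOT, "container_%s_%s_%02d_000001",
                p[1], p[2], Integer.parseInt(p[3]));
    }

    public static void main(String[] args) {
        AmContainerLookupSketch history = new AmContainerLookupSketch();
        // Assumed ID recorded for the AM container after an RM restart.
        String reported = "container_e01_1436472584878_0001_01_000001";
        history.record(reported, new ContainerRecord("nm-host:8042"));

        System.out.println(history.nodeAddressOf(reported));
        String guessed = guessAmContainerId("appattempt_1436472584878_0001_000001");
        try {
            history.nodeAddressOf(guessed);
        } catch (NullPointerException e) {
            System.out.println("lookup of guessed ID " + guessed + " failed");
        }
    }
}
```

        In this model the recorded ID always resolves, while the guessed ID misses whenever the stored ID carries an epoch or a sequence number other than 1.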
        

          People

          • Assignee:
            eepayne Eric Payne
            Reporter:
            eepayne Eric Payne
          • Votes:
            0
            Watchers:
            9
