Hadoop YARN / YARN-7065

[RM UI] App status not getting updated in "All application" page

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      Scenario:
      1) Run a long-running Spark application.
      2) Trigger RM and NN failovers at random.
      3) Validate the application state in YARN.
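
Step 3 can be scripted by parsing the report that `yarn application -status` prints. The following is a minimal sketch, assuming the report keeps the tab-indented "Key : Value" layout shown in the output quoted in this issue; the helper names (`parse_application_report`, `is_terminal`, `cli_report`) are hypothetical and not part of YARN.

```python
import subprocess

# Terminal application states in YARN; anything else means the app is still in flight.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def parse_application_report(report: str) -> dict:
    """Collect 'Key : Value' pairs from `yarn application -status` output.

    Assumption: fields are separated by ' : ' as in the quoted report.
    Only the first occurrence of a key is kept, so later diagnostic lines
    that happen to contain ' : ' cannot overwrite a real field.
    """
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.strip().partition(" : ")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def is_terminal(state: str) -> bool:
    """True when the application can no longer be running."""
    return state in TERMINAL_STATES


def cli_report(app_id: str) -> str:
    """Run the YARN CLI (must be on PATH) and return the text it prints to stdout."""
    result = subprocess.run(
        ["yarn", "application", "-status", app_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a cluster node this could be used as `parse_application_report(cli_report("application_1503203977699_0014"))["State"]`, which gives the CLI's view of the state to compare against what the RM UI displays.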

      The Spark application has finished, and the YARN CLI returns the correct status for it.

      [hrt_qa@xxx hadoopqe]$ yarn application -status application_1503203977699_0014
      17/08/21 16:56:10 INFO client.AHSProxy: Connecting to Application History server at host1 xxx.xx.xx.x:10200
      17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
      17/08/21 16:56:10 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm1]
      Application Report : 
      	Application-Id : application_1503203977699_0014
      	Application-Name : org.apache.spark.sql.execution.datasources.hbase.examples.LRJobForDataSources
      	Application-Type : SPARK
      	User : hrt_qa
      	Queue : default
      	Application Priority : null
      	Start-Time : 1503215983532
      	Finish-Time : 1503250203806
      	Progress : 0%
      	State : FAILED
      	Final-State : FAILED
      	Tracking-URL : https://host1:8090/cluster/app/application_1503203977699_0014
      	RPC Port : -1
      	AM Host : N/A
      	Aggregate Resource Allocation : 174722793 MB-seconds, 170603 vcore-seconds
      	Log Aggregation Status : SUCCEEDED
      	Diagnostics : Application application_1503203977699_0014 failed 20 times due to AM Container for appattempt_1503203977699_0014_000020 exited with  exitCode: 1
      For more detailed output, check the application tracking page: https://host1:8090/cluster/app/application_1503203977699_0014 Then click on links to logs of each attempt.
      Diagnostics: Exception from container-launch.
      Container id: container_e04_1503203977699_0014_20_000001
      Exit code: 1
      Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
      	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
      	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      
      Shell output: main : command provided 1
      main : run as user is hrt_qa
      main : requested yarn user is hrt_qa
      Getting exit code file...
      Creating script paths...
      Writing pid file...
      Writing to tmp file /grid/0/hadoop/yarn/local/nmPrivate/application_1503203977699_0014/container_e04_1503203977699_0014_20_000001/container_e04_1503203977699_0014_20_000001.pid.tmp
      Writing to cgroup task files...
      Creating local dirs...
      Launching container...
      Getting exit code file...
      Creating script paths...
      
      
      Container exited with a non-zero exit code 1
      Failing this attempt. Failing the application.
      	Unmanaged Application : false
      	Application Node Label Expression : <Not set>
      	AM container Node Label Expression : <DEFAULT_PARTITION>

      However, the RM UI "All Applications" page still shows the application in the RUNNING state:
      https://host1:8090/cluster
      Clicking the application ID (https://host1:8090/cluster/app/application_1503203977699_0014) opens the application page, which shows the correct application state, FAILED.

      The application status is not being updated on the YARN "All Applications" page.
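
To check whether the same list-versus-detail mismatch is visible outside the browser, the two views can be compared through the ResourceManager REST API (`/ws/v1/cluster/apps` and `/ws/v1/cluster/apps/{appid}`). This is only a sketch under assumptions: whether the REST list shares the stale data of the web UI table is not established by this report, and the helper names and the `rm_base_url` parameter are hypothetical. Authentication and TLS trust settings for a secured cluster are not handled.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (no auth or custom TLS handling)."""
    with urlopen(url) as resp:
        return json.load(resp)


def list_state(apps_payload: dict, app_id: str):
    """State of app_id as reported by the cluster-wide apps listing, or None."""
    # The listing has the shape {"apps": {"app": [...]}}; "apps" is null when empty.
    apps = (apps_payload.get("apps") or {}).get("app") or []
    for app in apps:
        if app.get("id") == app_id:
            return app.get("state")
    return None


def detail_state(app_payload: dict):
    """State as reported by the single-application resource, or None."""
    return (app_payload.get("app") or {}).get("state")


def compare_views(rm_base_url: str, app_id: str):
    """Return (listed, detailed) states; differing values reproduce the mismatch."""
    listed = list_state(fetch_json(f"{rm_base_url}/ws/v1/cluster/apps"), app_id)
    detailed = detail_state(fetch_json(f"{rm_base_url}/ws/v1/cluster/apps/{app_id}"))
    return listed, detailed
```

If `compare_views("https://host1:8090", "application_1503203977699_0014")` returned two different states, the stale value would not be limited to the UI table; matching states would point at the page rendering instead.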

            People

              Assignee: Unassigned
              Reporter: Yesha Vora (yeshavora)
              Votes: 0
              Watchers: 4
