Hadoop YARN / YARN-9237

NM should ignore sending finished apps to RM during RM fail-over

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.4, 3.3.0, 3.2.1, 3.1.3
    • Component/s: yarn
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      After an RM failover, I found many log entries like the following in the active RM's log file:

      2019-01-24 15:43:58,999 WARN org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Cannot get RMApp by appId=application_1542178952162_34746156, just added it to finishedApplications list for cleanup
      .....
      

      I looked back through the RM logs and found that this app had finished hours earlier:

      2019-01-23 21:49:55,683 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542178952162_34746156_000001 State change from FINAL_SAVING to FINISHING
      

      The RM prints "Cannot get RMApp by appId" for the following reason:
      1. The RM fails over.
      2. Each NM reports all of its running apps to the RM in its register request.
      3. The list of running apps is taken from the NMContext, and some of those apps may already have finished.
      4. In my cluster, yarn.log-aggregation-enable=false and yarn.nodemanager.log.retain-seconds=86400 (1 day), so an app stays in the NMContext for 24 hours after it has finished.
      5. My YARN cluster runs 50k apps per day on 7k nodes, so the NMs report many finished apps to the RM.
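
The behaviour requested in the title could look roughly like the sketch below: when the NM builds the app list for its register request, it skips apps that have already finished. This is only an illustration under stated assumptions, not the attached patch. The class `RegisterFilterSketch`, the two-value `AppState` enum, and the method `appsToReport` are hypothetical stand-ins for the real NM classes, and the second app ID in `main` is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the NM-side logic; not the real Hadoop classes.
class RegisterFilterSketch {

    // Simplified app states (assumption): the real NM tracks more states than this.
    enum AppState { RUNNING, FINISHED }

    // Returns the IDs of apps that should go into the register request.
    // Apps that are still tracked only because their logs are being
    // retained (state FINISHED) are left out.
    static List<String> appsToReport(Map<String, AppState> trackedApps) {
        List<String> running = new ArrayList<>();
        for (Map.Entry<String, AppState> entry : trackedApps.entrySet()) {
            if (entry.getValue() == AppState.RUNNING) {
                running.add(entry.getKey());
            }
        }
        return running;
    }

    public static void main(String[] args) {
        Map<String, AppState> tracked = new LinkedHashMap<>();
        // App from the log above: finished, but still tracked by the NM.
        tracked.put("application_1542178952162_34746156", AppState.FINISHED);
        // Made-up ID for an app that is still running.
        tracked.put("application_1542178952162_40000001", AppState.RUNNING);

        // Only the running app is reported to the RM.
        System.out.println(appsToReport(tracked));
    }
}
```

With a filter like this, the RM would no longer receive IDs of apps it has already removed, so the "Cannot get RMApp by appId" warning should not be triggered by register requests after a failover.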

      Attachments

        1. YARN-9237.002.patch
          1 kB
          Jiandan Yang
        2. YARN-9237.001.patch
          1 kB
          Jiandan Yang

            People

              Assignee: Jiandan Yang (yangjiandan)
              Reporter: Jiandan Yang (yangjiandan)