Hadoop YARN / YARN-9237

NM should ignore sending finished apps to RM during RM fail-over

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.4, 3.3.0, 3.2.1, 3.1.3
    • Component/s: yarn
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      After an RM failover, I found many occurrences of the following warning in the active RM's log file:

      2019-01-24 15:43:58,999 WARN org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Cannot get RMApp by appId=application_1542178952162_34746156, just added it to finishedApplications list for cleanup
      .....
      

      I looked back through the RM logs and found that this app had finished hours earlier:

      2019-01-23 21:49:55,683 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1542178952162_34746156_000001 State change from FINAL_SAVING to FINISHING
      

      The RM prints "Cannot get RMApp by appId" for the following reason:
      1. The RM fails over.
      2. Each NM reports all of its running apps to the RM in its register request.
      3. The list of running apps is taken from NMContext, and some of those apps may already have finished.
      4. In my cluster, yarn.log-aggregation-enable=false and yarn.nodemanager.log.retain-seconds=86400 (1 day), so an app stays in NMContext for 24 hours after it has finished.
      5. My YARN cluster runs 50k apps per day on 7k nodes, so the NMs report many finished apps to the RM.
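
The behaviour requested in the title can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use the real Hadoop classes: `RegisterRequestSketch`, `AppState`, and `appsToReport` are hypothetical names, and a plain map stands in for the applications held in NMContext. The idea is only that finished apps are filtered out before the NM builds the list it sends in the register request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RegisterRequestSketch {
    // Hypothetical stand-in for the NM's per-application state.
    enum AppState { RUNNING, FINISHED }

    // Returns the IDs of apps that should be reported to the RM on
    // registration: everything in the NM's context that is not FINISHED.
    static List<String> appsToReport(Map<String, AppState> nmContextApps) {
        List<String> running = new ArrayList<>();
        for (Map.Entry<String, AppState> entry : nmContextApps.entrySet()) {
            if (entry.getValue() != AppState.FINISHED) {
                running.add(entry.getKey());
            }
        }
        return running;
    }

    public static void main(String[] args) {
        // Insertion-ordered map so the printed list is deterministic.
        Map<String, AppState> apps = new LinkedHashMap<>();
        // Finished app still retained on the node (ID taken from the log above).
        apps.put("application_1542178952162_34746156", AppState.FINISHED);
        // Made-up ID for an app that is still running.
        apps.put("application_example_still_running", AppState.RUNNING);
        System.out.println(appsToReport(apps));
    }
}
```

With a filter of this kind, an app that finished hours before the failover but is still retained on the node would not appear in the register request, so the RM would have no unknown app ID to warn about.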

        Attachments

        1. YARN-9237.001.patch
          1 kB
          Jiandan Yang
        2. YARN-9237.002.patch
          1 kB
          Jiandan Yang

            People

            • Assignee: Jiandan Yang (yangjiandan)
            • Reporter: Jiandan Yang (yangjiandan)