YARN-3094: Reset timer for liveness monitors after RM recovery

Details

    • Reviewed

    Description

      When the RM restarts, it recovers RMAppAttempts and registers them with the AMLivenessMonitor if they are not in a final state. Each attempt's liveness timer starts at registration, so if the recovery process takes a long time for some reason (e.g. too many apps), the AMs will time out in the RM.

      In our system, we found that the recovery process took about 3 minutes, and all AMs timed out.
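
      The problem and the fix named in the title can be illustrated with a minimal sketch. This is not the actual YARN code or the attached patch: the class SimpleLivenessMonitor, its method names, and the 60-second expiry are hypothetical and chosen only for illustration. The idea is that each monitored attempt has a "last seen" timestamp set at registration, and resetting every timestamp once recovery finishes keeps the time spent in recovery from counting against the AMs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical, simplified liveness monitor; not the real YARN class. */
class SimpleLivenessMonitor {
    // Attempt id -> time (ms) of registration or of the last reset.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final long expiryMs;
    private final LongSupplier clock; // injectable so the sketch can be tested

    SimpleLivenessMonitor(long expiryMs, LongSupplier clock) {
        this.expiryMs = expiryMs;
        this.clock = clock;
    }

    /** Start monitoring an attempt; its timer starts now. */
    void register(String attemptId) {
        lastSeen.put(attemptId, clock.getAsLong());
    }

    /** Restart every timer, e.g. once recovery has completed. */
    void resetTimers() {
        long now = clock.getAsLong();
        lastSeen.replaceAll((id, previous) -> now);
    }

    /** Return the attempts whose timer has exceeded the expiry interval. */
    List<String> expired() {
        long now = clock.getAsLong();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (now - entry.getValue() > expiryMs) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

      With an assumed 60-second expiry, an attempt registered at the start of a 3-minute recovery is already reported as expired when recovery ends, unless resetTimers() is called first.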

      Attachments

        1. YARN-3094.2.patch
          6 kB
          Jun Gong
        2. YARN-3094.3.patch
          8 kB
          Jun Gong
        3. YARN-3094.4.patch
          8 kB
          Jun Gong
        4. YARN-3094.5.patch
          8 kB
          Jun Gong
        5. YARN-3094.patch
          3 kB
          Jun Gong

          People

            hex108 Jun Gong
            hex108 Jun Gong
            Votes: 0
            Watchers: 11
