Hadoop YARN / YARN-3999

RM hangs on draining events

Details

    Description

      If external systems such as ATS or ZK become very slow, draining all the events takes a long time. If draining takes longer than 10 minutes, all applications will expire. Proposed fixes:
      1. Add a timeout and stop the dispatcher even if not all events have been drained.
      2. Move ATS service out from RM active service so that RM doesn't need to wait for ATS to flush the events when transitioning to standby.
      3. Stop client-facing services (ClientRMService etc.) first so that clients get fast notification that RM is stopping/transitioning.
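
A minimal sketch of fix 1, bounding the drain with a timeout. This is not the code from the attached patches and not Hadoop's AsyncDispatcher: the class `DrainingDispatcher`, its methods, and the `drainTimeoutMs` parameter are hypothetical names chosen for illustration. It only shows the idea that `stop` waits for pending events up to a deadline and then shuts the dispatcher down regardless.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-threaded event dispatcher (illustration only). */
class DrainingDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Events that were dispatched but have not finished running yet.
    private final AtomicInteger pending = new AtomicInteger();
    private volatile boolean stopped = false;
    private final Thread worker;

    DrainingDispatcher() {
        worker = new Thread(() -> {
            while (!stopped) {
                try {
                    Runnable event = queue.poll(10, TimeUnit.MILLISECONDS);
                    if (event == null) {
                        continue;
                    }
                    try {
                        event.run();
                    } finally {
                        pending.decrementAndGet();
                    }
                } catch (InterruptedException e) {
                    return; // stop() gave up waiting for the drain
                }
            }
        }, "dispatcher");
        worker.setDaemon(true);
        worker.start();
    }

    void dispatch(Runnable event) {
        pending.incrementAndGet();
        queue.add(event);
    }

    /**
     * Waits up to drainTimeoutMs for pending events to finish, then stops
     * the dispatcher whether or not the drain completed.
     *
     * @return true if every event was handled before the deadline
     */
    boolean stop(long drainTimeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(drainTimeoutMs);
        try {
            while (pending.get() > 0 && System.nanoTime() - deadline < 0) {
                Thread.sleep(5);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean drained = pending.get() == 0;
        stopped = true;
        worker.interrupt(); // unblock a handler stuck on a slow external system
        return drained;
    }
}
```

With this shape, a handler blocked on a slow store delays `stop` by at most the timeout, and the boolean result lets the caller log that some events were dropped.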

      Attachments

        1. YARN-3999-branch-2.6.1.txt
          26 kB
          Vinod Kumar Vavilapalli
        2. YARN-3999-branch-2.7.patch
          26 kB
          Jian He
        3. YARN-3999.5.patch
          29 kB
          Jian He
        4. YARN-3999.4.patch
          28 kB
          Jian He
        5. YARN-3999.3.patch
          26 kB
          Jian He
        6. YARN-3999.2.patch
          25 kB
          Jian He
        7. YARN-3999.2.patch
          23 kB
          Jian He
        8. YARN-3999.1.patch
          23 kB
          Jian He
        9. YARN-3999.patch
          7 kB
          Jian He
        10. YARN-3999.patch
          7 kB
          Jian He


            People

              jianhe Jian He
              jianhe Jian He
              Votes: 0
              Watchers: 13
