  1. Hadoop YARN
  2. YARN-3091 [Umbrella] Improve and fix locks of RM scheduler
  3. YARN-3136

getTransferredContainers can be a bottleneck during AM registration

Details

    • Reviewed

    Description

      While examining RM stack traces on a busy cluster, I noticed a pattern of AMs stuck waiting for the scheduler lock while trying to call getTransferredContainers. The scheduler lock is highly contended, especially on a large cluster with many nodes heartbeating, so it would be good to eliminate the need to grab this lock during this call. We have already done similar work for AM allocate calls to make sure they don't needlessly grab the scheduler lock, and the same approach should be applied here if possible.
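
      To make the idea concrete, here is a minimal sketch of one way a read path like this could avoid the coarse lock: keep the per-application state in a concurrent map so lookups do not need to be synchronized. This is an illustration only, not the attached patch. SchedulerSketch, liveContainersByApp, addApplication, and addContainer are hypothetical names, and plain strings stand in for the real YARN application and container types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for a scheduler; none of this is the real YARN code. */
class SchedulerSketch {
    // Assumption: application id -> containers kept alive from a previous attempt.
    // A concurrent map lets readers look up an application without any lock.
    private final ConcurrentMap<String, List<String>> liveContainersByApp =
        new ConcurrentHashMap<>();

    // Write paths still take the coarse "scheduler lock" (the object monitor),
    // as they would if they also touched other scheduler state.
    synchronized void addApplication(String appId) {
        liveContainersByApp.putIfAbsent(appId, new CopyOnWriteArrayList<>());
    }

    synchronized void addContainer(String appId, String containerId) {
        List<String> containers = liveContainersByApp.get(appId);
        if (containers != null) {
            containers.add(containerId);
        }
    }

    // Read path: deliberately NOT synchronized, so an AM registering here
    // never queues behind node heartbeats holding the scheduler lock.
    List<String> getTransferredContainers(String appId) {
        List<String> containers = liveContainersByApp.get(appId);
        if (containers == null) {
            return Collections.emptyList();
        }
        // Return a snapshot copy so callers never see later modifications.
        return new ArrayList<>(containers);
    }
}
```

      The trade-off in this sketch is that a reader may observe a slightly stale container list, which seems acceptable for a one-time lookup at registration but would need to be checked against the real scheduler's invariants.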

      Attachments

        1. 00010-YARN-3136.patch
          7 kB
          Sunil G
        2. 00011-YARN-3136.patch
          7 kB
          Sunil G
        3. 00012-YARN-3136.patch
          7 kB
          Sunil G
        4. 00013-YARN-3136.patch
          7 kB
          Jian He
        5. 0001-YARN-3136.patch
          5 kB
          Sunil G
        6. 0002-YARN-3136.patch
          6 kB
          Sunil G
        7. 0003-YARN-3136.patch
          7 kB
          Sunil G
        8. 0004-YARN-3136.patch
          7 kB
          Sunil G
        9. 0005-YARN-3136.patch
          12 kB
          Sunil G
        10. 0006-YARN-3136.patch
          7 kB
          Sunil G
        11. 0007-YARN-3136.patch
          8 kB
          Sunil G
        12. 0008-YARN-3136.patch
          9 kB
          Sunil G
        13. 0009-YARN-3136.patch
          7 kB
          Sunil G
        14. YARN-3136.branch-2.7.patch
          8 kB
          Wangda Tan

        Activity

          People

            sunilg Sunil G
            jlowe Jason Darrell Lowe
            Votes:
            0
            Watchers:
            15
