  1. Hadoop YARN
  2. YARN-556 [Umbrella] RM Restart phase 2 - Work preserving restart
  3. YARN-2456

Possible livelock in CapacityScheduler when RM is recovering apps

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0
    • Component/s: resourcemanager
    • Labels:
      None

      Description

      Consider this scenario:
      1. RM is configured with a single queue, and only one application can be active at a time.
      2. Submit App1, which uses up the queue's whole capacity.
      3. Submit App2, which remains pending.
      4. Restart RM.
      5. App2 is recovered before App1, so App2 is added to the activeApplications list. App1 now remains pending because of the max-active-app limit.
      6. When the NM registers, all containers of App1 are recovered and use up the whole queue capacity again.
      7. Since the queue is full, App2 cannot allocate its AM container.
      8. Meanwhile, App1 cannot become active because of the max-active-app limit, so neither application can make progress.
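
      The scenario above can be sketched as a small simulation. This is a toy model only, not CapacityScheduler code: the class ToyQueue, its methods, the capacity of 10 units, and the helper simulate_recovery are all hypothetical names and values chosen for illustration. It assumes that recovered containers are charged to the queue even when their application is still pending, as steps 5 and 6 describe.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Toy stand-in for a scheduler queue (hypothetical, not the real scheduler)."""
    capacity: int          # total resource units in the queue (assumed value)
    max_active_apps: int   # the max-active-app limit from the scenario
    used: int = 0
    active: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def recover_app(self, app):
        # Apps are activated in recovery order until the limit is reached.
        if len(self.active) < self.max_active_apps:
            self.active.append(app)
        else:
            self.pending.append(app)

    def recover_containers(self, units):
        # Containers reported by a re-registering NM are charged to the
        # queue regardless of whether their app is active or pending.
        self.used += units

    def can_allocate_am(self, app, units=1):
        # An AM container needs an active app and free queue capacity.
        return app in self.active and self.used + units <= self.capacity

    def can_activate(self, app):
        # A pending app can only be activated while under the limit.
        return app in self.pending and len(self.active) < self.max_active_apps


def simulate_recovery(order):
    """Recover apps in the given order, then recover App1's containers."""
    queue = ToyQueue(capacity=10, max_active_apps=1)
    for app in order:
        queue.recover_app(app)
    queue.recover_containers(10)  # App1 held the whole queue before restart
    stuck = (
        "App1" not in queue.active
        and not queue.can_activate("App1")
        and not queue.can_allocate_am("App2")
    )
    return {"active": list(queue.active), "pending": list(queue.pending), "stuck": stuck}


if __name__ == "__main__":
    print(simulate_recovery(["App2", "App1"]))  # the order from step 5
    print(simulate_recovery(["App1", "App2"]))  # the original submission order
```

      In this model, only the recovery order from step 5 leaves both applications blocked. Recovering App1 first keeps it active with its containers, so it can run to completion and free the queue for App2.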

        Attachments

        1. YARN-2456.2.patch
          4 kB
          Jian He
        2. YARN-2456.1.patch
          5 kB
          Jian He

            People

            • Assignee:
              jianhe Jian He
              Reporter:
              jianhe Jian He