  Hadoop YARN / YARN-5139

[Umbrella] Move YARN scheduler towards global scheduler


    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The existing YARN scheduler is driven by node heartbeats. This can lead to sub-optimal decisions because the scheduler can only look at one node at a time when scheduling resources.

      Pseudocode of the existing scheduling logic looks like this:

      for node in allNodes:
         Go to parentQueue
            Go to leafQueue
               for application in leafQueue.applications:
                  for resource-request in application.resource-requests:
                     try to schedule on node
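A minimal runnable sketch of the heartbeat-driven loop above, assuming a much-simplified model: the `Node`, `Request`, and `LeafQueue` classes, the memory-only resource, and the "first request that fits" check are all hypothetical stand-ins, not YARN's actual classes. The point it illustrates is that each pass only ever sees the one node that heartbeated.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Hypothetical node model: only free memory (MB) is tracked.
    name: str
    free_mb: int


@dataclass
class Request:
    # Hypothetical resource request: one container of `mb` megabytes for `app`.
    app: str
    mb: int


@dataclass
class LeafQueue:
    # Each application is represented by its list of pending requests.
    applications: List[List[Request]] = field(default_factory=list)


def on_node_heartbeat(node: Node, leaf_queues: List[LeafQueue]) -> List[Tuple[str, str]]:
    """Allocate whatever fits on the single node that just heartbeated.

    Walks queue -> application -> resource request as in the pseudocode,
    but the only candidate node is `node`.
    """
    placed = []
    for leaf in leaf_queues:
        for requests in leaf.applications:
            for req in list(requests):        # iterate a copy; we remove while looping
                if req.mb <= node.free_mb:    # "try to schedule on node"
                    node.free_mb -= req.mb
                    requests.remove(req)
                    placed.append((req.app, node.name))
    return placed


def heartbeat_round(all_nodes: List[Node], leaf_queues: List[LeafQueue]) -> List[Tuple[str, str]]:
    placements = []
    for node in all_nodes:                    # nodes are considered one at a time
        placements.extend(on_node_heartbeat(node, leaf_queues))
    return placements
```

With nodes heartbeating in the order n2 (8192 MB free) then n1 (4096 MB free), and one application requesting 2048 MB followed by 8192 MB, the 2048 MB container lands on n2 and the 8192 MB request stays pending, even though putting the small container on n1 would have let both fit.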

      Considering future complex resource placement requirements, such as node constraints (give me "a && b || c") or anti-affinity (do not allocate HBase RegionServers and Storm workers on the same host), we may need to consider moving the YARN scheduler towards global scheduling.
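A minimal sketch of what "global" placement could look like under the same simplified model, assuming hypothetical `Node`/`Request` classes that carry node attributes, running-container tags, a node-constraint predicate, and an anti-affinity tag set. Every request is checked against all nodes, infeasible nodes are filtered out, and a best-fit score picks the winner. The largest-first ordering and best-fit scoring are illustrative bin-packing choices, not the algorithm from the attached design documents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Node:
    # Hypothetical node: free memory, static attributes, tags of running containers.
    name: str
    free_mb: int
    attrs: Set[str] = field(default_factory=set)
    running_tags: Set[str] = field(default_factory=set)


@dataclass
class Request:
    # Hypothetical request carrying its own placement requirements.
    name: str
    tag: str                                            # e.g. "hbase-rs" or "storm-worker"
    mb: int
    node_constraint: Callable[[Set[str]], bool] = lambda attrs: True
    anti_affinity: Set[str] = field(default_factory=set)  # tags it must not share a host with


def feasible(node: Node, req: Request) -> bool:
    return (req.mb <= node.free_mb
            and req.node_constraint(node.attrs)
            and not (req.anti_affinity & node.running_tags))


def global_schedule(nodes: List[Node], requests: List[Request]) -> Dict[str, Optional[str]]:
    """Place each request after looking at every node, not just one."""
    result: Dict[str, Optional[str]] = {}
    # Largest requests first (a common bin-packing heuristic).
    for req in sorted(requests, key=lambda r: r.mb, reverse=True):
        candidates = [n for n in nodes if feasible(n, req)]
        if not candidates:
            result[req.name] = None                     # stays pending
            continue
        best = min(candidates, key=lambda n: n.free_mb - req.mb)  # best fit
        best.free_mb -= req.mb
        best.running_tags.add(req.tag)
        result[req.name] = best.name
    return result


# The "a && b || c" constraint, with && binding tighter than ||.
a_and_b_or_c = lambda attrs: ("a" in attrs and "b" in attrs) or "c" in attrs
```

Because the candidate set spans all nodes, constraints such as anti-affinity can be evaluated against the whole cluster at decision time, which the per-heartbeat loop cannot do.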

        Attachments

        1. Explanantions of Global Scheduling (YARN-5139) Implementation.pdf
          283 kB
          Wangda Tan
        2. wip-1.YARN-5139.patch
          76 kB
          Wangda Tan
        3. wip-2.YARN-5139.patch
          95 kB
          Wangda Tan
        4. wip-3.YARN-5139.patch
          133 kB
          Wangda Tan
        5. wip-4.YARN-5139.patch
          447 kB
          Wangda Tan
        6. wip-5.YARN-5139.patch
          386 kB
          Wangda Tan
        7. YARN-5139.000.patch
          402 kB
          Wangda Tan
        8. YARN-5139-Concurrent-scheduling-performance-report.pdf
          96 kB
          Wangda Tan
        9. YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf
          245 kB
          Wangda Tan
        10. YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf
          285 kB
          Wangda Tan

          Issue Links

          1. Add global scheduler interface definition and update CapacityScheduler to use it (Sub-task, Resolved, Wangda Tan)
          2. Update AppSchedulingInfo to use SchedulingPlacementSet (Sub-task, Resolved, Wangda Tan)
          3. Introduce api independent PendingAsk to replace usage of ResourceRequest within Scheduler classes (Sub-task, Resolved, Wangda Tan)
          4. Update javadocs of new added APIs / classes of scheduler/AppSchedulingInfo (Sub-task, Open, Wangda Tan)
          5. Should consider utilization of each ResourceType on node while scheduling (Sub-task, Open, Qi Zhu)
          6. Global scheduler applies to Fair scheduler (Sub-task, Open, Zhaohui Xin)
          7. Rename PlacementSet and SchedulingPlacementSet (Sub-task, Resolved, Wangda Tan)
          8. Additional changes to make SchedulingPlacementSet agnostic to ResourceRequest / placement algorithm (Sub-task, Resolved, Wangda Tan)
          9. Delay scheduling should be an individual policy instead of part of scheduler implementation (Sub-task, Open, Tao Yang)
          10. Add multi-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision (Sub-task, Resolved, Sunil G)
          11. Introduce scheduler specific environment variable support in ApplicationSubmissionContext for better scheduling placement configurations (Sub-task, Resolved, Sunil G)
          12. Pending backlog for async allocation threads should be configurable (Sub-task, Resolved, Tao Yang)
          13. The capacity scheduler logs too frequently seriously affecting performance (Sub-task, Open, YunFan Zhou)
          14. Resource leak caused by a reserved container being released more than once under async scheduling (Sub-task, Resolved, Tao Yang)
          15. Support dynamic policy updates in Capacity Scheduler (Sub-task, Open, Qi Zhu)
          16. Exclude lagged/unhealthy/decommissioned nodes in async allocating thread (Sub-task, Resolved, Qi Zhu; time spent: 1h 20m)
          17. Add multi-thread asynchronous scheduling to fair scheduler (Sub-task, Open, Unassigned)
          18. Use threadPool to handle async scheduling threads (Sub-task, Open, Aihua Xu)
          19. Skip schedule on not heartbeated nodes in Multi Node Placement (Sub-task, Resolved, Prabhu Joseph)
          20. Proactively relocate allocated containers from a stopped node (Sub-task, Open, Tanu Ajmera)
          21. Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (Sub-task, Resolved, Prabhu Joseph)
          22. Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259) (Sub-task, Resolved, Prabhu Joseph)
          23. Support Multi Node Placement in SingleConstraintAppPlacementAllocator (Sub-task, Resolved, Prabhu Joseph)
          24. Import logic of multi-node allocation in CapacityScheduler (Sub-task, Resolved, Qi Zhu; time spent: 2h 10m)
          25. Merge YARN-8557 and YARN-10352, and rebase based YARN-10380. (Sub-task, Resolved, Qi Zhu)


              People

              • Assignee:
                Wangda Tan (leftnoteasy)
              • Reporter:
                Wangda Tan (leftnoteasy)
              • Votes:
                5
              • Watchers:
                86

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Not Specified
                  Remaining:
                  0h
                  Logged:
                  3.5h