  1. Hadoop YARN
  2. YARN-45

[Preemption] Scheduler feedback to AM to release containers


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: resourcemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The ResourceManager (RM) strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. When cluster load shifts, individual container allocations must be reclaimed, or reserved, to restore the global invariants. In some cases, the ApplicationMaster (AM) can respond to fluctuations in resource availability without losing the work already completed by the affected task (MAPREDUCE-4584). Telling the AM which resources the RM intends to reclaim would therefore improve overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers.

      [1] http://research.yahoo.com/files/yl-2012-003.pdf
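
      As a rough illustration of the kind of exchange the description asks for, the sketch below models an RM-side release request and an AM that saves its work before giving containers back. Every type and method name here (PreemptionSketch, ReleaseRequest, ToyApplicationMaster, onReleaseRequest) is hypothetical and invented for this example; it is not the YARN API and not the protocol implemented by the attached patches.

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.LinkedHashSet;
      import java.util.List;
      import java.util.Set;

      /** Hypothetical model of an RM-to-AM release request; none of these types are YARN APIs. */
      public class PreemptionSketch {

          /** What the RM would send: the ids of the containers it wants back. */
          static final class ReleaseRequest {
              final Set<String> containerIds;

              ReleaseRequest(Set<String> containerIds) {
                  this.containerIds = containerIds;
              }
          }

          /** A toy AM that tracks the containers it holds and the ones it has checkpointed. */
          static final class ToyApplicationMaster {
              private final Set<String> running = new LinkedHashSet<>();
              private final List<String> checkpointed = new ArrayList<>();

              ToyApplicationMaster(List<String> containers) {
                  running.addAll(containers);
              }

              /**
               * For each requested container this AM actually holds: record a
               * checkpoint (a stand-in for saving task state), then release it.
               * Ids the AM does not hold are ignored. Returns the released ids.
               */
              List<String> onReleaseRequest(ReleaseRequest request) {
                  List<String> released = new ArrayList<>();
                  for (String id : request.containerIds) {
                      if (running.remove(id)) {
                          checkpointed.add(id);
                          released.add(id);
                      }
                  }
                  return released;
              }

              Set<String> running() {
                  return running;
              }

              List<String> checkpointed() {
                  return checkpointed;
              }
          }

          public static void main(String[] args) {
              ToyApplicationMaster am =
                  new ToyApplicationMaster(Arrays.asList("c1", "c2", "c3"));
              // The RM asks for c2 back; c9 is not held by this AM and is ignored.
              ReleaseRequest request =
                  new ReleaseRequest(new LinkedHashSet<>(Arrays.asList("c2", "c9")));
              System.out.println("released: " + am.onReleaseRequest(request));
              System.out.println("still running: " + am.running());
          }
      }
      ```

      The point of the sketch is the ordering: the AM learns which containers are at risk before they are taken away, so it can preserve completed work instead of losing it to an unannounced kill.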

      Attachments

        1. YARN-45.patch
          15 kB
          Carlo Curino
        2. YARN-45.patch
          18 kB
          Carlo Curino
        3. YARN-45.patch
          14 kB
          Carlo Curino
        4. YARN-45.patch
          13 kB
          Carlo Curino
        5. YARN-45.patch
          16 kB
          Carlo Curino
        6. YARN-45.patch
          46 kB
          Carlo Curino
        7. YARN-45.1.patch
          48 kB
          Carlo Curino
        8. YARN-45_design_thoughts.pdf
          24 kB
          Carlo Curino

        Issue Links

        1. RM changes to support preemption for FairScheduler and CapacityScheduler (Sub-task, Closed, Carlo Curino)
        2. CapacityScheduler: support for preemption (using a capacity monitor) (Sub-task, Closed, Carlo Curino)
        3. FairScheduler: support for work-preserving preemption (Sub-task, Closed, Carlo Curino)
        4. Expose preemption warnings in AMRMClient (Sub-task, Open, Sandy Ryza)
        5. User guide for preemption (Sub-task, Resolved, Unassigned)
        6. Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins (Sub-task, Closed, Sunil G)
        7. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy (Sub-task, Closed, Sunil G)
        8. observeOnly should be checked before any preemption computation starts inside containerBasedPreemptOrKill() of ProportionalCapacityPreemptionPolicy.java (Sub-task, Resolved, Unassigned)
        9. ProportionalCapacityPreemptionPolicy handling of corner cases... (Sub-task, Closed, Carlo Curino)
        10. Preemption message shouldn’t be created multiple times for same container-id in ProportionalCapacityPreemptionPolicy (Sub-task, Resolved, Wangda Tan)
        11. ClassCastException is thrown during preemption when a huge job is submitted to a queue B whose resources are used by a job in queueA (Sub-task, Closed, Christopher Douglas)
        12. ContainerExitStatus should define a status for preempted containers (Sub-task, Closed, Alejandro Abdelnur)
        13. Add preemption to CS (Sub-task, Resolved, Arun Murthy)
        14. Inconsistent picture of how a container was killed when querying RM and NM in case of preemption (Sub-task, Open, Unassigned)
        15. Preemption: Jobs are failing because AMs are getting launched and killed multiple times (Sub-task, Resolved, Unassigned)
        16. Disable preemption at Queue level (Sub-task, Closed, Eric Payne)
        17. CS queue level preemption should respect user-limits (Sub-task, Open, Mayank Bansal)
        18. Preemption of AM containers shouldn't count towards AM failures (Sub-task, Closed, Jian He)
        19. Preemption can prevent progress in small queues (Sub-task, Open, Wangda Tan)
        20. Capacity scheduler preemption policy should respect yarn.scheduler.minimum-allocation-mb when computing resource of queues (Sub-task, Open, Unassigned)
        21. Indicate preemption timeout along with the list of containers to AM (preemption message) (Sub-task, Open, Sunil G)
        22. Introducing CANCEL_PREEMPTION event to notify Scheduler and AM when a container is no longer to be preempted (Sub-task, Open, Sunil G)
        23. CapacityScheduler: Improve preemption to only kill containers that would satisfy the incoming request (Sub-task, Resolved, Wangda Tan)
        24. Refactor existing Preemption Policy of CS to make it easier to add new approaches for selecting preemption candidates (Sub-task, Resolved, Wangda Tan)
        25. Do surgical preemption based on reserved container in CapacityScheduler (Sub-task, Resolved, Wangda Tan)


          People

            Assignee: Carlo Curino (curino)
            Reporter: Christopher Douglas (cdouglas)
            Votes: 0
            Watchers: 30

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: 1m
                Remaining: 1m
                Logged: Not Specified
