
[YARN-4189] Capacity Scheduler: Improve location preference waiting mechanism

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: capacity scheduler
    • Labels: None

      Description

      There are some issues with the current Capacity Scheduler implementation of delay scheduling:

      1) Waiting time to allocate each container highly depends on cluster availability
      Currently, an app can only increase its missed-opportunity count when a node has available resources AND that node gets traversed by the scheduler. There are lots of cases in which an app doesn't get traversed by the scheduler, for example:

      A cluster has 2 racks (rack1/rack2), each rack has 40 nodes. Node-locality-delay=40. An application prefers rack1. Node-heartbeat-interval=1s.
      If there are 2 nodes with available resources on rack1, the delay to allocate one container is about 20 sec.
      If there are 20 nodes with available resources on rack1, the delay to allocate one container is about 2 sec.
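
      For illustration, here is a minimal back-of-the-envelope sketch of that estimate (the class, method, and formula below are a simplification written for this example, not code from the scheduler): the wait time is roughly node-locality-delay divided by the number of scheduling opportunities per second, which in turn is the number of nodes with free resources divided by the heartbeat interval.

          // Simplified estimate of the delay-scheduling wait time for one container.
          // Missed opportunities only accrue when a node with free resources heartbeats
          // and the scheduler traverses the app, so the wait scales inversely with the
          // number of available nodes.
          public class DelayEstimate {
              static double estimateWaitSeconds(int nodeLocalityDelay,
                                                int availableNodes,
                                                double heartbeatIntervalSec) {
                  double opportunitiesPerSec = availableNodes / heartbeatIntervalSec;
                  return nodeLocalityDelay / opportunitiesPerSec;
              }

              public static void main(String[] args) {
                  // node-locality-delay = 40, heartbeat interval = 1 sec
                  System.out.println(estimateWaitSeconds(40, 2, 1.0));  // 2 free nodes  -> 20.0 sec
                  System.out.println(estimateWaitSeconds(40, 20, 1.0)); // 20 free nodes -> 2.0 sec
              }
          }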

      2) It could violate scheduling policies (FIFO/Priority/Fair)

      Assume a cluster is highly utilized. An app (app1) has higher priority and wants locality, and another app (app2) has lower priority but doesn't care about locality. When a node heartbeats with available resources, app1 decides to wait, so app2 gets the available slot. This should be considered a bug that we need to fix.

      The same problem can happen when we use the FIFO/Fair queue ordering policies; a minimal sketch of the inversion follows.
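
      The sketch below is an illustrative model only, not CapacityScheduler code; the class and method names are invented for this example. It shows how a node heartbeat can fall through a higher-priority app that is still inside its locality delay and land on a lower-priority app.

          import java.util.List;

          public class PrioritySkipSketch {

              // Minimal stand-in for an application; not a YARN class.
              static class App {
                  final String name;
                  final boolean wantsLocality;
                  final int localityDelay;          // e.g. node-locality-delay
                  int missedOpportunities = 0;

                  App(String name, boolean wantsLocality, int localityDelay) {
                      this.name = name;
                      this.wantsLocality = wantsLocality;
                      this.localityDelay = localityDelay;
                  }

                  boolean stillWaitingForLocality() {
                      return wantsLocality && missedOpportunities < localityDelay;
                  }
              }

              // What effectively happens on a node heartbeat today: apps are visited in
              // priority order (the list is assumed pre-sorted, highest priority first),
              // but a high-priority app inside its locality delay skips the node, and the
              // free slot falls through to a lower-priority app.
              static String onNodeHeartbeat(String node, List<App> appsByPriority) {
                  for (App app : appsByPriority) {
                      if (app.stillWaitingForLocality()) {
                          app.missedOpportunities++;   // the app chooses to wait...
                          continue;                    // ...and gives up this slot
                      }
                      return app.name + " allocated on " + node;
                  }
                  return node + " left idle";
              }

              public static void main(String[] args) {
                  App app1 = new App("app1", true, 40);   // higher priority, wants locality
                  App app2 = new App("app2", false, 0);   // lower priority, any node is fine
                  // Prints "app2 allocated on node-on-rack2" even though app1 has priority.
                  System.out.println(onNodeHeartbeat("node-on-rack2", List.of(app1, app2)));
              }
          }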

      A similar problem shows up with preemption: the preemption policy preempts some resources from queue-A for queue-B (queue-A is over-satisfied and queue-B is under-satisfied), but queue-B is still waiting out its node-locality-delay, so queue-A gets the resources back. In the next round, the preemption policy may preempt these resources from queue-A again.

      This JIRA aims to solve these problems.


          Activity

          xinxianyin Xianyin Xin added a comment -

          Hi Wangda Tan, I just went through the doc; please correct me if I am wrong. Can a container that is marked as ALLOCATING_WAITING be occupied by other requests? I'm afraid ALLOCATING_WAITING would reduce cluster utilization. In a cluster with many nodes and many jobs, it's hard to keep most jobs satisfied with their allocations, especially with the app-oriented allocation mechanism (which maps newly available resources to appropriate apps). A customer once asked us, "Why could we get 100% locality in MR1 but only up to 60%~70% after making various optimizations?" So we can guess that a large percentage of resource requests in a cluster are not satisfied with their allocations; many containers would therefore go through the ALLOCATING_WAITING phase, which leaves a lot of resources idle for a period of time.

          leftnoteasy Wangda Tan added a comment -

          Xianyin Xin,

          Thanks for looking at the doc. However, I don't think the approach in the doc should reduce utilization:

          Assume we limit the maximum waiting time for each container to X sec, and the average container execution time is Y sec. It will be fine if X << Y.

          In my mind, X is a value close to the node heartbeat interval, and Y ranges from minutes to hours.

          I don't have any data to prove this holds; we need to do some benchmark tests before using it in practice.
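
          To put rough numbers on that bound (the figures here are illustrative assumptions, not measurements): with X = 3 sec, roughly a few node heartbeat intervals, and Y = 10 min = 600 sec, the resource held idle while a container waits is at most X / (X + Y) = 3 / 603, about 0.5% per container. The waiting overhead only becomes significant when Y shrinks to the same order of magnitude as X, i.e. for very short-running containers.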

          xinxianyin Xianyin Xin added a comment -

          Wangda Tan, convincing analysis. It's fine if X << Y and X is close to the heartbeat interval. So, should we cap X to prevent users from setting it arbitrarily?

          leftnoteasy Wangda Tan added a comment -

          Xianyin Xin, I mentioned this in the design doc:

          To avoid an application setting a very high delay (such as 10 min), we shall have a global max-container-delay to cap the delay and avoid resource wastage.
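
          As a concrete illustration of what such a cap could look like, here is a hypothetical yarn-site.xml snippet; the property name below does not exist, it is invented purely to show the idea of a cluster-wide ceiling on the per-container waiting delay.

              <!-- Hypothetical property, for illustration only: a cluster-wide cap on how
                   long any application may hold a container in the waiting state. -->
              <property>
                <name>yarn.scheduler.capacity.maximum-container-allocation-delay-ms</name>
                <value>10000</value> <!-- e.g. 10 sec, regardless of what the app requests -->
              </property>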


            People

            • Assignee:
              leftnoteasy Wangda Tan
            • Reporter:
              leftnoteasy Wangda Tan
            • Votes: 0
            • Watchers: 14
