Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels: None

      Description

      The current schedulers in Hadoop all examine a single job on every heartbeat when choosing which tasks to assign, selecting that job by FIFO order or by fair sharing. There are inherent limitations to this approach. For example, if the job at the front of the queue is small (e.g. 10 maps in a cluster of 100 nodes), then on average it will launch only one local map during the first 10 heartbeats it spends at the head of the queue. This leads to very poor locality for small jobs. Instead, we need a more "global" view of scheduling that can look at multiple jobs. To resolve the locality problem, we will use the following algorithm:

      • If the job at the head of the queue has no node-local task to launch, skip it and look through other jobs.
      • If a job has waited at least T1 seconds while being skipped, also allow it to launch rack-local tasks.
      • If a job has waited at least T2 > T1 seconds, also allow it to launch off-rack tasks.
      This algorithm improves locality while bounding the delay that any job experiences in launching a task.
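
      As a rough illustration of the rule above, here is a minimal sketch in Java. It assumes two configurable thresholds T1 and T2 (in milliseconds), and every name in it (DelaySchedulingSketch, Job.obtainTask, firstSkippedAt, and so on) is made up for this sketch rather than taken from the attached patches:

{code:java}
import java.util.List;

/**
 * Rough sketch of the wait-based rule above. All names here are invented
 * for illustration and are not the API used in the attached patches.
 */
public class DelaySchedulingSketch {

  enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

  /** Hypothetical per-job state the scheduler would track. */
  static class Job {
    long firstSkippedAt = -1; // time this job was first skipped; -1 = not waiting

    /** Returns a runnable task at or better than the given locality for this
     *  tracker, or null if none. Stubbed out for the sketch. */
    Task obtainTask(String tracker, Locality allowed) { return null; }
  }

  static class Task { }

  private final long t1Millis; // wait before rack-local launches are allowed
  private final long t2Millis; // wait before off-rack launches are allowed (T2 > T1)

  DelaySchedulingSketch(long t1Millis, long t2Millis) {
    this.t1Millis = t1Millis;
    this.t2Millis = t2Millis;
  }

  /**
   * Called on a tasktracker heartbeat with jobs already sorted by FIFO or
   * fair sharing. Rather than looking only at the head-of-queue job, walk the
   * whole list; a job with no launchable task at its allowed locality level
   * is skipped, and how long it has waited determines when it may go
   * rack-local and then off-rack.
   */
  Task assignTask(String tracker, List<Job> sortedJobs, long now) {
    for (Job job : sortedJobs) {
      Task task = job.obtainTask(tracker, allowedLocality(job, now));
      if (task != null) {
        job.firstSkippedAt = -1; // launched something: reset this job's wait
        return task;
      }
      if (job.firstSkippedAt == -1) {
        job.firstSkippedAt = now; // start the clock the first time we skip it
      }
      // Otherwise skip this job and try the next one in the sorted order.
    }
    return null; // nothing launchable anywhere on this heartbeat
  }

  private Locality allowedLocality(Job job, long now) {
    if (job.firstSkippedAt == -1) return Locality.NODE_LOCAL;
    long waited = now - job.firstSkippedAt;
    if (waited >= t2Millis) return Locality.OFF_RACK;   // waited at least T2
    if (waited >= t1Millis) return Locality.RACK_LOCAL; // waited at least T1
    return Locality.NODE_LOCAL;
  }
}
{code}

      In this sketch the wait clock starts only once a job is actually skipped and is cleared as soon as the job launches a task, so the extra delay any single job sees before it may launch an off-rack task is bounded by roughly T2.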

      It turns out that whether waiting is useful depends on how many tasks are left in the job (which determines the probability that any given heartbeat comes from a node with a local task) and on whether the job is CPU-bound or IO-bound. Thus there may be logic for removing the wait on the last few tasks in the job.
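
      One hedged way such logic could look, written as an extension of the sketch above (the cutoff value and its name are assumptions for illustration, not something this issue specifies):

{code:java}
// Hypothetical refinement of allowedLocality(): when only a handful of maps
// remain, the chance that the next heartbeat comes from a node holding their
// data is small, so waiting for locality may no longer pay off.
private Locality allowedLocality(Job job, long now, int remainingMaps) {
  final int SMALL_REMAINDER_CUTOFF = 5;    // assumed, tunable threshold
  if (remainingMaps <= SMALL_REMAINDER_CUTOFF) {
    return Locality.OFF_RACK;              // drop the wait for the job's last tasks
  }
  return allowedLocality(job, now);        // otherwise keep the T1/T2 rule above
}
{code}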

      As a related issue, once we allow global scheduling, we can launch multiple tasks per heartbeat, as in HADOOP-3136. The initial implementation of HADOOP-3136 adversely affected performance because it only launched multiple tasks from the same job, but with the wait rule above, we will only do this for jobs that are allowed to launch non-local tasks.
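
      Continuing the same sketch, launching multiple tasks per heartbeat can then be a simple loop over the single-task rule, since every non-local launch is already gated by the per-job wait. This is only an illustration of how the wait rule interacts with the HADOOP-3136 idea, not that patch's implementation:

{code:java}
// Fill as many free slots as possible on one heartbeat. Jobs still inside
// their locality wait can only ever contribute node-local tasks here, while
// jobs that have waited past T1/T2 may also fill slots with rack-local or
// off-rack tasks.
List<Task> assignTasks(String tracker, List<Job> sortedJobs, long now, int freeSlots) {
  List<Task> assigned = new java.util.ArrayList<>();
  for (int i = 0; i < freeSlots; i++) {
    Task task = assignTask(tracker, sortedJobs, now);
    if (task == null) break;  // nothing more can be launched on this heartbeat
    assigned.add(task);
  }
  return assigned;
}
{code}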

      Attachments

      1. mapreduce-548-v4.patch (59 kB) by Matei Zaharia
      2. mapreduce-548-v3.patch (59 kB) by Matei Zaharia
      3. mapreduce-548-v2.patch (59 kB) by Matei Zaharia
      4. mapreduce-548-v1.patch (58 kB) by Matei Zaharia
      5. mapreduce-548.patch (56 kB) by Matei Zaharia
      6. hadoop-4667-v2.patch (55 kB) by Matei Zaharia
      7. hadoop-4667-v1b.patch (56 kB) by Matei Zaharia
      8. hadoop-4667-v1.patch (56 kB) by Matei Zaharia
      9. HADOOP-4667_api.patch (5 kB) by Arun C Murthy
      10. fs-global-v0.patch (67 kB) by Matei Zaharia

        Issue Links

          Activity

          Matei Zaharia created issue -
          Matei Zaharia made changes -
          Field Original Value New Value
          Attachment fs-global-v0.patch [ 12397383 ]
          Matei Zaharia made changes -
          Description
          Original value:
          The current schedulers in Hadoop all examine a single job on every heartbeat when choosing which tasks to assign, choosing the job based on FIFO or fair sharing. There are inherent limitations to this approach. For example, if the job at the front of the queue is small (e.g. 10 maps, in a cluster of 100 nodes), then on average it will launch only one local map on the first 10 heartbeats while it is at the head of the queue. This leads to very poor locality for small jobs. Instead, we need a more "global" view of scheduling that can look at multiple jobs. To resolve the locality problem, we will use the following algorithm:
          - If the job at the head of the queue has no local task to launch, skip it and look through other jobs.
          - If a job has been skipped for at least T seconds while waiting for a local task, stop skipping it and allow it to launch non-local tasks.
          - If no job can launch a task at all, return to the head of the queue and launch a non-local task from the first job.
          This algorithm improves locality while bounding the delay that any job experiences in launching a task.

          We will actually provide two values of T - one for data-local tasks and a longer wait for rack-local tasks. It also turns out that whether waiting is useful depends on how many tasks are left in the job - the probability of getting a heartbeat from a node with a local task. Thus there may be logic for removing the wait on the last few tasks in the job.

          As a related issue, once we allow global scheduling, we can launch multiple tasks per heartbeat, as in HADOOP-3136. The initial implementation of HADOOP-3136 adversely affected performance because it only launched multiple tasks from the same job, but with the wait rule above, we will only do this for jobs that are allowed to launch non-local tasks.
          New value:
          The current schedulers in Hadoop all examine a single job on every heartbeat when choosing which tasks to assign, choosing the job based on FIFO or fair sharing. There are inherent limitations to this approach. For example, if the job at the front of the queue is small (e.g. 10 maps, in a cluster of 100 nodes), then on average it will launch only one local map on the first 10 heartbeats while it is at the head of the queue. This leads to very poor locality for small jobs. Instead, we need a more "global" view of scheduling that can look at multiple jobs. To resolve the locality problem, we will use the following algorithm:
          - If the job at the head of the queue has no node-local task to launch, skip it and look through other jobs.
          - If a job has waited at least T1 seconds while being skipped, also allow it to launch rack-local tasks.
          - If a job has waited at least T2 > T1 seconds, also allow it to launch off-rack tasks.
          This algorithm improves locality while bounding the delay that any job experiences in launching a task.

          It turns out that whether waiting is useful depends on how many tasks are left in the job - the probability of getting a heartbeat from a node with a local task - and on whether the job is CPU or IO bound. Thus there may be logic for removing the wait on the last few tasks in the job.

          As a related issue, once we allow global scheduling, we can launch multiple tasks per heartbeat, as in HADOOP-3136. The initial implementation of HADOOP-3136 adversely affected performance because it only launched multiple tasks from the same job, but with the wait rule above, we will only do this for jobs that are allowed to launch non-local tasks.
          Arun C Murthy made changes -
          Attachment HADOOP-4667_api.patch [ 12398375 ]
          Matei Zaharia made changes -
          Attachment hadoop-4667-v1.patch [ 12399600 ]
          Matei Zaharia made changes -
          Attachment hadoop-4667-v1b.patch [ 12399790 ]
          Matei Zaharia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Matei Zaharia made changes -
          Attachment hadoop-4667-v2.patch [ 12400652 ]
          Owen O'Malley made changes -
          Link This issue is blocked by HADOOP-5089 [ HADOOP-5089 ]
          Owen O'Malley made changes -
          Link This issue is blocked by HADOOP-5089 [ HADOOP-5089 ]
          Owen O'Malley made changes -
          Link This issue is related to HADOOP-5089 [ HADOOP-5089 ]
          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Owen O'Malley made changes -
          Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
          Key HADOOP-4667 MAPREDUCE-548
          Component/s contrib/fair-share [ 12312456 ]
          Matei Zaharia made changes -
          Attachment mapreduce-548.patch [ 12412446 ]
          Matei Zaharia made changes -
          Assignee Matei Zaharia [ matei ]
          Matei Zaharia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Matei Zaharia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matei Zaharia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Fix Version/s 0.21.0 [ 12314045 ]
          Matei Zaharia made changes -
          Attachment mapreduce-548-v1.patch [ 12412509 ]
          Matei Zaharia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matei Zaharia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Matei Zaharia made changes -
          Attachment mapreduce-548-v2.patch [ 12412935 ]
          Matei Zaharia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matei Zaharia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Arun C Murthy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matei Zaharia made changes -
          Attachment mapreduce-548-v3.patch [ 12413886 ]
          Matei Zaharia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Matei Zaharia made changes -
          Attachment mapreduce-548-v4.patch [ 12414243 ]
          Matei Zaharia made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Matei Zaharia made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Matei Zaharia made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee: Matei Zaharia
            • Reporter: Matei Zaharia
            • Votes: 0
            • Watchers: 17

              Dates

              • Created:
              • Updated:
              • Resolved:

                Development