Hadoop Common / HADOOP-142

failed tasks should be rescheduled on different hosts after other jobs


    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.1
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:


      Currently, when a task fails, it is usually rerun immediately on the same host. This causes problems in two ways:
      1. The task is more likely to fail again on the same host.
      2. If there is cleanup code (such as clearing pendingCreates), it does not always run immediately, leading to cascading failures.

      For a first pass, I propose that when a task fails, we start the scan for new tasks to launch at the next task of the same type (within that job). So if maps[99] fails, then when we next look for map tasks to assign from this job, we scan in the order maps[100]...maps[N], maps[0]...maps[99].
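The rotated scan order described above can be sketched as follows. This is only an illustration of the proposal, not the code from the attached patch; the class `RotatedScan` and the method `scanOrder` are hypothetical names, and the sketch assumes tasks are addressed by array index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the proposal; not taken from Hadoop. */
class RotatedScan {
    /**
     * Returns the order in which task indices would be examined after the
     * task at index lastFailed has failed: the scan starts at the following
     * index, wraps around, and reaches the failed task last.
     */
    static List<Integer> scanOrder(int numTasks, int lastFailed) {
        List<Integer> order = new ArrayList<>(numTasks);
        for (int i = 1; i <= numTasks; i++) {
            // Modular arithmetic wraps the scan back to index 0.
            order.add((lastFailed + i) % numTasks);
        }
        return order;
    }
}
```

With five map tasks and a failure at index 2, `RotatedScan.scanOrder(5, 2)` yields the indices 3, 4, 0, 1, 2, so the failed task is only reconsidered after every other task of that type in the job.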

      A more involved change would avoid running a task on nodes where it has failed before. This is a little tricky, because you don't want to prevent re-execution of tasks on single-node clusters, and the job tracker has to schedule one task tracker at a time.
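One possible shape for that more involved change is sketched below. Everything here is an assumption made for illustration: the class `FailureAwarePicker`, its methods, and the fallback rule (allow a retry on a previously failing host once the task has failed on every host) are hypothetical and are not described in the issue. The sketch also assumes one task tracker per host, so `clusterSize` is the number of hosts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the approach taken in the attached patch. */
class FailureAwarePicker {
    // Task index -> hosts on which that task has already failed.
    private final Map<Integer, Set<String>> failedHosts = new HashMap<>();

    /** Remembers that the given task failed on the given host. */
    void recordFailure(int task, String host) {
        failedHosts.computeIfAbsent(task, k -> new HashSet<>()).add(host);
    }

    /**
     * Decides whether a task may be handed to the tracker on the given host.
     * Called once per requesting tracker, since trackers are scheduled one
     * at a time.
     */
    boolean mayRunOn(int task, String host, int clusterSize) {
        Set<String> failed = failedHosts.get(task);
        if (failed == null || !failed.contains(host)) {
            return true; // the task has never failed on this host
        }
        // Assumed fallback: once the task has failed on every host, allow
        // it anywhere so that single-node clusters can still retry.
        return failed.size() >= clusterSize;
    }
}
```

On a single-node cluster, `mayRunOn` returns true even after a failure on that node, because the set of failed hosts already covers the whole cluster. On a larger cluster the same task is refused on the failing host while other hosts remain untried.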


        Owen O'Malley created issue -
        Owen O'Malley made changes -
        Field Original Value New Value
        Attachment no-repeat-failures.patch [ 12325520 ]
        Doug Cutting made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Doug Cutting made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Doug Cutting made changes -
        Workflow jira [ 12361184 ] no reopen closed [ 12373077 ]
        Doug Cutting made changes -
        Workflow no reopen closed [ 12373077 ] no-reopen-closed [ 12373413 ]
        Doug Cutting made changes -
        Workflow no-reopen-closed [ 12373413 ] no-reopen-closed, patch-avail [ 12377724 ]
        Owen O'Malley made changes -
        Component/s mapred [ 12310690 ]


          • Assignee:
            Owen O'Malley
          • Votes:
            0
          • Watchers:
            0


            • Created: