  1. Hadoop Common
  2. HADOOP-1472

Timed-out tasks are marked as 'KILLED' rather than 'FAILED', which means the framework doesn't fail a TIP with 4 or more timed-out attempts


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels: None

    Description

      Timed-out tasks (and also tasks that fail with FSError) are marked as KILLED rather than FAILED. The major issue is that, since HADOOP-1050, only FAILED task attempts are counted when deciding whether a TIP has failed. Hence there is a corner case where a TIP with 4 timed-out attempts isn't marked as FAILED, and so the job keeps running.
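      The corner case can be illustrated with a minimal sketch. This is not the actual JobTracker/TaskTracker code: the class `TipFailureSketch`, the enum `AttemptState`, the methods `tipHasFailed` and `stateForTimeout`, and the constant `MAX_FAILED_ATTEMPTS` are all hypothetical names, and the limit of 4 is simply taken from the "4 timed-out tasks" mentioned above.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: hypothetical names, not the real Hadoop classes.
public class TipFailureSketch {

    enum AttemptState { SUCCEEDED, FAILED, KILLED }

    // Assumed limit, taken from the "4 timed-out tasks" in the report.
    static final int MAX_FAILED_ATTEMPTS = 4;

    // Only FAILED attempts are counted; KILLED attempts are ignored.
    static boolean tipHasFailed(List<AttemptState> attempts) {
        long failed = attempts.stream()
                .filter(state -> state == AttemptState.FAILED)
                .count();
        return failed >= MAX_FAILED_ATTEMPTS;
    }

    // State recorded for a timed-out attempt:
    // KILLED models the reported behavior, FAILED the desired one.
    static AttemptState stateForTimeout(boolean markAsFailed) {
        return markAsFailed ? AttemptState.FAILED : AttemptState.KILLED;
    }

    public static void main(String[] args) {
        List<AttemptState> reported = Collections.nCopies(4, stateForTimeout(false));
        List<AttemptState> desired = Collections.nCopies(4, stateForTimeout(true));

        // Four timed-out attempts marked KILLED never trip the limit.
        System.out.println("reported: TIP failed? " + tipHasFailed(reported)); // false
        // The same four attempts marked FAILED do.
        System.out.println("desired:  TIP failed? " + tipHasFailed(desired));  // true
    }
}
```

      In this sketch the only difference between the two runs is the state recorded for a timed-out attempt, which is why the TIP (and hence the job) never fails under the reported behavior.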

      Considering that this is a corner case and that fixing it will require non-trivial changes to the TaskTracker's control flow (ugly as it is right now), I propose fixing this either in 0.13.1 (if need be) or, better, in 0.14.

      Thoughts?

      Attachments

        1. HADOOP-1472_1_20070608.patch
          5 kB
          Arun Murthy
        2. HADOOP-1472_2_20070608.patch
          6 kB
          Arun Murthy
        3. HADOOP-1472_3_20070612.patch
          7 kB
          Arun Murthy

          People

            Assignee: Arun Murthy (acmurthy)
            Reporter: Arun Murthy (acmurthy)