Hadoop Common / HADOOP-1930

Too many fetch-failures issue

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.15.0
    • Fix Version/s: 0.15.0
    • Component/s: None
    • Labels: None

      Description

      A job with 4000 maps on a 1400-node cluster (at most 3 tasks per node) had 150 map tasks fail with 'Too many fetch-failures'.

      The jobtracker log suggests that the jobtracker got confused about which tasktracker actually ran the task:

      (In the log output below, I replaced the names of the two tasktracker nodes with **node_assigned** and **node_fetch_attempt**; they are different nodes.)

      grep task_200709170247_0018_m_000009_0 hadoop-xxx-jobtracker-node.log.2007-09-19:

      2007-09-19 15:52:26,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200709170247_0018_m_000009_0' to tip tip_200709170247_0018_m_000009, for tracker 'tracker_**node_assigned_**:/127.0.0.1:54523'
      2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200709170247_0018_m_000009_0' to hdfs://location
      2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200709170247_0018_m_000009_0' has completed tip_200709170247_0018_m_000009 successfully.
      2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_200709170247_0018_m_000009_0' has completed succesfully
      2007-09-19 16:21:07,825 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200709170247_0018_m_000009_0
      2007-09-19 16:23:23,483 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200709170247_0018_m_000009_0
      2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200709170247_0018_m_000009_0
      2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_200709170247_0018_m_000009_0 ... killing it
      2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200709170247_0018_m_000009_0: Too many fetch-failures
      2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_200709170247_0018_m_000009_0' has been lost.
      2007-09-19 16:25:07,184 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200709170247_0018_m_000009_0' from 'tracker_**node_fetch_attempt**:/127.0.0.1:48818'
      2007-09-19 21:40:00,235 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200709170247_0018_m_000009_0' from 'tracker_**node_fetch_attempt**:/127.0.0.1:48818'
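
      The manual grep above can be generalized to scan a whole jobtracker log for tasks whose assigning tracker differs from the tracker they were later removed from. The following is only a rough sketch: the function name `find_tracker_mismatches` is hypothetical, the two regular expressions are modeled solely on the "Adding task" and "Removed completed task" messages quoted above, and comparing only the part of the tracker name before `:/` is an assumption (the port could change across a tasktracker restart on the same node).

```python
import re
from collections import defaultdict

# Patterns modeled on the two JobTracker messages quoted in this report.
# They are assumptions about the log format, not taken from Hadoop source.
ADDED = re.compile(
    r"Adding task '(?P<task>task_\w+)' to tip [^,]+, for tracker '(?P<tracker>[^']+)'"
)
REMOVED = re.compile(
    r"Removed completed task '(?P<task>task_\w+)' from '(?P<tracker>[^']+)'"
)


def tracker_host(tracker):
    """Return the part of a tracker name before ':/' (the node part)."""
    return tracker.split(":/", 1)[0]


def find_tracker_mismatches(lines):
    """Map task id -> sorted list of (assigned_node, removed_from_node) pairs
    for tasks that were removed from a node other than the one they were
    assigned to."""
    assigned = {}
    mismatches = defaultdict(set)
    for line in lines:
        match = ADDED.search(line)
        if match:
            assigned[match.group("task")] = tracker_host(match.group("tracker"))
            continue
        match = REMOVED.search(line)
        if match:
            task = match.group("task")
            removed_from = tracker_host(match.group("tracker"))
            expected = assigned.get(task)
            # Only report tasks whose assignment was seen earlier in the log.
            if expected is not None and expected != removed_from:
                mismatches[task].add((expected, removed_from))
    return {task: sorted(pairs) for task, pairs in mismatches.items()}
```

      Since an open file is an iterable of lines, the log file object can be passed to `find_tracker_mismatches` directly. A non-empty result would list tasks showing the same symptom as the task above.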

        Attachments

        1. HADOOP-1930_1_20070922.patch
          2 kB
          Arun Murthy
        2. HADOOP-1930_2_20070925.patch
          6 kB
          Arun Murthy

            People

            • Assignee: Arun Murthy (acmurthy)
            • Reporter: Christian Kunz (ckunz)