Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-934

TaskTracker sends duplicate status when updating task metrics throws exception

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.1
    • Fix Version/s: 0.11.0
    • Component/s: None
    • Labels:
      None

      Description

      When updating the TaskTracker metrics repeatedly throws an exception, the TaskTracker repeatedly sends status events to the JobTracker. This seems to get the JobTracker into a funny state.

      See annotated logs below

      TaskTracker log:
      2007-01-21 23:51:03,555 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_000000_0
      2007-01-21 23:51:05,669 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% writing -1539638179018996759@438272/1858392
      2007-01-21 23:51:06,722 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote 7041109562936259497
      2007-01-21 23:51:07,784 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -3327068643288265619
      2007-01-21 23:51:08,819 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 0.0% wrote -53923196326936926
      2007-01-21 23:51:09,322 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_000000_0 1.0% wrote -3407225283463353195
      2007-01-21 23:51:09,323 INFO org.apache.hadoop.mapred.TaskTracker: Task task_0001_m_000000_0 is done.
      2007-01-21 23:51:09,391 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
      at foo.Bar.setMetric(Unknown Source)
      at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
      at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
      at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
      at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
      at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
      2007-01-21 23:51:14,057 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '21
      2007-01-21 23:51:14,060 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
      at foo.Bar.setMetric(Unknown Source)
      at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
      at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
      at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
      at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
      at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
      2007-01-21 23:51:14,060 INFO org.apache.hadoop.mapred.TaskTracker: Resending 'status' to 'jobtrackerhost' with reponseId '22
      2007-01-21 23:51:14,063 ERROR org.apache.hadoop.mapred.TaskTracker: Caught exception: org.apache.hadoop.metrics.MetricsException:
      at foo.Bar.setMetric(Unknown Source)
      at org.apache.hadoop.mapred.TaskTracker$TaskTrackerMetrics.completeTask(TaskTracker.java:135)
      at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:604)
      at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:495)
      at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:852)
      at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1494)
      ...
      [these messages continue forever with each status sent to jobtrackerhost having an increasing responseId]

      JobTracker log:
      2007-01-21 23:51:03,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_000000
      2007-01-21 23:51:03,511 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker:50050'
      2007-01-21 23:51:03,569 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_r_000000
      2007-01-21 23:51:03,577 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_r_000000_0' to tip tip_0001_r_000000, for tracker 'tracker2:50050'
      2007-01-21 23:51:09,386 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
      2007-01-21 23:51:09,389 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_m_000000_0' has completed.
      2007-01-21 23:51:14,059 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
      2007-01-21 23:51:14,062 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
      2007-01-21 23:51:14,064 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
      2007-01-21 23:51:14,066 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
      ... [there are hundreds of these identical messages]
      2007-01-21 23:51:16,441 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
      2007-01-21 23:51:16,442 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
      2007-01-21 23:51:16,444 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_0001_r_000000_0' has completed tip_0001_r_000000 successfully.
      2007-01-21 23:51:16,446 INFO org.apache.hadoop.mapred.TaskInProgress: Task 'task_0001_r_000000_0' has completed.
      2007-01-21 23:51:16,456 INFO org.apache.hadoop.mapred.JobInProgress: Job job_0001 has completed successfully.
      2007-01-21 23:51:16,459 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_r_000000_0' from 'tracker2:50050'
      2007-01-21 23:51:16,460 WARN org.apache.hadoop.mapred.TaskInProgress: Recieved duplicate status update of 'SUCCEEDED' for 'task_0001_m_000000_0' of TIP 'tip_0001_m_000000'
      2007-01-21 23:51:16,460 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_0001_m_000000_0' from 'tracker:50050'
      2007-01-21 23:51:16,462 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid task_0001_m_000000_0
      2007-01-21 23:51:16,463 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid task_0001_m_000000_0
      2007-01-21 23:51:16,465 INFO org.apache.hadoop.mapred.JobTracker: Serious problem. While updating status, cannot find taskid task_0001_m_000000_0
      ...
      [these messages continue forever]

        Attachments

        1. HADOOP-934_20070129_1.patch
          0.9 kB
          Arun Murthy

          Activity

            People

            • Assignee:
              acmurthy Arun Murthy
              Reporter:
              nidaley Nigel Daley
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: