
Key: HADOOP-1332
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Arun C Murthy
Reporter: Nigel Daley
Votes: 0
Watchers: 0
Hadoop Common

Sporadic unit test failures (TestMiniMRClasspath, TestMiniMRLocalFS, TestMiniMRDFSCaching)

Created: 04/May/07 10:28 PM   Updated: 08/Jul/09 04:51 PM
Component/s: None
Affects Version/s: 0.13.0
Fix Version/s: 0.13.0

Time Tracking:
Not Specified

File Attachments:
  copy-thread.patch             2007-05-30 05:46 AM  Owen O'Malley  4 kB
  HADOOP-1332_1_20070529.patch  2007-05-29 11:49 AM  Arun C Murthy  3 kB
  HADOOP-1332_2_20070530.patch  2007-05-29 08:48 PM  Arun C Murthy  4 kB
  (All attachments are text files licensed for inclusion in ASF works.)

Resolution Date: 31/May/07 07:15 PM


Description
Since April 22 I've been seeing sporadic failures of these tests on Windows:

The tests fail because they time out.
They time out because one of the task trackers doesn't go idle.
One of the task trackers doesn't go idle because a map fails and has to be killed.

Reordered and annotated test logs (my comments are in parentheses):

(map 0 executes and logs a 'done' and two 'completed' messages):
[junit] 2007-05-03 21:19:40,516 INFO mapred.JobInProgress (JobInProgress.java:findNewTask(653)) - Choosing cached task tip_0001_m_000000
[junit] 2007-05-03 21:19:40,516 INFO mapred.JobTracker (JobTracker.java:createTaskEntry(758)) - Adding task 'task_0001_m_000000_0' to tip tip_0001_m_000000, for tracker 'tracker_task000.com:2893'
[junit] 2007-05-03 21:19:40,516 INFO mapred.TaskTracker (TaskTracker.java:startNewTask(1071)) - LaunchTaskAction: task_0001_m_000000_0
[junit] 2007-05-03 21:19:45,655 INFO mapred.TaskTracker (TaskTracker.java:reportProgress(1284)) - task_0001_m_000000_0 1.0% hdfs://localhost:2882/testing/ext/input/part-0:0+10
[junit] 2007-05-03 21:19:46,201 INFO mapred.TaskTracker (TaskTracker.java:reportProgress(1284)) - task_0001_m_000000_0 1.0% hdfs://localhost:2882/testing/ext/input/part-0:0+10
[junit] 2007-05-03 21:19:46,201 INFO mapred.TaskTracker (TaskTracker.java:reportDone(1334)) - Task task_0001_m_000000_0 is done.
[junit] 2007-05-03 21:19:46,357 INFO mapred.JobInProgress (JobInProgress.java:completedTask(734)) - Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.
[junit] 2007-05-03 21:19:46,357 INFO mapred.TaskInProgress (TaskInProgress.java:completedTask(475)) - Task 'task_0001_m_000000_0' has completed.

(map 2 executes and logs a 'done' message but no 'completed' messages. It is eventually killed):
[junit] 2007-05-03 21:19:40,594 INFO mapred.JobInProgress (JobInProgress.java:findNewTask(653)) - Choosing cached task tip_0001_m_000002
[junit] 2007-05-03 21:19:40,594 INFO mapred.JobTracker (JobTracker.java:createTaskEntry(758)) - Adding task 'task_0001_m_000002_0' to tip tip_0001_m_000002, for tracker 'tracker_task002.com:2902'
[junit] 2007-05-03 21:19:40,594 INFO mapred.TaskTracker (TaskTracker.java:startNewTask(1071)) - LaunchTaskAction: task_0001_m_000002_0
[junit] 2007-05-03 21:19:46,295 INFO mapred.TaskTracker (TaskTracker.java:reportProgress(1284)) - task_0001_m_000002_0 0.0% hdfs://localhost:2882/testing/ext/input/part-0:20+10
[junit] 2007-05-03 21:19:46,310 INFO mapred.TaskTracker (TaskTracker.java:reportDone(1334)) - Task task_0001_m_000002_0 is done.
...
[junit] 2007-05-03 21:29:52,957 INFO mapred.TaskTracker (TaskTracker.java:markUnresponsiveTasks(909)) - task_0001_m_000002_0: Task failed to report status for 606 seconds. Killing.
(long thread dump)
(shutting down MiniMRCluster)
[junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
[junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
[junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
[junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
[junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
[junit] Waiting for task tracker tracker_task002.com:2902 to be idle.
... (repeated until the test times out)
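
The symptom in the logs above (a task that logs 'is done.' but never a 'has completed' message) can be searched for mechanically. The following is a minimal Python sketch, not part of Hadoop: the function name find_stuck_tasks, the task ID pattern, and the matched phrases are assumptions drawn only from the log lines quoted in this report, and the sample lines are abbreviated copies of them.

```python
import re

# Task attempt IDs as they appear in the logs above, e.g. task_0001_m_000002_0.
# The pattern is an assumption inferred from those lines only.
TASK_ID = re.compile(r"task_\d+_[mr]_\d+_\d+")


def find_stuck_tasks(lines):
    """Return task IDs that logged 'is done.' but never 'has completed'."""
    done = []          # tasks that reported done, in order of first appearance
    completed = set()  # tasks the JobTracker side acknowledged as completed
    for line in lines:
        match = TASK_ID.search(line)
        if match is None:
            continue
        task = match.group(0)
        if "is done." in line:
            if task not in done:
                done.append(task)
        elif "has completed" in line:
            completed.add(task)
    return [task for task in done if task not in completed]


if __name__ == "__main__":
    # Abbreviated versions of the log lines quoted in this report.
    sample = [
        "INFO mapred.TaskTracker - Task task_0001_m_000000_0 is done.",
        "INFO mapred.JobInProgress - Task 'task_0001_m_000000_0' has completed tip_0001_m_000000 successfully.",
        "INFO mapred.TaskTracker - Task task_0001_m_000002_0 is done.",
    ]
    print(find_stuck_tasks(sample))  # prints ['task_0001_m_000002_0']
```

Run against a full test log, a non-empty result would point at the task attempt whose completion was never recorded, which here is the one that is later killed and keeps its tracker from going idle.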



Change History:
Field: Original Value -> New Value

Arun C Murthy made changes - 29/May/07 11:44 AM
  Assignee: (empty) -> Arun C Murthy [ acmurthy ]
Arun C Murthy made changes - 29/May/07 11:49 AM
  Attachment: (empty) -> HADOOP-1332_1_20070529.patch [ 12358416 ]
Arun C Murthy made changes - 29/May/07 11:49 AM
  Status: Open [ 1 ] -> Patch Available [ 10002 ]
Owen O'Malley made changes - 29/May/07 06:26 PM
  Priority: Major [ 3 ] -> Blocker [ 1 ]
  Fix Version/s: (empty) -> 0.13.0 [ 12312348 ]
Owen O'Malley made changes - 29/May/07 06:27 PM
  Status: Patch Available [ 10002 ] -> Open [ 1 ]
Arun C Murthy made changes - 29/May/07 08:48 PM
  Attachment: (empty) -> HADOOP-1332_2_20070530.patch [ 12358477 ]
Arun C Murthy made changes - 29/May/07 08:49 PM
  Status: Open [ 1 ] -> Patch Available [ 10002 ]
Nigel Daley made changes - 29/May/07 10:34 PM
  Status: Patch Available [ 10002 ] -> Open [ 1 ]
Owen O'Malley made changes - 30/May/07 05:46 AM
  Attachment: (empty) -> copy-thread.patch [ 12358509 ]
Arun C Murthy made changes - 30/May/07 06:11 AM
  Status: Open [ 1 ] -> Patch Available [ 10002 ]
Doug Cutting made changes - 31/May/07 07:15 PM
  Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
  Resolution: (empty) -> Fixed [ 1 ]
Doug Cutting made changes - 08/Jun/07 08:40 PM
  Status: Resolved [ 5 ] -> Closed [ 6 ]
Owen O'Malley made changes - 08/Jul/09 04:51 PM
  Component/s: (empty) -> mapred [ 12310690 ]