
Key: HADOOP-3327
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Amareshwari Sriramadasu
Reporter: Runping Qi
Votes: 0
Watchers: 3
Hadoop Common

Shuffling fetchers waited too long between map output fetch re-tries

Created: 30/Apr/08 05:51 PM   Updated: 30/Oct/09 03:24 AM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.21.0

Time Tracking:
Not Specified

File Attachments (text files, all licensed for inclusion in ASF works):
  File                  Date                 Author                   Size
  hadoop-3327-v1.patch  2008-07-11 10:29 AM  Jothi Padmanabhan        14 kB
  hadoop-3327-v2.patch  2008-07-17 04:40 AM  Jothi Padmanabhan        14 kB
  hadoop-3327-v3.patch  2008-07-23 01:27 PM  Jothi Padmanabhan        15 kB
  hadoop-3327.patch     2008-06-26 05:58 AM  Jothi Padmanabhan        15 kB
  patch-3327-1.txt      2009-02-03 11:05 AM  Amareshwari Sriramadasu  11 kB
  patch-3327-2.txt      2009-02-04 11:39 AM  Amareshwari Sriramadasu  10 kB
  patch-3327.txt        2009-01-16 11:49 AM  Amareshwari Sriramadasu  11 kB
Issue Links:
Reference

Resolution Date: 05/Feb/09 05:44 AM


Runping Qi added a comment - 30/Apr/08 05:57 PM
A reducer seems to have trouble fetching a map output segment.

There are a lot of exceptions like the one below in the reducer's log:

2008-04-30 17:27:32,155 WARN org.apache.hadoop.mapred.ReduceTask: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:221)
at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:662)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2364)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2359)
at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:205)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:828)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:777)

In the hadoop.log of the task tracker hosting the map output, I saw a lot of exceptions like:

2008-04-30 17:25:45,005 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200804301615_0003_m_000756_0,653) failed :
java.net.SocketException: Connection timed out

at org.mortbay.http.HttpOutputStream.write(HttpOutputStream.java:423)
at org.mortbay.jetty.servlet.ServletOut.write(ServletOut.java:54)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2353)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
at org.mortbay.http.HttpServer.service(HttpServer.java:954)
at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

2008-04-30 17:25:45,005 WARN /: /mapOutput?job=job_200804301615_0003&map=task_200804301615_0003_m_000756_0&reduce=653:
java.lang.IllegalStateException: Committed
at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2376)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
at org.mortbay.http.HttpServer.service(HttpServer.java:954)
at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)


Runping Qi added a comment - 30/Apr/08 08:55 PM - edited
Here are the related lines from the job tracker log:

2008-04-30 17:00:01,346 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200804301615_0003_m_000756_0' to tip tip_200804301615_0003_m_000756, for tracker 'tracker_xxxx'
2008-04-30 17:07:04,827 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200804301615_0003_m_000756_0' has completed tip_200804301615_0003_m_000756 successfully.
2008-04-30 17:32:49,981 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200804301615_0003_m_000756_0
2008-04-30 17:45:38,438 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200804301615_0003_m_000756_0
2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200804301615_0003_m_000756_0
2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task: task_200804301615_0003_m_000756_0 ... killing it
2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200804301615_0003_m_000756_0: Too many fetch-failures
2008-04-30 17:56:43,952 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200804301615_0003_m_000756_1' to tip tip_200804301615_0003_m_000756, for tracker 'tracker_xxxx
2008-04-30 17:56:45,377 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200804301615_0003_m_000756_0' from 'tracker_xxxx
2008-04-30 18:02:17,893 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200804301615_0003_m_000756_1' has completed tip_200804301615_0003_m_000756 successfully.
2008-04-30 18:03:16,193 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200804301615_0003_m_000756_0' from 'tracker_xxxx
2008-04-30 18:03:16,471 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200804301615_0003_m_000756_1' from 'tracker_xxxx

The above lines show that about 24 minutes passed between the first notification of a failure to fetch the map output and the third one.
That means the reducer waited about 12 minutes between re-tries!
The re-execution of the map took only about 7 minutes!
During the intervals between the fetch failure notifications,
there were very few tasks active.
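
The gaps can be checked by subtracting the three "Failed fetch notification" timestamps from the job tracker log above. This is only a small sketch; the class and method names are made up for illustration and are not part of Hadoop.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustrative helper (not Hadoop code): measures the time between two
// log lines using the timestamp layout seen in the job tracker log.
final class FetchFailureGaps {
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS", Locale.ROOT);

    static Duration between(String earlier, String later) {
        return Duration.between(LocalDateTime.parse(earlier, LOG_TS),
                                LocalDateTime.parse(later, LOG_TS));
    }

    public static void main(String[] args) {
        String n1 = "2008-04-30 17:32:49,981"; // notification #1
        String n2 = "2008-04-30 17:45:38,438"; // notification #2
        String n3 = "2008-04-30 17:56:43,950"; // notification #3
        System.out.println("#1 -> #2: " + between(n1, n2).toMillis() / 60000.0 + " min");
        System.out.println("#2 -> #3: " + between(n2, n3).toMillis() / 60000.0 + " min");
        System.out.println("#1 -> #3: " + between(n1, n3).toMillis() / 60000.0 + " min");
    }
}
```

The individual gaps come out at roughly 12.8 and 11.1 minutes, which matches the "about 12 minutes" estimate.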


Raghu Angadi added a comment - 30/Apr/08 09:20 PM - edited
>2008-04-30 17:25:45,005 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(task_200804301615_0003_m_000756_0,653) failed :
java.net.SocketException: Connection timed out
> -
> at org.mortbay.http.HttpOutputStream.write(HttpOutputStream.java:423)
> at org.mortbay.jetty.servlet.ServletOut.write(ServletOut.java:54)

A "Connection timed out" error while writing indicates that the root cause is most likely the same packet retransmission problem seen in HADOOP-3132.


Amar Kamat added a comment - 02/May/08 06:18 AM
As Runping mentioned, the map takes roughly 7 mins. Looking at the logs:

2008-04-30 17:32:49,981 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #1 for task task_200804301615_0003_m_000756_0
2008-04-30 17:45:38,438 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #2 for task task_200804301615_0003_m_000756_0
2008-04-30 17:56:43,950 INFO org.apache.hadoop.mapred.JobInProgress: Failed fetch notification #3 for task task_200804301615_0003_m_000756_0

Consider the following:
1) The read timeout for the shuffler is 3 min.
2) The total time for sending one fetch-failure notification would be ~7 min (determined by the map runtime).
3) The first time, the reducer will back off exponentially.

attempt #   backoff   timeout   total time
0           0         3 mins    3 min
1           4 sec     3 mins    4 sec + 6 min
2           8 sec     3 mins    12 sec + 9 min
3           16 sec    3 mins    28 sec + 12 min
4           32 sec    3 mins    60 sec + 15 min
5           64 sec    3 mins    124 sec + 18 min
6           128 sec   3 mins    252 sec + 21 min
7           256 sec   3 mins    508 sec + 24 min

i.e. in total the reducer waits about 32.5 mins (508 sec + 24 min) before sending the first failure notification.
4) After (3), the fetch will be attempted twice, each with a 7/2 min backoff, before sending the next fetch-failure notification.

attempt   backoff    timeout   total time
1         3.5 mins   3 mins    6.5 mins
2         3.5 mins   3 mins    13 mins

i.e. a total of 13 mins between the 2nd and 3rd failure notifications.
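
The two tables above can be reproduced with a few lines of Java. This is a sketch of the hypothesis only, assuming a 3 min read timeout per attempt, a 4 sec initial backoff that doubles on every retry, and 8 attempts (0 to 7) before the first notification; the names are illustrative and not taken from ReduceTask.

```java
// Sketch of the timing hypothesis; constants mirror the tables above.
final class FetchWaitEstimate {
    static final long READ_TIMEOUT_SEC = 180; // 3 min read timeout
    static final long BACKOFF_INIT_SEC = 4;   // first backoff, doubled per retry

    // Seconds spent before the first notification: every attempt waits for
    // its backoff (none for attempt 0) and then runs into the read timeout.
    static long secondsUntilFirstNotification(int attempts) {
        long total = 0;
        for (int attempt = 0; attempt < attempts; attempt++) {
            long backoff = attempt == 0 ? 0 : BACKOFF_INIT_SEC << (attempt - 1);
            total += backoff + READ_TIMEOUT_SEC;
        }
        return total;
    }

    // Seconds between later notifications: two attempts, each with a
    // backoff of half the map run time followed by a read timeout.
    static long secondsBetweenLaterNotifications(long mapRunTimeSec) {
        return 2 * (mapRunTimeSec / 2 + READ_TIMEOUT_SEC);
    }

    public static void main(String[] args) {
        System.out.println(secondsUntilFirstNotification(8) / 60.0 + " min to first notification");
        System.out.println(secondsBetweenLaterNotifications(7 * 60) / 60.0 + " min between later notifications");
    }
}
```

With 8 attempts this gives 1948 sec (about 32.5 min), and with a 7 min map run time it gives 780 sec (13 min) between later notifications.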


The problem is that in this case the read timeout becomes significant as compared to the total-backoff and the map runtime.


Runping Qi added a comment - 02/May/08 07:06 AM

Whoo, if a map output cannot be fetched for some reason, it will take at least 45 minutes before the job tracker decides to re-execute the mapper?
That seems a really long time!


Amar Kamat added a comment - 02/May/08 07:14 AM
The above comment is our hypothesis. It matches in the case of the 2nd and 3rd attempts. Can you confirm the 1st attempt? Please check the reducer logs to see the time the reducer needed to notify the first failure, i.e. (time when the failure was reported) - (time when the fetch was first scheduled).

Runping Qi added a comment - 02/May/08 07:28 AM

How do you know when the fetch of a map output was scheduled first?

Why do you even need to confirm the time for first notification?
It is obvious that the re-try/backoff strategy is flawed.
Instead of following the schedule described above, the reducer should consider
how many outstanding map outputs it still needs.
If not many map outputs need to be fetched, the reducer should not back off that long.
Also, the job tracker should decide whether to re-execute a map based on how many fetch failures there are AND how busy the system is.
If there are very few running mappers, then it should re-execute maps more aggressively.


Amar Kamat added a comment - 02/May/08 08:03 AM

> How do you know when the fetch of a map output was scheduled first?

Look for the first occurrence of 'Copying task_200804301615_0003_m_000756_0 output from' in the reducer logs.

> Why do you even need to confirm the time for first notification?
> It is obvious that the re-try/backoff strategy is flawed.

Agreed. But the main cause of the problem needs to be detected and fixed. It's just that we should be sure that what we are fixing is really broken.


Runping Qi added a comment - 02/May/08 01:06 PM

It is pretty clear to me what is broken.
It is the re-try strategy that does not take the job/task progression state into account.
A simple heuristic as I outlined earlier will make a big difference.


Devaraj Das added a comment - 02/May/08 02:06 PM
Maybe till we have fetched 90% of the map outputs, we should do exponential backoff and after that we switch to fixed time smaller backoffs. But in the case of multiple jobs running in the cluster this policy might not be ideal (since the same tasktrackers might be serving outputs from multiple jobs).
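
A rough sketch of that switch-over idea, for illustration only. The 90% threshold comes from the comment above; the method name and the `fixedBackoffMs` parameter are hypothetical, since no concrete value for the smaller fixed backoff was proposed.

```java
// Illustrative only: exponential backoff until 90% of the map outputs
// have been fetched, then a fixed (smaller) backoff supplied by the caller.
final class BackoffPolicyChoice {
    static long backoffMs(int fetchedOutputs, int totalMaps, int failedFetches,
                          long initialBackoffMs, long fixedBackoffMs) {
        // Integer arithmetic avoids rounding issues around the 90% mark.
        boolean mostlyDone = (long) fetchedOutputs * 10 >= (long) totalMaps * 9;
        if (mostlyDone) {
            return fixedBackoffMs;
        }
        // failedFetches is assumed to be >= 1 when a backoff is requested.
        return initialBackoffMs << (failedFetches - 1);
    }
}
```

This does not address the multi-job concern raised above; a tasktracker serving several jobs would still see the more aggressive retries.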

Jothi Padmanabhan added a comment - 29/May/08 10:28 AM
There are two possible optimizations to help mitigate the problem.

Optimization 1 (At Job tracker)
============

The Job tracker could decide on when to re execute a map based on the system
load. System load would be characterized by the total number of map slots
available across the whole cluster and the number of unfinished map tasks in
the queue.

For example,
Load = (Total Map Slots available - Total Unfinished Maps) / Total Map Slots

One possible strategy (possible default values: x = 50%, y = 75%)
1. If (Load < x), re-execute on first fetch failure notification itself.
2. If (x < Load < y), re-execute on second fetch failure notification.
3. Always re-execute (irrespective of the system load) on third notification.
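
The strategy can be written down as a small function. This sketch follows the Load formula and the three rules literally as stated above; the names are illustrative, and treating Load == x as falling into the second rule is an assumption, since the comment leaves the boundaries open.

```java
// Literal transcription of Optimization 1; not actual JobTracker code.
final class ReexecutionThreshold {
    // Load = (Total Map Slots available - Total Unfinished Maps) / Total Map Slots
    static double load(int totalMapSlots, int unfinishedMaps) {
        return (double) (totalMapSlots - unfinishedMaps) / totalMapSlots;
    }

    // Number of fetch failure notifications required before re-executing a map.
    static int notificationsNeeded(double load, double x, double y) {
        if (load < x) {
            return 1; // rule 1: re-execute on the first notification
        }
        if (load < y) {
            return 2; // rule 2 (boundary load == x assumed to land here)
        }
        return 3;     // rule 3: always re-execute on the third notification
    }
}
```

For example, with 100 map slots, x = 0.5 and y = 0.75, 80 unfinished maps give a Load of 0.2 and re-execution on the first notification, while 10 unfinished maps give 0.9 and re-execution only on the third.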

Optimization 2 (At reduce task)
===========

The strategy is to categorize the time outs (while fetching map outputs) as
either connection Timeout or Read Timeout and then handle each case
differently. Currently, there is no distinction and all timeouts are handled
the same way.

Handling Connection Timeouts
--------------------------------------------

1. Try connecting with the default timeout of 30s.
2. Follow the existing algorithm of Exponential backoff for retries. This
algorithm is provided below for quick reference.

Handling Read Timeouts
-----------------------------------
1. Read with a time out = MAX(3 minutes, map_run_time)
2. Back off for a value = (map_run_time/2)
3. Send notifications after every read time out.

Exponential Back Off Algorithm
########################
BACKOFF_INIT = 4000

maxFetchRetriesPerMap =
    getClosestPowerOf2(map_run_time * 1000 / BACKOFF_INIT) + 1;

currentBackOff = (noFailedFetches <= maxFetchRetriesPerMap)
    ? BACKOFF_INIT * (1 << (noFailedFetches - 1))
    : (this.maxBackoff * 1000 / 2);

First notification after maxFetchRetriesPerMap attempts
Second notification after 2 more attempts
Third notification after another 2 attempts
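
For reference, a self-contained Java sketch of the back-off computation quoted above. The helper `closestPowerOf2Exponent` is an assumption about what getClosestPowerOf2 does (return the exponent of the power of two nearest to its argument); `mapRunTimeSec` and `maxBackoffSec` are illustrative parameter names for map_run_time and this.maxBackoff.

```java
// Sketch of the existing exponential back-off; not copied from ReduceTask.
final class BackoffSchedule {
    static final int BACKOFF_INIT_MS = 4000;

    // Assumed behaviour: exponent of the power of two closest to value,
    // e.g. 75 -> 6 (64 is nearer than 128).
    static int closestPowerOf2Exponent(int value) {
        if (value <= 0) {
            throw new IllegalArgumentException("Undefined for " + value);
        }
        int highestBit = Integer.highestOneBit(value);
        int exponent = Integer.numberOfTrailingZeros(highestBit);
        // Round up when the bit just below the highest one is set.
        return ((highestBit >>> 1) & value) == 0 ? exponent : exponent + 1;
    }

    static int maxFetchRetriesPerMap(int mapRunTimeSec) {
        return closestPowerOf2Exponent(mapRunTimeSec * 1000 / BACKOFF_INIT_MS) + 1;
    }

    // noFailedFetches is assumed to be >= 1.
    static long currentBackoffMs(int noFailedFetches, int maxRetries, int maxBackoffSec) {
        return noFailedFetches <= maxRetries
            ? (long) BACKOFF_INIT_MS * (1L << (noFailedFetches - 1))
            : maxBackoffSec * 1000L / 2;
    }
}
```

Under these assumptions a 5 min map run time gives 300000 / 4000 = 75, an exponent of 6, and therefore 7 for maxFetchRetriesPerMap.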

Example scenarios for Optimization 2
Assumptions
Map run time = 5 mins
Only one reducer per node (All fetch failures will be from this task alone)

Case 1. Connect fails.

Existing algorithm:
1. First notification = At end of 6 (maxFetchRetriesPerMap) retries. The Exponential backoff (EBO) algorithm
comes into play here.
Approx time = 3mins * 7 + (4+8+16+32+64+128) sec = 21 + 4.2 = 25.2 mins
(the second term, 252 sec = 4.2 mins, is the EBO)
2. Second notification = After 2 attempts. Approx time = 25 + (3+2+3) = 33 mins.
(Back Off = 2 mins after maxFetchRetriesPerMap.)
3. Third notification = After another 2 attempts. Approx time = 41 mins

New algorithm:
1. First Notification = 4.2 mins (EBO) + 30 sec * 7 = 4.2 + 3.5 = 7.7 mins
2. Second Notification = 7.7 + (0.5+2.5+0.5) = 11.2 mins
3. Third Notification = 11.2 + (0.5+2.5+0.5) = 14.7 mins

Case 2. Connect successful, read fails.

Existing algorithm:
Same as Case 1, 41 mins for map re-execution.

New algorithm:
1. First Notification = 5 mins (read time out = map_run_time)
2. Second Notification = 5+2.5+5 = 12.5 mins (back off = map_run_time/2 = 2.5 mins)
3. Third Notification = 12.5 + 2.5 + 5 = 20 mins
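
The Case 2 timeline for the new algorithm follows directly from the read-timeout rules (time out = MAX(3 minutes, map_run_time), back off = map_run_time/2, notify after every read time out). A minimal sketch, with illustrative names and times in minutes:

```java
import java.util.Arrays;

// Sketch of the proposed read-timeout handling; not Hadoop code.
final class ReadTimeoutTimeline {
    static final double MIN_READ_TIMEOUT_MIN = 3.0;

    // Elapsed minutes at which each notification would be sent.
    static double[] notificationTimes(double mapRunTimeMin, int notifications) {
        double readTimeout = Math.max(MIN_READ_TIMEOUT_MIN, mapRunTimeMin);
        double backoff = mapRunTimeMin / 2;
        double[] times = new double[notifications];
        double elapsed = 0;
        for (int i = 0; i < notifications; i++) {
            if (i > 0) {
                elapsed += backoff; // back off before every retry
            }
            elapsed += readTimeout; // each attempt ends in a read timeout
            times[i] = elapsed;
        }
        return times;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(notificationTimes(5.0, 3)));
    }
}
```

For a 5 min map run time this yields notifications at 5, 12.5 and 20 minutes, as listed in Case 2.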


Jothi Padmanabhan added a comment - 26/Jun/08 05:58 AM
Attaching patch for review

Hadoop QA added a comment - 26/Jun/08 09:19 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12384737/hadoop-3327.patch
against trunk revision 671563.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2745/console

This message is automatically generated.


Devaraj Das added a comment - 11/Jul/08 07:37 AM
Sorry, this patch no longer applies cleanly. Please regenerate the patch.

Devaraj Das added a comment - 11/Jul/08 07:38 AM
Sorry the assignment was updated by mistake

Jothi Padmanabhan added a comment - 11/Jul/08 10:29 AM
Patch for the latest trunk

Hadoop QA added a comment - 13/Jul/08 03:35 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12385858/hadoop-3327-v1.patch
against trunk revision 676069.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2852/console

This message is automatically generated.


Jothi Padmanabhan added a comment - 17/Jul/08 04:40 AM
Attaching patch again for the latest trunk. Hopefully, third time lucky!!

Hadoop QA added a comment - 17/Jul/08 06:59 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12386262/hadoop-3327-v2.patch
against trunk revision 677470.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2891/console

This message is automatically generated.


Jothi Padmanabhan added a comment - 18/Jul/08 04:13 AM
Did manual testing by hacking the code to simulate connection/read timeouts.

Devaraj Das added a comment - 23/Jul/08 12:18 PM
Sorry for the long turn-around on this one. There are two things that should be addressed:
1) Convert the error types to enum
2) There is a copy-paste error in an if-else clause (e.getClass() == ConnTimeoutException.class). The check should be for ReadTimeoutException in the else clause.

Jothi Padmanabhan added a comment - 23/Jul/08 01:28 PM
Attaching patch after incorporating the review comments

Hadoop QA added a comment - 24/Jul/08 03:48 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12386719/hadoop-3327-v3.patch
against trunk revision 679202.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2931/console

This message is automatically generated.


Devaraj Das added a comment - 25/Jul/08 12:09 PM
I just committed this. Thanks, Jothi!

Hudson added a comment - 22/Aug/08 12:34 PM

Jothi Padmanabhan added a comment - 26/Sep/08 08:18 AM
We observed that this patch, while reducing the time for re-execution of maps on failures, negatively impacts performance for normal runs on regular clusters. Should we revert this patch until we come up with the correct solution?

Devaraj Das added a comment - 29/Sep/08 10:20 AM
I reverted the patch on both trunk and 0.19 branch.

Hudson added a comment - 29/Sep/08 03:27 PM

Amareshwari Sriramadasu added a comment - 09/Jan/09 04:36 AM
After discussion with Jothi and Devaraj, I propose the following approach :

1. If pendingCopies < 0.25 * numMaps, // towards the end of shuffle
fetchRetries = maxFetchRetriesPerMap/2;
// this will send first notification to JT in half the time of the existing algorithm.
// Also exponential back-off is half the number of times.

2. If failure is because of ReadTimeOut,
send notification to JT immediately.
towards the end of shuffle, back off for min (maxMapRunTime/2, current backoff);
else back off for maxMapRunTime/2.

3. At JT,
if freeMapSlots < 0.5 * totalMapSlots, re-execute the map after 3 notifications. (current algorithm)
else re-execute the map after 2 notifications.

Thoughts?
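
The three rules proposed above can be sketched as three small functions. This is an illustration of the proposal as written in this comment, not the committed patch; the class and method names are made up.

```java
// Sketch of the proposed policy; thresholds are the ones quoted above.
final class ShufflePolicySketch {
    // Rule 1: towards the end of the shuffle, halve the retries per map.
    static boolean endOfShuffle(int pendingCopies, int numMaps) {
        return pendingCopies < 0.25 * numMaps;
    }

    static int fetchRetries(int pendingCopies, int numMaps, int maxFetchRetriesPerMap) {
        return endOfShuffle(pendingCopies, numMaps)
            ? maxFetchRetriesPerMap / 2
            : maxFetchRetriesPerMap;
    }

    // Rule 2: back-off after a read timeout (the notification to the JT
    // is sent immediately and is not modelled here).
    static long readTimeoutBackoffMs(boolean endOfShuffle, long maxMapRunTimeMs,
                                     long currentBackoffMs) {
        long half = maxMapRunTimeMs / 2;
        return endOfShuffle ? Math.min(half, currentBackoffMs) : half;
    }

    // Rule 3: notifications the JT waits for before re-executing the map.
    static int notificationsBeforeReexecution(int freeMapSlots, int totalMapSlots) {
        return freeMapSlots < 0.5 * totalMapSlots ? 3 : 2;
    }
}
```

For example, with 100 maps and maxFetchRetriesPerMap = 7, a reducer with 10 pending copies would use 3 retries, while one with 50 pending copies keeps all 7.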


Amareshwari Sriramadasu added a comment - 16/Jan/09 11:49 AM
Attaching a patch for review, while I continue my testing.

I have done simple tests with the patch. Results are as follows:
With read timeouts simulated for 4 map outputs :

  • On Single node cluster with 2 maps, 1 reducer and maxMapRuntime 6sec.
    • Job took 41 mins 4sec without the patch
    • Job took 22mins 2 sec with the patch
  • 20 node cluster with 50 maps, 6 reducers and maxMapRuntime 6sec
    • Job took 18 mins 23sec without the patch
    • Job took 6mins 32sec with the patch.
  • 20 node cluster with 50 maps, 6 reducers with maxMapRuntime 2 mins 33 sec
    • Job took 30mins 0sec without the patch
    • Job took 8mins 30sec with the patch

Amareshwari Sriramadasu added a comment - 20/Jan/09 10:26 AM
Some more numbers:
Job | With trunk | With patch
Sort on 200 nodes | 1 hr 1 min 13 sec | 1 hr 2 min 8 sec
Sort on 200 nodes with read timeouts simulated for 10 maps, 5 during the start of shuffle and 5 during the end of shuffle (maxMapRunTime = 3 min 43 sec) | 2 hr 5 min 7 sec | 1 hr 5 min 37 sec (this is almost the same as a normal run!)
SortValidator on 200 nodes with read timeouts for 5 maps during the start of shuffle (maxMapRunTime = 18 min 58 sec) | 2 hr 13 min 46 sec | 31 min 51 sec
Gridmix on 200 nodes | 5329 sec | 5187 sec
Gridmix on 400 nodes | 3028 sec | 2903 sec

These results show good improvement in the case of fetch failures, and there is no performance degradation for normal runs.


Amareshwari Sriramadasu added a comment - 20/Jan/09 10:29 AM
test-patch result :
     [exec]
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]

It is not easy to write a testcase for this.

All core and contrib unit tests passed on my machine.


Jothi Padmanabhan added a comment - 21/Jan/09 05:22 AM
Looks good. A few points
  • Some comments on the changes in the code would be good.
  • The percentages that we use to decide maxNotifications and fetchRetriesPerMap should be configurable?
  • Since fetchRetriesPerMap is computed during every iteration as per the current copiedMapOutputs.size, it is possible that we might delay a notification to the JT by one failure. For example, consider maxFetchRetriesPerMap = 5 and numRetries = 4. During the next failure numRetries = 5, and let us say we cross the threshold and reset fetchRetriesPerMap = 2 (5/2). As per the existing logic, we would have sent a notification as numRetries = maxFetchRetriesPerMap. But with the new logic, we will wait as 5%2 != 0. But this is a corner case and can probably be overlooked.
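
The corner case in the last point can be seen with a one-line check. The modulo test below is inferred from the "5%2 != 0" remark and is an assumption about how the patch decides when to notify; the names are illustrative.

```java
// Illustration of the corner case; assumes a notification is sent whenever
// numRetries is a multiple of the current fetchRetriesPerMap.
final class NotificationCheck {
    static boolean shouldNotify(int numRetries, int fetchRetriesPerMap) {
        return numRetries % fetchRetriesPerMap == 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldNotify(5, 5)); // threshold not crossed: notify
        System.out.println(shouldNotify(5, 2)); // threshold crossed, 5/2 = 2: wait
        System.out.println(shouldNotify(6, 2)); // notification arrives one failure later
    }
}
```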

Devaraj Das added a comment - 02/Feb/09 06:21 AM
Looks fine to me. We shouldn't have any new configuration overall.

Devaraj Das added a comment - 02/Feb/09 07:03 AM
Also, the change in JobInProgress is not required at this point IMO.

Amareshwari Sriramadasu added a comment - 03/Feb/09 11:05 AM
Patch with review comments incorporated.

Amareshwari Sriramadasu added a comment - 03/Feb/09 11:11 AM
Reduces the waiting time between map output fetch re-tries towards the end of the shuffle by notifying the JobTracker more aggressively.
ReadTimeOuts during shuffle are treated differently: the reducer notifies the JobTracker immediately on a read timeout and backs off for a longer time.

Amareshwari Sriramadasu added a comment - 03/Feb/09 11:52 AM
test-patch and ant tests passed on my machine.

Hadoop QA added a comment - 03/Feb/09 03:09 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12399348/patch-3327-1.txt
against trunk revision 740237.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3790/console

This message is automatically generated.


Devaraj Das added a comment - 04/Feb/09 04:52 AM
Sorry, I just realized that we can avoid the class for handling read timeout exceptions, and instead have a thread-local variable that's set whenever a read timeout is seen...
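
A minimal sketch of the thread-local idea, assuming the copier thread reads the map output through some wrapper that can catch the timeout. The class and method names are hypothetical and this is not the code from the attached patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

// Hypothetical illustration: each copier thread records whether its last
// failure was a read timeout, so no dedicated exception class is needed.
final class ReadTimeoutFlag {
    private static final ThreadLocal<Boolean> READ_TIMED_OUT =
        ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Reads from the stream and marks the current thread if the read timed out.
    static int readTracked(InputStream in, byte[] buf) throws IOException {
        try {
            return in.read(buf);
        } catch (SocketTimeoutException e) {
            READ_TIMED_OUT.set(Boolean.TRUE);
            throw e; // callers still see the original exception
        }
    }

    // Returns the flag for the current thread and resets it.
    static boolean consumeReadTimedOut() {
        boolean timedOut = READ_TIMED_OUT.get();
        READ_TIMED_OUT.set(Boolean.FALSE);
        return timedOut;
    }
}
```

The failure-handling code running on the same thread could then call `consumeReadTimedOut()` to choose between the read-timeout path and the regular back-off.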

Amareshwari Sriramadasu added a comment - 04/Feb/09 11:39 AM
Attaching patch with the change suggested by Devaraj.

Hadoop QA added a comment - 04/Feb/09 04:01 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12399449/patch-3327-2.txt
against trunk revision 740532.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3796/console

This message is automatically generated.


Amareshwari Sriramadasu added a comment - 05/Feb/09 03:31 AM
contrib-test failure TestAgentConfig.testInitAdaptors_vs_Checkpoint is not related to the patch

Devaraj Das added a comment - 05/Feb/09 05:44 AM
I just committed this. Thanks, Amareshwari!
I should note that this particular patch just handles read timeouts better. There is good scope for future work here, and follow-up issues should be raised (e.g., how best to determine when to kill a (faulty) map/reduce task during shuffle).

Hudson added a comment - 16/Feb/09 05:00 PM

Robert Chansler added a comment - 09/Oct/09 04:17 AM
Editorial pass over all release notes prior to publication of 0.21. bug