
Key: HADOOP-4749
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: He Yongqiang
Reporter: Zheng Shao
Votes: 0
Watchers: 2
Hadoop Common

reducer should output input data size when shuffling is done

Created: 02/Dec/08 09:26 AM   Updated: 08/Jul/09 04:53 PM
Component/s: None
Affects Version/s: 0.19.0
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
4749.patch (text file, 2 kB, licensed for inclusion in ASF works), attached 2008-12-08 07:42 AM by He Yongqiang
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Release Note: Added a new counter REDUCE_INPUT_BYTES.
Resolution Date: 11/Dec/08 06:05 AM


Description
Sometimes a single reducer runs slowly because of a load-balancing problem. Knowing each reducer's input data size as soon as shuffling is done would be very useful for understanding how imbalanced the load is.

This should be easy to implement, I guess, since the reducer should have all the information it needs at the end of the shuffle phase.
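
To illustrate how per-reducer input sizes could be used to judge imbalance, here is a minimal sketch. It is not part of the patch; the class name `ReduceInputSkew`, the method `skewRatio`, and the sample byte counts are all hypothetical. It assumes the input size of every reducer has already been read from the job's counters, and reports the ratio of the largest input to the mean input.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: summarizes reducer input imbalance.
public class ReduceInputSkew {

    /**
     * Returns max / mean of the per-reducer input sizes.
     * A value of 1.0 means every reducer received the same amount of data.
     */
    public static double skewRatio(long[] inputBytesPerReducer) {
        if (inputBytesPerReducer.length == 0) {
            throw new IllegalArgumentException("need at least one reducer");
        }
        long max = Arrays.stream(inputBytesPerReducer).max().getAsLong();
        double mean = Arrays.stream(inputBytesPerReducer).average().getAsDouble();
        // If no reducer received any data, treat the job as balanced.
        return mean == 0.0 ? 1.0 : max / mean;
    }

    public static void main(String[] args) {
        // Made-up sizes: three small reducers and one heavily loaded one.
        long[] bytes = {100L, 100L, 100L, 700L};
        System.out.println("skew ratio = " + skewRatio(bytes));
    }
}
```

A ratio well above 1.0 suggests that one reducer is handling far more data than the average, which is the situation described above.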



He Yongqiang added a comment - 05/Dec/08 09:13 AM
At which point? Just after the reducer's sort has finished and before the reduce work starts?

Zheng Shao added a comment - 05/Dec/08 09:17 AM
I think that right after shuffling is done and before sorting is done (when the reducer has received all of the mappers' output), we should already be able to know and output this information.

He Yongqiang added a comment - 05/Dec/08 12:57 PM
While the input data size of a reducer can be collected on the reducer side after the copy phase is done, it seems that the reducer's input record count cannot be collected at that point.
Maybe this information can be collected on the map side?

Zheng Shao added a comment - 05/Dec/08 07:05 PM
Let's keep it simple and just output the input data size for now. Sorting does take a long time (especially with a load-balancing problem), so I don't want to wait until that is done.

He Yongqiang added a comment - 06/Dec/08 05:32 AM
Added a new counter, REDUCE_INPUT_BYTES, in the Task class.
Currently the counter is only updated when the mapper is not local.
Is there a need to update the counter if the mapper is local?
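
The behaviour described in this comment can be sketched as follows. This is not the code in 4749.patch; the class `ShuffleByteCounter` and its methods are hypothetical stand-ins, and the use of `AtomicLong` assumes that several copier threads may report fetched map outputs concurrently. Only the counter name REDUCE_INPUT_BYTES comes from the issue.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the shuffle-side bookkeeping; not Hadoop's actual classes.
public class ShuffleByteCounter {

    // Plays the role of the REDUCE_INPUT_BYTES counter.
    private final AtomicLong reduceInputBytes = new AtomicLong();

    /**
     * Records one fetched map output. Outputs from local mappers are skipped,
     * mirroring the behaviour described in the comment above.
     */
    public void recordMapOutput(long bytes, boolean mapperIsLocal) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        if (!mapperIsLocal) {
            reduceInputBytes.addAndGet(bytes);
        }
    }

    /** Total bytes counted so far; can be reported once the copy phase ends. */
    public long getReduceInputBytes() {
        return reduceInputBytes.get();
    }

    public static void main(String[] args) {
        ShuffleByteCounter counter = new ShuffleByteCounter();
        counter.recordMapOutput(1024L, false); // remote mapper: counted
        counter.recordMapOutput(2048L, true);  // local mapper: skipped
        counter.recordMapOutput(512L, false);  // remote mapper: counted
        System.out.println("REDUCE_INPUT_BYTES = " + counter.getReduceInputBytes());
    }
}
```

Keeping the total in its own field, rather than reusing an existing per-transfer variable, leaves other bookkeeping untouched.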

Zheng Shao added a comment - 08/Dec/08 07:25 AM
Thanks for the patch.

I think we can ignore the case where the mapper is local, because the load-balancing problem would not be interesting in that case.

Several suggestions:
1. Please use a new variable instead of moving bytesTransferred to class member level. There are other places that reference bytesTransferred.
2. Make the patch a single file (run svn diff in the trunk directory).
3. Please click "Submit patch" after "Attach file" is done.


He Yongqiang added a comment - 08/Dec/08 07:42 AM
Following Zheng Shao's suggestions, I made some small modifications.

Zheng Shao added a comment - 08/Dec/08 09:25 AM
+1

I don't know whether we have unit tests for all counters. If so, we also need to add this one to the unit test. (Try running "ant test" in the trunk directory.)
Otherwise, I don't think we have to add a test.


Hadoop QA added a comment - 10/Dec/08 09:29 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12395533/4749.patch
against trunk revision 725341.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/console

This message is automatically generated.


Runping Qi added a comment - 11/Dec/08 05:44 AM

+1
Looks good.


Zheng Shao added a comment - 11/Dec/08 06:00 AM
I've just committed this.
Thanks Yongqiang He!

Zheng Shao added a comment - 11/Dec/08 06:05 AM
Committed revision 725588 and 725589.

Hudson added a comment - 11/Dec/08 02:19 PM
Integrated in Hadoop-trunk #685 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/685/)
Added a new counter REDUCE_INPUT_BYTES. (Yongqiang He via zshao)