Hadoop Common: HADOOP-4749

reducer should output input data size when shuffling is done

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added a new counter REDUCE_INPUT_BYTES.

      Description

      Sometimes we see a single slow reducer because of a load-balancing problem. Knowing each reducer's input data size would be very useful for understanding how imbalanced the load is.

      This should be easy to fix, I guess, since the reducer has all the information it needs at the end of the shuffle phase.
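To illustrate how a per-reducer byte count exposes load imbalance, here is a minimal Java sketch. It assumes the REDUCE_INPUT_BYTES values for every reducer of a job have already been collected into an array; the class name `ReducerSkew` and the max-to-mean ratio used as the imbalance measure are illustrative choices, not part of Hadoop.

```java
import java.util.Arrays;

// Hypothetical helper: summarizes how unevenly shuffled input is spread over reducers.
final class ReducerSkew {
    private ReducerSkew() {}

    /**
     * Returns max(bytes) / mean(bytes). A value of 1.0 means perfectly balanced
     * input; a large value points at a reducer that received far more data than average.
     */
    static double maxToMeanRatio(long[] reduceInputBytes) {
        if (reduceInputBytes.length == 0) {
            throw new IllegalArgumentException("no reducers");
        }
        long max = Arrays.stream(reduceInputBytes).max().getAsLong();
        double mean = Arrays.stream(reduceInputBytes).average().getAsDouble();
        return mean == 0.0 ? 1.0 : max / mean;
    }

    /** Index of the reducer that received the most input bytes. */
    static int heaviestReducer(long[] reduceInputBytes) {
        if (reduceInputBytes.length == 0) {
            throw new IllegalArgumentException("no reducers");
        }
        int best = 0;
        for (int i = 1; i < reduceInputBytes.length; i++) {
            if (reduceInputBytes[i] > reduceInputBytes[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up per-reducer byte counts, for illustration only.
        long[] bytes = {100, 100, 100, 500};
        System.out.printf("heaviest reducer = %d, max/mean = %.2f%n",
                heaviestReducer(bytes), maxToMeanRatio(bytes));
    }
}
```

With these made-up numbers the mean is 200 bytes, so the fourth reducer stands out at 2.5 times the average.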

      Attachments
        1. 4749.patch (2 kB, He Yongqiang)

          Activity

          He Yongqiang added a comment -

          At which point? Just after the reducer's sort has finished and before it starts its work?

          Zheng Shao added a comment -

          I think right after shuffling is done and before sorting is done (when the reducer has received all mappers' output), we should already be able to know and output that information.

          He Yongqiang added a comment -

          While the input data size of a reducer can be collected on the reducer side once the copy phase is done, it seems that the reducer's input record count cannot be collected at that point.
          Maybe this information can be collected on the map side?
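The byte-versus-record distinction above can be illustrated with a toy model in Java. The record framing is an assumption made for the sketch (each fetched map output is treated as a buffer of records, each prefixed with a 4-byte length), not Hadoop's actual map-output format, and `SegmentStats` is a hypothetical name. What it shows: the byte total is available straight from the copied buffers, while the record count needs a pass over every record.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Toy model of fetched map outputs; the record framing is an assumption for illustration.
final class SegmentStats {
    private SegmentStats() {}

    /** Bytes are known as soon as the copy finishes: just sum the buffer sizes. */
    static long totalBytes(List<ByteBuffer> segments) {
        long total = 0;
        for (ByteBuffer segment : segments) {
            total += segment.remaining();
        }
        return total;
    }

    /** Records need a full pass: each record is a 4-byte length followed by that many bytes. */
    static long totalRecords(List<ByteBuffer> segments) {
        long records = 0;
        for (ByteBuffer segment : segments) {
            ByteBuffer view = segment.duplicate(); // leave the caller's position untouched
            while (view.remaining() >= Integer.BYTES) {
                int len = view.getInt();
                view.position(view.position() + len);
                records++;
            }
        }
        return records;
    }

    /** Builds one segment from the given record payloads. */
    static ByteBuffer segmentOf(byte[]... payloads) {
        int size = 0;
        for (byte[] p : payloads) {
            size += Integer.BYTES + p.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] p : payloads) {
            buf.putInt(p.length).put(p);
        }
        buf.flip();
        return buf;
    }
}
```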

          Zheng Shao added a comment -

          Let's keep it simple and just output the input data size for now. Sorting does take a long time (especially with a load-balancing problem), so I don't want to wait until it is done.
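A minimal Java sketch of reporting the size as soon as the copy phase ends, instead of after the sort. The class `CopyPhaseTracker`, its methods, the fixed expected number of map outputs, and the handling of repeated fetches are all assumptions for illustration, not the code in 4749.patch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker: accumulates shuffled bytes and knows when every map output has arrived.
final class CopyPhaseTracker {
    private final int expectedMapOutputs;
    private final Set<Integer> copiedMaps = new HashSet<>();
    private long inputBytes;

    CopyPhaseTracker(int expectedMapOutputs) {
        this.expectedMapOutputs = expectedMapOutputs;
    }

    /**
     * Records one fetched map output. Returns true when the last expected output
     * arrives, i.e. when the total is final and can be reported without waiting
     * for the merge/sort that follows.
     */
    boolean onMapOutputCopied(int mapId, long bytes) {
        if (!copiedMaps.add(mapId)) {
            return false; // assumed: a repeated fetch of the same map output is not counted twice
        }
        inputBytes += bytes;
        return copiedMaps.size() == expectedMapOutputs;
    }

    long inputBytes() {
        return inputBytes;
    }
}
```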

          He Yongqiang added a comment -

          Added a new counter REDUCE_INPUT_BYTES in the Task class.
          Currently the counter is only updated when the mapper is not local.
          Is there a need to update the counter when the mapper is local?

          Zheng Shao added a comment -

          Thanks for the patch.

          I think we can ignore the case where the mapper is local, because the load-balancing problem is not interesting in that case.

          Several suggestions:
          1. Please use a new variable instead of moving bytesTransferred to class-member level; other places reference bytesTransferred.
          2. Make the patch a single file (svn diff in the trunk directory).
          3. Please click "Submit patch" after "Attach file" is done.
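Zheng Shao's first suggestion and the decision to skip local map outputs can be sketched together in Java. Everything here is hypothetical: `ReduceCounter`, `ReduceInputCounter`, `copyOutput`, and the chunk loop only mimic the shape of the change (the thread says the real counter, REDUCE_INPUT_BYTES, was added in the Task class). The per-copy `bytesTransferred` stays a local variable, and a separate counter accumulates the total.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical counter enum reusing the name of the counter added by this issue.
enum ReduceCounter { REDUCE_INPUT_BYTES }

final class ReduceInputCounter {
    private final Map<ReduceCounter, Long> counters = new EnumMap<>(ReduceCounter.class);

    /**
     * Simulates copying one map output in chunks. bytesTransferred remains local
     * to this call, so code relying on its per-copy meaning is unaffected; the
     * running total lives in the separate counters map.
     */
    long copyOutput(long[] chunkSizes, boolean mapperIsLocal) {
        long bytesTransferred = 0;
        for (long chunk : chunkSizes) {
            bytesTransferred += chunk;
        }
        if (!mapperIsLocal) {
            // Only remote fetches are counted; local outputs are ignored, as agreed above.
            counters.merge(ReduceCounter.REDUCE_INPUT_BYTES, bytesTransferred, Long::sum);
        }
        return bytesTransferred;
    }

    long get(ReduceCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }
}
```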

          He Yongqiang added a comment -

          Following Zheng Shao's suggestions, I made a few small modifications.

          Zheng Shao added a comment -

          +1

          I don't know whether we have unit tests for all counters. If we do, we also need to add this counter to them (try running "ant test" in the trunk directory).
          Otherwise, I don't think we have to add a test.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12395533/4749.patch
          against trunk revision 725341.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3702/console

          This message is automatically generated.

          Runping Qi added a comment -

          +1
          Looks good.

          Zheng Shao added a comment -

          I've just committed this.
          Thanks Yongqiang He!

          Zheng Shao added a comment -

          Committed revisions 725588 and 725589.

          Hudson added a comment -

          Integrated in Hadoop-trunk #685 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/685/)
          Added a new counter REDUCE_INPUT_BYTES. (Yongqiang He via zshao)


            People

            • Assignee:
              He Yongqiang
              Reporter:
              Zheng Shao
            • Votes:
              0
              Watchers:
              2
