Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5873

Shuffle bandwidth computation includes time spent waiting for maps

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.6.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently ShuffleScheduler in ReduceTask JVM status displays bandwidth. Its definition however is confusing because it captures the time where there is no copying because there is a pause between when new wave of map outputs is available.
      current bw is definded as (bytes copied so far) / (total time in the copy phase so far)
      It would be more useful
      1) to measure bandwidth of a single copy call.
      2) display aggregated bw as long as there is at least one fetcher is in the copy call.

      1. MAPREDUCE-5873.v9.patch
        18 kB
        Siqi Li
      2. MAPREDUCE-5873.v6.patch
        18 kB
        Siqi Li
      3. MAPREDUCE-5873.v5.patch
        18 kB
        Siqi Li
      4. MAPREDUCE-5873.v4.patch
        18 kB
        Siqi Li
      5. MAPREDUCE-5873.v3.patch
        15 kB
        Siqi Li
      6. MAPREDUCE-5873.v2.patch
        15 kB
        Siqi Li
      7. MAPREDUCE-5873.v1.patch
        13 kB
        Siqi Li

        Activity

        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12642893/MAPREDUCE-5873.v1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4574//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642893/MAPREDUCE-5873.v1.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4574//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12643617/MAPREDUCE-5873.v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4586//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4586//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4586//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643617/MAPREDUCE-5873.v2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4586//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4586//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4586//console This message is automatically generated.
        Hide
        Siqi Li added a comment -

        Sangjin Lee Do you know what does this "-1 findbugs" mean

        Show
        Siqi Li added a comment - Sangjin Lee Do you know what does this "-1 findbugs" mean
        Hide
        Sangjin Lee added a comment -
        Show
        Sangjin Lee added a comment - That means there was a new findbugs violation. It's linked in the message above ( https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4586//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html ).
        Hide
        Siqi Li added a comment -

        I see, thanks Sangjin

        Show
        Siqi Li added a comment - I see, thanks Sangjin
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12645282/MAPREDUCE-5873.v3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4609//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4609//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645282/MAPREDUCE-5873.v3.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4609//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4609//console This message is automatically generated.
        Hide
        Joep Rottinghuis added a comment -

        Jason Lowe are you guys seeing this behavior as well ?
        We're finding that the current way that the shuffle bandwidth is shows is not only useless, but also confusing for our users.
        If a mapper has to be re-run it appears as if the bandwidth is a tiny amount.
        This new way of showing bandwidth is a much better indication of what is going on, and if it is very small, it is actually in indication that there is something not right in the network.

        Show
        Joep Rottinghuis added a comment - Jason Lowe are you guys seeing this behavior as well ? We're finding that the current way that the shuffle bandwidth is shows is not only useless, but also confusing for our users. If a mapper has to be re-run it appears as if the bandwidth is a tiny amount. This new way of showing bandwidth is a much better indication of what is going on, and if it is very small, it is actually in indication that there is something not right in the network.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12645282/MAPREDUCE-5873.v3.patch
        against trunk revision 0708827.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4926//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645282/MAPREDUCE-5873.v3.patch against trunk revision 0708827. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4926//console This message is automatically generated.
        Hide
        Jason Lowe added a comment -

        Joep Rottinghuis, agree the current method of computing bandwidth is broken.

        Thanks for the patch, Siqi Li. The patch has gone stale, can you please rebase on trunk? Other comments:

        Why is the following change necessary? I don't see how mapOutput can be null, but maybe I'm missing something:

             if (!finishedMaps[mapIndex]) {
        -      output.commit();
        +      if (output != null) {
        +        output.commit();
        +      }
        

        We should provide a sizing hint to the ArrayList constructor during computeTotalCopyMills since we know worst-case how big it should be (intervals.size()+1) and we can avoid reallocation/copy during add.

        In the test, the comment doesn't match the code in a few places:

              //100MB from 50s to 100s
              bytes = (long)100 * 1024 * 1024;
              scheduler.copySucceeded(attemptID3, new MapHost(null, null), bytes, 200000, 300000, null);
        ...
              //100MB from 50s to 100s
              bytes = (long)100 * 1024 * 1024;
              scheduler.copySucceeded(attemptID4, new MapHost(null, null), bytes, 100000, 150000, null);
        

        Would like to see the unit test exercise all conditions of the interval adding code, such as adding a large overlapping interval (e.g.: 0-500s) late.

        Nit: Not sure the intervals.size() == 0 check is worth the extra code, especially if we just initialize intervals to Collections.emptyList().

        Nit: it would be nice to be consistent about Mills vs Millis. I prefer Millis myself, as it's used elsewhere in the code, or maybe CopyMillsTracker should just be CopyTimeTracker with a getCopyMillis method.

        Nit: The name TestMultipleTestSucceeded didn't convey to me what it's actually testing. Maybe TestAggregatedTransferRate would be more descriptive?

        Show
        Jason Lowe added a comment - Joep Rottinghuis , agree the current method of computing bandwidth is broken. Thanks for the patch, Siqi Li . The patch has gone stale, can you please rebase on trunk? Other comments: Why is the following change necessary? I don't see how mapOutput can be null, but maybe I'm missing something: if (!finishedMaps[mapIndex]) { - output.commit(); + if (output != null ) { + output.commit(); + } We should provide a sizing hint to the ArrayList constructor during computeTotalCopyMills since we know worst-case how big it should be (intervals.size()+1) and we can avoid reallocation/copy during add. In the test, the comment doesn't match the code in a few places: //100MB from 50s to 100s bytes = ( long )100 * 1024 * 1024; scheduler.copySucceeded(attemptID3, new MapHost( null , null ), bytes, 200000, 300000, null ); ... //100MB from 50s to 100s bytes = ( long )100 * 1024 * 1024; scheduler.copySucceeded(attemptID4, new MapHost( null , null ), bytes, 100000, 150000, null ); Would like to see the unit test exercise all conditions of the interval adding code, such as adding a large overlapping interval (e.g.: 0-500s) late. Nit: Not sure the intervals.size() == 0 check is worth the extra code, especially if we just initialize intervals to Collections.emptyList(). Nit: it would be nice to be consistent about Mills vs Millis. I prefer Millis myself, as it's used elsewhere in the code, or maybe CopyMillsTracker should just be CopyTimeTracker with a getCopyMillis method. Nit: The name TestMultipleTestSucceeded didn't convey to me what it's actually testing. Maybe TestAggregatedTransferRate would be more descriptive?
        Hide
        Siqi Li added a comment -

        Jason Lowe Thanks for your feedback. I will update my patch shortly.

        Show
        Siqi Li added a comment - Jason Lowe Thanks for your feedback. I will update my patch shortly.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12672869/MAPREDUCE-5873.v4.patch
        against trunk revision 7f6ed7f.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The applied patch generated 1280 javac compiler warnings (more than the trunk's current 1266 warnings).

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

        org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
        org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
        Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//artifact/patchprocess/diffJavacWarnings.txt
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672869/MAPREDUCE-5873.v4.patch against trunk revision 7f6ed7f. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. -1 javac . The applied patch generated 1280 javac compiler warnings (more than the trunk's current 1266 warnings). +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4929//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12672869/MAPREDUCE-5873.v4.patch
        against trunk revision 0fb2735.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The applied patch generated 1267 javac compiler warnings (more than the trunk's current 1266 warnings).

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
        Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//artifact/patchprocess/diffJavacWarnings.txt
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672869/MAPREDUCE-5873.v4.patch against trunk revision 0fb2735. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. -1 javac . The applied patch generated 1267 javac compiler warnings (more than the trunk's current 1266 warnings). +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4938//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12673483/MAPREDUCE-5873.v5.patch
        against trunk revision e16e25a.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4941//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4941//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4941//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673483/MAPREDUCE-5873.v5.patch against trunk revision e16e25a. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. -1 findbugs . The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4941//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4941//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4941//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12673678/MAPREDUCE-5873.v6.patch
        against trunk revision e16e25a.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4942//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4942//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673678/MAPREDUCE-5873.v6.patch against trunk revision e16e25a. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4942//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4942//console This message is automatically generated.
        Hide
        Siqi Li added a comment -

        Jason Lowe Hi Jason, I have update the patch, can you take a look at it and see if you can commit it ? thanks

        Show
        Siqi Li added a comment - Jason Lowe Hi Jason, I have update the patch, can you take a look at it and see if you can commit it ? thanks
        Hide
        Jason Lowe added a comment -

        Thanks for updating the patch, Siqi. It's looking better, just a few last nits.

        We should provide a sizing hint to the ArrayList constructor during computeTotalCopyMills since we know worst-case how big it should be (intervals.size()+1) and we can avoid reallocation/copy during add.

        This comment wasn't really addressed. The new patch now requires the CopyTimeTracker to take a size argument, but it still ends up reallocating the arraylists needlessly. Note that the size argument is only used to size the initial value of intervals, but this array will never be updated. It's effectively read-only and doesn't need the extra space. The only time we modify intervals is when we reassign it to a newly constructed array list in the getTotalCopyMillis method, and that method builds the new array list with no size hint. Instead of requiring users to pass the total copies to CopyTimeTracker, it'd be better to have intervals initialize to an empty array (Collections.emptyList would be appropriate here since it's always read-only) and have the getTotalCopyMillis method size the result array to be the size of the original array + 1 (i.e.: the worst-case new size of the array).

        When getTotalCopyMillis is handed a null interval we don't need to recompute the time and can just return copyMillis. That precludes the need to factor out the getMillis method which is confusing in light of the similarly-named getCopyMillis.

        The following hunk in the patch is using tabs instead of spaces on the endMillis line:

        @@ -180,7 +183,8 @@ static URI getBaseURI(TaskAttemptID reduceId, String url) {
           public synchronized void copySucceeded(TaskAttemptID mapId,
                                                  MapHost host,
                                                  long bytes,
        -                                         long millis,
        +                                         long startMillis,
        +					 long endMillis,
                                                  MapOutput<K,V> output
                                                  ) throws IOException {
             failureCounts.remove(mapId);
        
        Show
        Jason Lowe added a comment - Thanks for updating the patch, Siqi. It's looking better, just a few last nits. We should provide a sizing hint to the ArrayList constructor during computeTotalCopyMills since we know worst-case how big it should be (intervals.size()+1) and we can avoid reallocation/copy during add. This comment wasn't really addressed. The new patch now requires the CopyTimeTracker to take a size argument, but it still ends up reallocating the arraylists needlessly. Note that the size argument is only used to size the initial value of intervals , but this array will never be updated. It's effectively read-only and doesn't need the extra space. The only time we modify intervals is when we reassign it to a newly constructed array list in the getTotalCopyMillis method, and that method builds the new array list with no size hint. Instead of requiring users to pass the total copies to CopyTimeTracker, it'd be better to have intervals initialize to an empty array (Collections.emptyList would be appropriate here since it's always read-only) and have the getTotalCopyMillis method size the result array to be the size of the original array + 1 (i.e.: the worst-case new size of the array). When getTotalCopyMillis is handed a null interval we don't need to recompute the time and can just return copyMillis. That precludes the need to factor out the getMillis method which is confusing in light of the similarly-named getCopyMillis. The following hunk in the patch is using tabs instead of spaces on the endMillis line: @@ -180,7 +183,8 @@ static URI getBaseURI(TaskAttemptID reduceId, String url) { public synchronized void copySucceeded(TaskAttemptID mapId, MapHost host, long bytes, - long millis, + long startMillis, + long endMillis, MapOutput<K,V> output ) throws IOException { failureCounts.remove(mapId);
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12674604/MAPREDUCE-5873.v7.patch
        against trunk revision 178bc50.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4957//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674604/MAPREDUCE-5873.v7.patch against trunk revision 178bc50. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4957//console This message is automatically generated.
        Hide
        Jason Lowe added a comment -

        Also, a helpful tip for cutting down on JIRA email: it's not necessary to swap the JIRA states to kick Jenkins for each new patch. The bot is smart enough to notice a new patch on a Patch Available JIRA and re-run the tests. If the JIRA is already Patch Available then one just needs to attach a new patch.

        Show
        Jason Lowe added a comment - Also, a helpful tip for cutting down on JIRA email: it's not necessary to swap the JIRA states to kick Jenkins for each new patch. The bot is smart enough to notice a new patch on a Patch Available JIRA and re-run the tests. If the JIRA is already Patch Available then one just needs to attach a new patch.
        Hide
        Siqi Li added a comment -

        got it. Thanks Jason Lowe

        Show
        Siqi Li added a comment - got it. Thanks Jason Lowe
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12674612/MAPREDUCE-5873.v8.patch
        against trunk revision 178bc50.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4958//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674612/MAPREDUCE-5873.v8.patch against trunk revision 178bc50. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. -1 javac . The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4958//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12674635/MAPREDUCE-5873.v9.patch
        against trunk revision 178bc50.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4959//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4959//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12674635/MAPREDUCE-5873.v9.patch against trunk revision 178bc50. +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4959//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4959//console This message is automatically generated.
        Hide
        Siqi Li added a comment -

        Jason Lowe Thanks for you feedback, I have attached an updated patch. If the patch looks good, can you get it committed?

        Show
        Siqi Li added a comment - Jason Lowe Thanks for you feedback, I have attached an updated patch. If the patch looks good, can you get it committed?
        Hide
        Jason Lowe added a comment -

        +1 lgtm. Committing this.

        Show
        Jason Lowe added a comment - +1 lgtm. Committing this.
        Hide
        Jason Lowe added a comment -

        Thanks, Siqi! I committed this to trunk, branch-2, and branch-2.6.

        Show
        Jason Lowe added a comment - Thanks, Siqi! I committed this to trunk, branch-2, and branch-2.6.
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #6264 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6264/)
        MAPREDUCE-5873. Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java
        Show
        Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #6264 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6264/ ) MAPREDUCE-5873 . Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #713 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/713/)
        MAPREDUCE-5873. Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #713 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/713/ ) MAPREDUCE-5873 . Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1903 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1903/)
        MAPREDUCE-5873. Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1903 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1903/ ) MAPREDUCE-5873 . Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #1928 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1928/)
        MAPREDUCE-5873. Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java
        Show
        Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1928 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1928/ ) MAPREDUCE-5873 . Shuffle bandwidth computation includes time spent waiting for maps. Contributed by Siqi Li (jlowe: rev b9edad64034a9c8a121ec2b37792c190ba561e26) hadoop-mapreduce-project/CHANGES.txt hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestShuffleScheduler.java hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/LocalFetcher.java

          People

          • Assignee:
            Siqi Li
            Reporter:
            Siqi Li
          • Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development