Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-14407

DistCp - Introduce a configurable copy buffer size

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha4
    • Component/s: tools/distcp
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The copy buffer size can be configured via the new parameter <copybuffersize>. By default the <copybuffersize> is set to 8KB.

      Description

      Currently, the RetriableFileCopyCommand has a fixed copy buffer size of just 8KB. We have noticed in our performance tests that with bigger buffer sizes we saw upto ~3x performance boost. Hence, making the copy buffer size a configurable setting via the new parameter <copybuffersize>.

      1. TotalTime-vs-CopyBufferSize.jpg
        40 kB
        Omkar Aradhya K S
      2. HADOOP-14407.branch2.002.patch
        13 kB
        Yongjun Zhang
      3. HADOOP-14407.004.patch
        12 kB
        Omkar Aradhya K S
      4. HADOOP-14407.004.patch
        12 kB
        Omkar Aradhya K S
      5. HADOOP-14407.004.branch2.patch
        12 kB
        Omkar Aradhya K S
      6. HADOOP-14407.003.patch
        12 kB
        Omkar Aradhya K S
      7. HADOOP-14407.002.patch
        11 kB
        Omkar Aradhya K S
      8. HADOOP-14407.002.patch
        11 kB
        Yongjun Zhang
      9. HADOOP-14407.001.patch
        11 kB
        Omkar Aradhya K S

        Activity

        Hide
        yzhangal Yongjun Zhang added a comment -

        Welcome Omkar Aradhya K S.

        Thanks Steve Loughran, I already committed to branch-2 yesterday.

        Show
        yzhangal Yongjun Zhang added a comment - Welcome Omkar Aradhya K S . Thanks Steve Loughran , I already committed to branch-2 yesterday.
        Hide
        stevel@apache.org Steve Loughran added a comment -

        re-opened the issue while a branch-2 version is done. Alternativelly, create a new JIRA, "backport HADOOP-14407 to branch-2" and work on things there.

        We now have distcp tests for S3 and Azure; it'd be good test those, which isn't automatically done by yetus....

        Show
        stevel@apache.org Steve Loughran added a comment - re-opened the issue while a branch-2 version is done. Alternativelly, create a new JIRA, "backport HADOOP-14407 to branch-2" and work on things there. We now have distcp tests for S3 and Azure; it'd be good test those, which isn't automatically done by yetus....
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Thanks Yongjun Zhang.

        Show
        omkarksa Omkar Aradhya K S added a comment - Thanks Yongjun Zhang .
        Hide
        yzhangal Yongjun Zhang added a comment -

        HI Omkar Aradhya K S,

        Thanks for looking into, and good point! This helped me to find a similar problem with HADOOP-11794's branch-2 patch. In trunk, the test exists in TestOptionParser, however, similar test exists in TestDistCpOptions in branch-2.

        I just updated branch-2 patches for both HADOOP-11794 and HADOOP-14407. So the issues are resolved.

        Thanks.

        Show
        yzhangal Yongjun Zhang added a comment - HI Omkar Aradhya K S , Thanks for looking into, and good point! This helped me to find a similar problem with HADOOP-11794 's branch-2 patch. In trunk, the test exists in TestOptionParser, however, similar test exists in TestDistCpOptions in branch-2. I just updated branch-2 patches for both HADOOP-11794 and HADOOP-14407 . So the issues are resolved. Thanks.
        Hide
        yzhangal Yongjun Zhang added a comment -

        Updated branch-2 patch HADOOP-14407.branch2.002.patch.

        Show
        yzhangal Yongjun Zhang added a comment - Updated branch-2 patch HADOOP-14407 .branch2.002.patch.
        Hide
        omkarksa Omkar Aradhya K S added a comment - - edited

        Yongjun Zhang Thanks for pointing this out! I have couple of doubts:
        1. My tests are present in "/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java" --> testToString(), testParseCopyBufferSize().
        2. The "/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java --> " doesn't have "blocksPerChunk=0" so how was it passing earlier?
        3. Why are there 2 redundant "testToString()" in 2 different classes?
        4. Can I re-open this issue and add the patch here itself?
        5. What is the procedure to test branch-2 patches? I see a "-1" from Hadoop QA for the branch-2 patch above!

        Show
        omkarksa Omkar Aradhya K S added a comment - - edited Yongjun Zhang Thanks for pointing this out! I have couple of doubts: 1. My tests are present in "/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java" --> testToString(), testParseCopyBufferSize(). 2. The "/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java --> " doesn't have "blocksPerChunk=0" so how was it passing earlier? 3. Why are there 2 redundant "testToString()" in 2 different classes? 4. Can I re-open this issue and add the patch here itself? 5. What is the procedure to test branch-2 patches? I see a "-1" from Hadoop QA for the branch-2 patch above!
        Hide
        yzhangal Yongjun Zhang added a comment -

        Welcome Omkar Aradhya K S.

        My bad, did not catch an issue in your branch-2 patch in time.

        ------------------------------------------------------
        Running org.apache.hadoop.tools.TestDistCpOptions
        Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec <<< FAILURE! - in org.apache.hadoop.tools.TestDistCpOptions
        testToString(org.apache.hadoop.tools.TestDistCpOptions)  Time elapsed: 0.018 sec  <<< FAILURE!
        org.junit.ComparisonFailure: expected:<..., filtersFile='null'[]}> but was:<..., filtersFile='null'[, blocksPerChunk=0, copyBufferSize=8192]}>
        	at org.junit.Assert.assertEquals(Assert.java:115)
        	at org.junit.Assert.assertEquals(Assert.java:144)
        	at org.apache.hadoop.tools.TestDistCpOptions.testToString(TestDistCpOptions.java:317)
        

        TestDistCpOptions.java is somehow missing from the branch-2 patch, would you please create a new jira for branch-2 only and submit the patch asap?

        Thanks.

        Show
        yzhangal Yongjun Zhang added a comment - Welcome Omkar Aradhya K S . My bad, did not catch an issue in your branch-2 patch in time. ------------------------------------------------------ Running org.apache.hadoop.tools.TestDistCpOptions Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.266 sec <<< FAILURE! - in org.apache.hadoop.tools.TestDistCpOptions testToString(org.apache.hadoop.tools.TestDistCpOptions) Time elapsed: 0.018 sec <<< FAILURE! org.junit.ComparisonFailure: expected:<..., filtersFile=' null '[]}> but was:<..., filtersFile=' null '[, blocksPerChunk=0, copyBufferSize=8192]}> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.tools.TestDistCpOptions.testToString(TestDistCpOptions.java:317) TestDistCpOptions.java is somehow missing from the branch-2 patch, would you please create a new jira for branch-2 only and submit the patch asap? Thanks.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Thanks Yongjun Zhang for the quick reviews and commits.

        Show
        omkarksa Omkar Aradhya K S added a comment - Thanks Yongjun Zhang for the quick reviews and commits.
        Hide
        yzhangal Yongjun Zhang added a comment -

        Committed to trunk and branch-2. Many thanks to Omkar Aradhya K S for the contribution!

        Show
        yzhangal Yongjun Zhang added a comment - Committed to trunk and branch-2. Many thanks to Omkar Aradhya K S for the contribution!
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 10s HADOOP-14407 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Issue HADOOP-14407
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12869020/HADOOP-14407.004.branch2.patch
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12363/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 10s HADOOP-14407 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue HADOOP-14407 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12869020/HADOOP-14407.004.branch2.patch Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12363/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Yongjun Zhang I have backported the feature to branch-2 and uploaded the patch (HADOOP-14407.004.branch2.patch). Please do the needful.

        Show
        omkarksa Omkar Aradhya K S added a comment - Yongjun Zhang I have backported the feature to branch-2 and uploaded the patch ( HADOOP-14407 .004.branch2.patch). Please do the needful.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Feature patch backported to branch-2

        Show
        omkarksa Omkar Aradhya K S added a comment - Feature patch backported to branch-2
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        I committed to trunk. When trying to backport to branch-2, saw quite some conflicts. Would you please help doing branch-2 version and other ones you prefer?

        Yongjun Zhang Thanks. OK, I will work on the branch-2 patch today.

        Show
        omkarksa Omkar Aradhya K S added a comment - I committed to trunk. When trying to backport to branch-2, saw quite some conflicts. Would you please help doing branch-2 version and other ones you prefer? Yongjun Zhang Thanks. OK, I will work on the branch-2 patch today.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11752 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11752/)
        HADOOP-14407. DistCp - Introduce a configurable copy buffer size. (Omkar (yzhang: rev b4adc8392c1314d6d6fbdd00f2afb306ef20a650)

        • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
        • (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
        • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
        • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
        • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpContext.java
        • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
        • (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java
        • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11752 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11752/ ) HADOOP-14407 . DistCp - Introduce a configurable copy buffer size. (Omkar (yzhang: rev b4adc8392c1314d6d6fbdd00f2afb306ef20a650) (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpContext.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpOptions.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
        Hide
        yzhangal Yongjun Zhang added a comment -

        Hi Omkar Aradhya K S,

        I committed to trunk. When trying to backport to branch-2, saw quite some conflicts. Would you please help doing branch-2 version and other ones you prefer?

        Thanks much.

        Show
        yzhangal Yongjun Zhang added a comment - Hi Omkar Aradhya K S , I committed to trunk. When trying to backport to branch-2, saw quite some conflicts. Would you please help doing branch-2 version and other ones you prefer? Thanks much.
        Hide
        yzhangal Yongjun Zhang added a comment -

        Hi Omkar Aradhya K S,

        Thanks for the updated patch. +1 and I will commit soon.

        Show
        yzhangal Yongjun Zhang added a comment - Hi Omkar Aradhya K S , Thanks for the updated patch. +1 and I will commit soon.
        Hide
        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 17s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 15m 8s trunk passed
        +1 compile 0m 17s trunk passed
        +1 checkstyle 0m 16s trunk passed
        +1 mvnsite 0m 24s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        +1 findbugs 0m 33s trunk passed
        +1 javadoc 0m 14s trunk passed
        +1 mvninstall 0m 21s the patch passed
        +1 compile 0m 19s the patch passed
        +1 javac 0m 19s the patch passed
        +1 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 0 new + 52 unchanged - 2 fixed = 52 total (was 54)
        +1 mvnsite 0m 21s the patch passed
        +1 mvneclipse 0m 15s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 37s the patch passed
        +1 javadoc 0m 12s the patch passed
        +1 unit 12m 42s hadoop-distcp in the patch passed.
        +1 asflicense 0m 20s The patch does not generate ASF License warnings.
        34m 7s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14407
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868796/HADOOP-14407.004.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux ca75e484de72 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 40e6a85
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12359/testReport/
        modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12359/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 17s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 15m 8s trunk passed +1 compile 0m 17s trunk passed +1 checkstyle 0m 16s trunk passed +1 mvnsite 0m 24s trunk passed +1 mvneclipse 0m 17s trunk passed +1 findbugs 0m 33s trunk passed +1 javadoc 0m 14s trunk passed +1 mvninstall 0m 21s the patch passed +1 compile 0m 19s the patch passed +1 javac 0m 19s the patch passed +1 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 0 new + 52 unchanged - 2 fixed = 52 total (was 54) +1 mvnsite 0m 21s the patch passed +1 mvneclipse 0m 15s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 37s the patch passed +1 javadoc 0m 12s the patch passed +1 unit 12m 42s hadoop-distcp in the patch passed. +1 asflicense 0m 20s The patch does not generate ASF License warnings. 34m 7s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14407 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868796/HADOOP-14407.004.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux ca75e484de72 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 40e6a85 Default Java 1.8.0_131 findbugs v3.1.0-RC1 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12359/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12359/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        submit new patch - removed "long" variable type for copybuffersize

        Show
        omkarksa Omkar Aradhya K S added a comment - submit new patch - removed "long" variable type for copybuffersize
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        uploading new patch - removed "long" variable type for copybuffersize

        Show
        omkarksa Omkar Aradhya K S added a comment - uploading new patch - removed "long" variable type for copybuffersize
        Hide
        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 16s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 13m 52s trunk passed
        +1 compile 0m 19s trunk passed
        +1 checkstyle 0m 16s trunk passed
        +1 mvnsite 0m 20s trunk passed
        +1 mvneclipse 0m 17s trunk passed
        +1 findbugs 0m 27s trunk passed
        +1 javadoc 0m 13s trunk passed
        +1 mvninstall 0m 17s the patch passed
        +1 compile 0m 17s the patch passed
        +1 javac 0m 17s the patch passed
        +1 checkstyle 0m 11s hadoop-tools/hadoop-distcp: The patch generated 0 new + 51 unchanged - 2 fixed = 51 total (was 53)
        +1 mvnsite 0m 18s the patch passed
        +1 mvneclipse 0m 13s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 31s the patch passed
        +1 javadoc 0m 10s the patch passed
        +1 unit 12m 54s hadoop-distcp in the patch passed.
        +1 asflicense 0m 22s The patch does not generate ASF License warnings.
        32m 35s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14407
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868693/HADOOP-14407.003.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 25e4f3c9034e 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / b46cd31
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12353/testReport/
        modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12353/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 13m 52s trunk passed +1 compile 0m 19s trunk passed +1 checkstyle 0m 16s trunk passed +1 mvnsite 0m 20s trunk passed +1 mvneclipse 0m 17s trunk passed +1 findbugs 0m 27s trunk passed +1 javadoc 0m 13s trunk passed +1 mvninstall 0m 17s the patch passed +1 compile 0m 17s the patch passed +1 javac 0m 17s the patch passed +1 checkstyle 0m 11s hadoop-tools/hadoop-distcp: The patch generated 0 new + 51 unchanged - 2 fixed = 51 total (was 53) +1 mvnsite 0m 18s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 31s the patch passed +1 javadoc 0m 10s the patch passed +1 unit 12m 54s hadoop-distcp in the patch passed. +1 asflicense 0m 22s The patch does not generate ASF License warnings. 32m 35s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14407 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868693/HADOOP-14407.003.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 25e4f3c9034e 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / b46cd31 Default Java 1.8.0_131 findbugs v3.1.0-RC1 Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12353/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12353/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Submit new patch with cosmetic changes

        Show
        omkarksa Omkar Aradhya K S added a comment - Submit new patch with cosmetic changes
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        new patch with cosmetic changes

        Show
        omkarksa Omkar Aradhya K S added a comment - new patch with cosmetic changes
        Hide
        yzhangal Yongjun Zhang added a comment - - edited

        Hi Omkar Aradhya K S,

        Thanks Akira Ajisaka for pointing out that the build was using the "jpg" file you uploaded as the patch (precommit jenkins job chose the most recent attached file as the patch), thus the test failed. I uoloaded the patch rev2 again, and it triggered a new run. And the result shows checkstyle issues I commented earlier.

        https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt

        Would you please address them and upload a new rev?

        Thanks.

        Show
        yzhangal Yongjun Zhang added a comment - - edited Hi Omkar Aradhya K S , Thanks Akira Ajisaka for pointing out that the build was using the "jpg" file you uploaded as the patch (precommit jenkins job chose the most recent attached file as the patch), thus the test failed. I uoloaded the patch rev2 again, and it triggered a new run. And the result shows checkstyle issues I commented earlier. https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt Would you please address them and upload a new rev? Thanks.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Thanks for the updated patch Omkar Aradhya K S, are you guys still looking into setting input and output buffer to different size? Or any chance we need to do that in the future?

        Yongjun Zhang Thanks for checking the patch. As explained in the previous commit, we don't need to do this change since even a small copybiffersize can give huge boos in performance.

        Somehow your submitting the patch did not trigger a jenkins test, maybe there is an infra issue.

        Thanks for pointing this out. Yes, even I waited for quite some time, but there was no result! Do I need any additional permissions for this?
        Also, can you point out how exactly you triggered the build? May be I missed something?

        Show
        omkarksa Omkar Aradhya K S added a comment - Thanks for the updated patch Omkar Aradhya K S, are you guys still looking into setting input and output buffer to different size? Or any chance we need to do that in the future? Yongjun Zhang Thanks for checking the patch. As explained in the previous commit, we don't need to do this change since even a small copybiffersize can give huge boos in performance. Somehow your submitting the patch did not trigger a jenkins test, maybe there is an infra issue. Thanks for pointing this out. Yes, even I waited for quite some time, but there was no result! Do I need any additional permissions for this? Also, can you point out how exactly you triggered the build? May be I missed something?
        Hide
        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 25s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 15m 7s trunk passed
        +1 compile 0m 21s trunk passed
        +1 checkstyle 0m 14s trunk passed
        +1 mvnsite 0m 23s trunk passed
        +1 mvneclipse 0m 18s trunk passed
        +1 findbugs 0m 35s trunk passed
        +1 javadoc 0m 16s trunk passed
        +1 mvninstall 0m 21s the patch passed
        +1 compile 0m 19s the patch passed
        +1 javac 0m 19s the patch passed
        -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 5 new + 52 unchanged - 1 fixed = 57 total (was 53)
        +1 mvnsite 0m 22s the patch passed
        +1 mvneclipse 0m 15s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 0m 39s the patch passed
        +1 javadoc 0m 12s the patch passed
        +1 unit 14m 13s hadoop-distcp in the patch passed.
        +1 asflicense 0m 23s The patch does not generate ASF License warnings.
        35m 58s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:14b5c93
        JIRA Issue HADOOP-14407
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868655/HADOOP-14407.002.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux d540388b81b0 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / ef9e536
        Default Java 1.8.0_131
        findbugs v3.1.0-RC1
        checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
        Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/testReport/
        modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 25s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 15m 7s trunk passed +1 compile 0m 21s trunk passed +1 checkstyle 0m 14s trunk passed +1 mvnsite 0m 23s trunk passed +1 mvneclipse 0m 18s trunk passed +1 findbugs 0m 35s trunk passed +1 javadoc 0m 16s trunk passed +1 mvninstall 0m 21s the patch passed +1 compile 0m 19s the patch passed +1 javac 0m 19s the patch passed -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 5 new + 52 unchanged - 1 fixed = 57 total (was 53) +1 mvnsite 0m 22s the patch passed +1 mvneclipse 0m 15s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 39s the patch passed +1 javadoc 0m 12s the patch passed +1 unit 14m 13s hadoop-distcp in the patch passed. +1 asflicense 0m 23s The patch does not generate ASF License warnings. 35m 58s Subsystem Report/Notes Docker Image:yetus/hadoop:14b5c93 JIRA Issue HADOOP-14407 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12868655/HADOOP-14407.002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux d540388b81b0 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / ef9e536 Default Java 1.8.0_131 findbugs v3.1.0-RC1 checkstyle https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12351/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        yzhangal Yongjun Zhang added a comment -

        I triggered the jenkins test manually at https://builds.apache.org/job/PreCommit-HADOOP-Build//12347.

        Show
        yzhangal Yongjun Zhang added a comment - I triggered the jenkins test manually at https://builds.apache.org/job/PreCommit-HADOOP-Build//12347 .
        Hide
        yzhangal Yongjun Zhang added a comment -

        Thanks Steve Loughran.

        .that's something whoever commits it needs to do

        I personally think it's better for developer to clean it up so jenkins test reports clean result. Especially for new contributors, once they are aware of this, they can easily do so before submitting the patch. It'd be nice that the last patch rev attached the jira is what get committed without change.

        Thanks for the updated patch Omkar Aradhya K S, are you guys still looking into setting input and output buffer to different size? Or any chance we need to do that in the future? Somehow your submitting the patch did not trigger a jenkins test, maybe there is an infra issue.

        Show
        yzhangal Yongjun Zhang added a comment - Thanks Steve Loughran . .that's something whoever commits it needs to do I personally think it's better for developer to clean it up so jenkins test reports clean result. Especially for new contributors, once they are aware of this, they can easily do so before submitting the patch. It'd be nice that the last patch rev attached the jira is what get committed without change. Thanks for the updated patch Omkar Aradhya K S , are you guys still looking into setting input and output buffer to different size? Or any chance we need to do that in the future? Somehow your submitting the patch did not trigger a jenkins test, maybe there is an infra issue.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Also, we will do more investigations into introducing both input and output copybuffersize configurations.

        We did a small set of runs to see at what level the copybuffersize stagnates in performance:

        In this case, with copybuffersize set to just 128KB, we get >3x performance!

        If there is benefit of doing this I will submit a new patch with the changes or else we will go ahead with this patch.

        Since even a small increase in the copybuffersize give the desired performance, we will not need two separate copybuffersize for input and output.

        Show
        omkarksa Omkar Aradhya K S added a comment - Also, we will do more investigations into introducing both input and output copybuffersize configurations. We did a small set of runs to see at what level the copybuffersize stagnates in performance: In this case, with copybuffersize set to just 128KB, we get >3x performance! If there is benefit of doing this I will submit a new patch with the changes or else we will go ahead with this patch. Since even a small increase in the copybuffersize give the desired performance, we will not need two separate copybuffersize for input and output.
        Hide
        stevel@apache.org Steve Loughran added a comment -

        (I wouldn't worry about trailing whitespace...that's something whoever commits it needs to do)

        git alias for a "git apx" command to do the merge

        alias.apx=apply -3 --verbose --whitespace=fix
        
        Show
        stevel@apache.org Steve Loughran added a comment - (I wouldn't worry about trailing whitespace...that's something whoever commits it needs to do) git alias for a "git apx" command to do the merge alias.apx=apply -3 --verbose --whitespace=fix
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Yongjun Zhang Thanks for reviewing. Please find attached the new patch with the fixes.

        Show
        omkarksa Omkar Aradhya K S added a comment - Yongjun Zhang Thanks for reviewing. Please find attached the new patch with the fixes.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Attached new patch (HADOOP-14407.002.patch) with all required fixes.

        Show
        omkarksa Omkar Aradhya K S added a comment - Attached new patch ( HADOOP-14407 .002.patch) with all required fixes.
        Hide
        yzhangal Yongjun Zhang added a comment -

        Hi Omkar Aradhya K S,

        The patch looks good, except a few cosmetic things.

        • The patch failed to apply. Try using "git diff --no-prefix HEAD~ " to generate the patch.
        • some trailing white spaces, use "git apply --whitespace fix" to remove.
        • DistCpOptions.java, line exceeding 80 chars

        Thanks.

        Show
        yzhangal Yongjun Zhang added a comment - Hi Omkar Aradhya K S , The patch looks good, except a few cosmetic things. The patch failed to apply. Try using "git diff --no-prefix HEAD~ " to generate the patch. some trailing white spaces, use "git apply --whitespace fix" to remove. DistCpOptions.java, line exceeding 80 chars Thanks.
        Hide
        yzhangal Yongjun Zhang added a comment -
        Show
        yzhangal Yongjun Zhang added a comment - Hi Omkar Aradhya K S , Sorry for the delayed review, would you please update the patch to fix the issue reported here https://issues.apache.org/jira/browse/HADOOP-14407?focusedCommentId=16011312&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16011312 Thanks.
        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 8s HADOOP-14407 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Issue HADOOP-14407
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12867756/HADOOP-14407.001.patch
        Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12319/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 8s HADOOP-14407 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. Subsystem Report/Notes JIRA Issue HADOOP-14407 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12867756/HADOOP-14407.001.patch Console output https://builds.apache.org/job/PreCommit-HADOOP-Build/12319/console Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Yongjun Zhang I have submitted the patch. Could you please check this (HADOOP-14407.001.patch)?
        Also, we will do more investigations into introducing both input and output copybuffersize configurations.
        If there is benefit of doing this I will submit a new patch with the changes or else we will go ahead with this patch.

        Show
        omkarksa Omkar Aradhya K S added a comment - Yongjun Zhang I have submitted the patch. Could you please check this ( HADOOP-14407 .001.patch)? Also, we will do more investigations into introducing both input and output copybuffersize configurations. If there is benefit of doing this I will submit a new patch with the changes or else we will go ahead with this patch.
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Initial patch with the required changes - HADOOP-14407.001.patch

        Show
        omkarksa Omkar Aradhya K S added a comment - Initial patch with the required changes - HADOOP-14407 .001.patch
        Hide
        omkarksa Omkar Aradhya K S added a comment -

        Thanks Yongjun Zhang.
        We found that we don't need 2 buffers.
        Just making the existing copy buffer size configurable should do.
        I will submit the patch soon.

        Show
        omkarksa Omkar Aradhya K S added a comment - Thanks Yongjun Zhang . We found that we don't need 2 buffers. Just making the existing copy buffer size configurable should do. I will submit the patch soon.
        Hide
        yzhangal Yongjun Zhang added a comment -

        Thanks for offering to work on this Omkar. Assigning to you.

        Show
        yzhangal Yongjun Zhang added a comment - Thanks for offering to work on this Omkar. Assigning to you.
        Hide
        yzhangal Yongjun Zhang added a comment -

        Hi Omkar Aradhya K S, thanks for the good finding. Do we want to have separate config for the input and output buffer size? In your test, did you use the same buf size for both?

        Thanks.

        Show
        yzhangal Yongjun Zhang added a comment - Hi Omkar Aradhya K S , thanks for the good finding. Do we want to have separate config for the input and output buffer size? In your test, did you use the same buf size for both? Thanks.

          People

          • Assignee:
            omkarksa Omkar Aradhya K S
            Reporter:
            omkarksa Omkar Aradhya K S
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development