  1. Hadoop Common
  2. HADOOP-11827

Speed-up distcp buildListing() using threadpool

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.0, 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: tools/distcp
    • Labels:
      None
    • Target Version/s:

      Description

      For very large source trees on S3, distcp takes a long time to build the file listing (client code, before starting mappers). For a dataset I used (1.5M files, 50K dirs), it took 65 minutes before my fix in HADOOP-11785 and 36 minutes after the fix.

      1. HADOOP-11827.patch
        37 kB
        Zoran Dimitrijevic
      2. HADOOP-11827-02.patch
        37 kB
        Zoran Dimitrijevic
      3. HADOOP-11827-03.patch
        38 kB
        Zoran Dimitrijevic
      4. HADOOP-11827-04.patch
        38 kB
        Ravi Prakash

          Activity

          3opan Zoran Dimitrijevic added a comment -

          test patch report from my laptop:

          +1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          3opan Zoran Dimitrijevic added a comment -

          Performance results and charts for dataset I used (1.5M files and approx 50K dirs):

          https://docs.google.com/spreadsheets/d/1qJfO9ZhPXuGCpHyfX1NLE0Zm_NB39gn-cELECShd_zk/edit#gid=0

          Please note that there are two sheets (s3n -> hdfs and hdfs -> hdfs). The main improvement is when the source is in s3. The improvement when the source is hdfs is good as well, but since current distcp has to sort the input file, the total improvement is not as significant.

          TODO: We can sort only directories which would further improve startup time.

          3opan Zoran Dimitrijevic added a comment -

          small change to handle all exceptions in worker.

          +1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          3opan Zoran Dimitrijevic added a comment -

          Listing times for the s3n source tree of 1.5M files/50K dirs:
          current distcp: 36 min
          2 threads: 17 min
          5 threads: 7 min
          10 threads: 3.5 min
          20 threads: 2.3 min

          For the same source dataset on hdfs:
          current distcp: 206 sec
          1 thread: 204 sec
          2 threads: 257 sec (not yet sure why, will repeat the experiment)
          3 threads: 154 sec
          5 threads: 94 sec
          10 threads: 51 sec
          20 threads: 45 sec
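
The timings above can be turned into speedup and per-thread efficiency figures with simple division. The sketch below does that arithmetic for the s3n numbers; the class and method names are hypothetical and only the timings come from this comment. With 20 threads the s3n listing is about 15.7x faster than current distcp (36 / 2.3), while the hdfs listing is about 4.6x faster (206 / 45).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the patch.
final class ListingSpeedup {
  /** Speedup of a run relative to the baseline (both in the same unit). */
  static double speedup(double baseline, double measured) {
    return baseline / measured;
  }

  /** Parallel efficiency: speedup divided by the number of threads. */
  static double efficiency(double baseline, double measured, int threads) {
    return speedup(baseline, measured) / threads;
  }

  public static void main(String[] args) {
    // s3n listing times in minutes, copied from the comment above.
    double baselineMin = 36.0;
    Map<Integer, Double> s3n = new LinkedHashMap<>();
    s3n.put(2, 17.0);
    s3n.put(5, 7.0);
    s3n.put(10, 3.5);
    s3n.put(20, 2.3);

    for (Map.Entry<Integer, Double> e : s3n.entrySet()) {
      int threads = e.getKey();
      double minutes = e.getValue();
      System.out.printf("%2d threads: %.1fx speedup, %.0f%% efficiency%n",
          threads,
          speedup(baselineMin, minutes),
          100.0 * efficiency(baselineMin, minutes, threads));
    }
  }
}
```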

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12724782/HADOOP-11827-02.patch
          against trunk revision f8f5887.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/6098//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6098//console

          This message is automatically generated.

          raviprak Ravi Prakash added a comment -

          Thanks a lot for the patch Zoran! Your efforts to improve Distcp are much appreciated.

          1. Annotate maxNumListstatusThreads as @VisibleForTesting . Or should we define it in DistCpConstants?
          2. Look at Future and ExecutorService (instead of your own Producer Consumer implementation)
          raviprak Ravi Prakash added a comment -

          Sorry for the premature comment. Please ignore it

          raviprak Ravi Prakash added a comment -

          Hi Zoran! Thanks for all your work

          Here's a preliminary review. I haven't reviewed test code at all

          1. Please remove todo(zoran). No, FNF (FileNotFoundException) shouldn't lead to a retry. We should keep the behavior the same (even though I think it should ideally exit with an exception, because someone is likely modifying the tree during the distcp).
          2. listStatuse -> listStatus ?
          3. Could you please document the max 40 thread limit?
          4. Any reason you don't want to default numListstatusThreads to 1 (instead of 0)?
          5. Do you really need protected SimpleCopyListing(Configuration configuration, Credentials credentials, int numListstatusThreads) ? Could you just set it in the configuration?
          6. Could you please document FileStatusProcessor ?
          7. I think it's useful to have a wrapper ProducerConsumer implementation IF there isn't a way to trivially accomplish it. We should either move it into the org.apache.hadoop.tools.util package or use the trivial implementation.
          8. We can probably make maybePrintStats more efficient if we choose a number which is a power of 2 rather than 100000.
          9. dirCnt isn't used. getChildren too.
          10. new FileStatusProcessor(sourcePathRoot.getFileSystem(getConf()))); may not be the right thing to do. If two sources (from different file systems) are used, would this cause an error?
          11. ProducerConsumer.take() calls LinkedBlockingQueue.take() which claims to block. Should the javadoc say non-blocking?
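
To illustrate item 8: when the reporting interval is a power of two, the "is it time to print?" check can use a bit mask instead of a modulo. This is a minimal sketch of that idea only; the class name, constants, and method names are hypothetical and are not taken from the patch.

```java
// Hypothetical illustration of item 8; not the code in SimpleCopyListing.
final class ProgressStats {
  // Assumed interval: 2^17 = 131072, a power of two near 100000.
  static final long INTERVAL = 1L << 17;
  static final long MASK = INTERVAL - 1;

  /** Modulo-based check with the original decimal interval. */
  static boolean shouldPrintModulo(long count) {
    return count % 100_000 == 0;
  }

  /** Mask-based check; only valid because INTERVAL is a power of two. */
  static boolean shouldPrintMask(long count) {
    return (count & MASK) == 0;
  }
}
```

Whether the saved division matters next to the cost of a remote listStatus call is a separate question; the mask only changes how often the check itself costs, not how much work each listed file triggers.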

          These were my questions so far. I'll keep reviewing, but in the meantime we can multithread progress on this issue
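
For readers following items 7 and 11, here is a minimal sketch of the producer-consumer pattern under discussion: worker threads block on LinkedBlockingQueue.take() for directories to list, and the main thread consumes their reports and enqueues newly discovered subdirectories. It is an assumption-laden illustration, not the patch's ProducerConsumer class: the names ParallelListingSketch and listAll are hypothetical, and an in-memory map stands in for the file system (keys are directory paths ending in "/", values are child names, with a trailing "/" marking a subdirectory).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; the in-memory tree replaces real listStatus calls.
final class ParallelListingSketch {
  static List<String> listAll(Map<String, List<String>> tree, String root, int numThreads) {
    BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    BlockingQueue<List<String>> reports = new LinkedBlockingQueue<>();
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);

    for (int i = 0; i < numThreads; i++) {
      pool.execute(() -> {
        try {
          while (true) {
            String dir = requests.take(); // blocks until a directory is queued
            List<String> children = new ArrayList<>();
            for (String name : tree.getOrDefault(dir, Collections.<String>emptyList())) {
              children.add(dir + name);
            }
            reports.put(children); // one report per request, even if empty
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // shutdownNow() stops the worker
        }
      });
    }

    List<String> result = new ArrayList<>();
    try {
      requests.put(root);
      int outstanding = 1; // requests sent but not yet reported back
      while (outstanding > 0) {
        List<String> children = reports.take();
        outstanding--;
        for (String path : children) {
          result.add(path);
          if (path.endsWith("/")) { // subdirectory: schedule its listing
            requests.put(path);
            outstanding++;
          }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("listing interrupted", e);
    } finally {
      pool.shutdownNow();
    }
    Collections.sort(result); // workers finish in arbitrary order
    return result;
  }
}
```

Counting outstanding requests lets the consumer know when the whole tree has been walked without needing a sentinel value on the queue. The final sort is there because reports arrive in a nondeterministic order.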

          3opan Zoran Dimitrijevic added a comment -

          Thanks for the comments Allen. I've addressed most of them:

          1, 2, 3: done
          4: In order to prefer flags over properties, I needed a value to tell whether the flag was set or not. 0 seemed easier than yet another bool.
          5. I added it so that I can have minimal changes in the unit test (rerun tests for various numbers of threads using org.junit.runners.Parameterized).
          6. done
          7. Agree. I wanted to keep the multithreaded logic outside of SimpleCopyListing.java, but if you think it's overkill, I can refactor. But it'll be uglier, and if we need this again, we won't have the wrapper.
          8. Considering how much code is invoked for each of these simple maybePrintStats calls, I don't think it's worth doing. But I don't have strong opinions; I just think we should print some progress since this stage can take on the order of tens of minutes.
          9. removed.
          10. In the current code, we use the same file system instance, so I don't think it's a problem. I use one per thread since we have a small number of threads and these run listStatus in parallel.
          11. Changed the docs: both are blocking, but one can be interrupted by exceptions, and then the user must handle it. Please suggest better names and I'll refactor it. Or maybe just keep one.
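
The reasoning in answer 4 (0 as an "unset" marker so that a command-line flag can take precedence over a configuration property) can be sketched as follows. Everything here is hypothetical: the class, constant, and method names are made up for illustration, the fallback to a single thread is an assumption, and clamping to the 40-thread limit is also an assumption, since the thread does not say whether larger values are clamped or rejected.

```java
// Hypothetical sketch of "flag beats property, 0 means unset"; not the patch code.
final class ListStatusThreads {
  static final int UNSET = 0;            // sentinel: flag not given on the command line
  static final int MAX_THREADS = 40;     // limit mentioned in the review
  static final int DEFAULT_THREADS = 1;  // assumed fallback when nothing is configured

  /**
   * @param flagValue     value parsed from the command line, or UNSET
   * @param propertyValue value read from the configuration, or UNSET
   * @return the number of listing threads to use
   */
  static int resolve(int flagValue, int propertyValue) {
    int requested = (flagValue != UNSET) ? flagValue : propertyValue;
    if (requested <= 0) {
      requested = DEFAULT_THREADS;
    }
    return Math.min(requested, MAX_THREADS); // assumption: clamp rather than fail
  }
}
```

For example, resolve(0, 8) picks the property value 8, while resolve(5, 8) lets the flag win with 5. A separate boolean "flag was set" field would work as well; the sentinel just avoids carrying extra state.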

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12726791/HADOOP-11827-03.patch
          against trunk revision d52de61.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          -1 eclipse:eclipse. The patch failed to build with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/6135//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6135//console

          This message is automatically generated.

          3opan Zoran Dimitrijevic added a comment -

          Sorry Ravi. Thanks for the comments Ravi!

          raviprak Ravi Prakash added a comment -

          Thanks for your contribution Zoran. I agree with all your points. I've made some small changes to the test code (to get rid of silly warnings)

          I'll commit this end-of-day if there are no objections.

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12726960/HADOOP-11827-04.patch
          against trunk revision 997408e.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/6140//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/6140//console

          This message is automatically generated.

          3opan Zoran Dimitrijevic added a comment -

          LGTM++

          raviprak Ravi Prakash added a comment -

          Thanks Zoran! I've committed this to trunk and branch-2. It should be released with 2.8.0

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7630 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7630/)
          HADOOP-11827. Speed-up distcp buildListing() using threadpool (Zoran Dimitrijevic via raviprak) (raviprak: rev cfba355052df15f8eb6cc9b8e90e2d8492bec7d7)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequestProcessor.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkReport.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequest.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2103 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2103/)
          HADOOP-11827. Speed-up distcp buildListing() using threadpool (Zoran Dimitrijevic via raviprak) (raviprak: rev cfba355052df15f8eb6cc9b8e90e2d8492bec7d7)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequest.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequestProcessor.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkReport.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ProducerConsumer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #162 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/162/)
          HADOOP-11827. Speed-up distcp buildListing() using threadpool (Zoran Dimitrijevic via raviprak) (raviprak: rev cfba355052df15f8eb6cc9b8e90e2d8492bec7d7)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequestProcessor.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkReport.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequest.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestProducerConsumer.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #171 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/171/)
          HADOOP-11827. Speed-up distcp buildListing() using threadpool (Zoran Dimitrijevic via raviprak) (raviprak: rev cfba355052df15f8eb6cc9b8e90e2d8492bec7d7)

          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequestProcessor.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkReport.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequest.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #905 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/905/)
          HADOOP-11827. Speed-up distcp buildListing() using threadpool (Zoran Dimitrijevic via raviprak) (raviprak: rev cfba355052df15f8eb6cc9b8e90e2d8492bec7d7)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequest.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequestProcessor.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkReport.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #172 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/172/)
          HADOOP-11827. Speed-up distcp buildListing() using threadpool (Zoran Dimitrijevic via raviprak) (raviprak: rev cfba355052df15f8eb6cc9b8e90e2d8492bec7d7)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequest.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkReport.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequestProcessor.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2121 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2121/)
          HADOOP-11827. Speed-up distcp buildListing() using threadpool (Zoran Dimitrijevic via raviprak) (raviprak: rev cfba355052df15f8eb6cc9b8e90e2d8492bec7d7)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/util/TestProducerConsumer.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequestProcessor.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkReport.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestIntegration.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/WorkRequest.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyListing.java
          yzhangal Yongjun Zhang added a comment -

          Thanks for working on this issue guys.

          Hi Zoran Dimitrijevic and Ravi Prakash, I noticed that the patch uses "--numListstatusThreads" instead of "-numListstatusThreads". Is that intended or an oversight? Other switches use "-".

          Thanks.

          yzhangal Yongjun Zhang added a comment -

          Sorry for the bad formatting. Basically, I was asking why this command-line switch uses -- rather than -. Thanks.

          3opan Zoran Dimitrijevic added a comment -

          I did this a long time ago... I have no preference between - and --. I think I followed the way all the other command-line arguments in distcp were done, so if a fix is needed, it will probably have to apply to all options?


            People

            • Assignee:
              3opan Zoran Dimitrijevic
              Reporter:
              3opan Zoran Dimitrijevic
            • Votes:
              0
              Watchers:
              7

                Time Tracking

                Estimated: 24h
                Remaining: 24h
                Logged: Not Specified

                  Development