Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-8828

Utilize Snapshot diff report to build diff copy list in distcp

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: distcp, snapshots
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Some users reported huge time cost to build file copy list in distcp. (30 hours for 1.6M files). We can leverage snapshot diff report to build file copy list including files/dirs which are changes only between two snapshots (or a snapshot and a normal dir). It speed up the process in two folds: 1. less copy list building time. 2. less file copy MR jobs.

      HDFS snapshot diff report provide information about file/directory creation, deletion, rename and modification between two snapshots or a snapshot and a normal directory. HDFS-7535 synchronize deletion and rename, then fallback to the default distcp. So it still relies on default distcp to building complete list of files under the source dir. This patch only puts creation and modification files into the copy list based on snapshot diff report. We can minimize the number of files to copy.

      1. HDFS-8828.011.patch
        54 kB
        Yufei Gu
      2. HDFS-8828.010.patch
        55 kB
        Yufei Gu
      3. HDFS-8828.009.patch
        54 kB
        Yufei Gu
      4. HDFS-8828.008.patch
        53 kB
        Yufei Gu
      5. HDFS-8828.007.patch
        41 kB
        Yufei Gu
      6. HDFS-8828.006.patch
        41 kB
        Yufei Gu
      7. HDFS-8828.005.patch
        32 kB
        Yufei Gu
      8. HDFS-8828.004.patch
        30 kB
        Yufei Gu
      9. HDFS-8828.003.patch
        27 kB
        Yufei Gu
      10. HDFS-8828.002.patch
        27 kB
        Yufei Gu
      11. HDFS-8828.001.patch
        27 kB
        Yufei Gu

        Issue Links

          Activity

          Hide
          liuml07 Mingliang Liu added a comment -

          Thanks Jing Zhao, Yongjun Zhang and Jitendra Nath Pandey for your prompt reply. I filed HDFS-10397 to track the effort.

          Show
          liuml07 Mingliang Liu added a comment - Thanks Jing Zhao , Yongjun Zhang and Jitendra Nath Pandey for your prompt reply. I filed HDFS-10397 to track the effort.
          Hide
          jnp Jitendra Nath Pandey added a comment -

          +1 to the proposal. '-delete' and '-diff' should have been mutually exclusive, but instead of throwing an exception ignoring "-delete" sounds ok to prevent existing apps from failing.

          Show
          jnp Jitendra Nath Pandey added a comment - +1 to the proposal. '-delete' and '-diff' should have been mutually exclusive, but instead of throwing an exception ignoring "-delete" sounds ok to prevent existing apps from failing.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks Mingliang Liu and Jing Zhao, +1 on the proposed change.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks Mingliang Liu and Jing Zhao , +1 on the proposed change.
          Hide
          jingzhao Jing Zhao added a comment -

          Agree with Mingliang Liu. This option change looks like incompatible, and the proposed fix sounds good to me.

          Show
          jingzhao Jing Zhao added a comment - Agree with Mingliang Liu . This option change looks like incompatible, and the proposed fix sounds good to me.
          Hide
          liuml07 Mingliang Liu added a comment -

          Hi,

          I'm aware that -delete and -diff options are mutually exclusive. I'm thinking this patch makes the existing applications (or scripts) that work just fine with both -delete and -diff options stop performing because of the java.lang.IllegalArgumentException: Diff is valid only with update options exception. That's said, this change is backward incompatible.

          Do you think it makes sense to ignore the -delete option, given -diff option, instead of exiting the program? Along with that, we can print a warning message saying that "Diff is valid only with update options, and -delete option is ignored". Ping Jitendra Nath Pandey for more input.

          Show
          liuml07 Mingliang Liu added a comment - Hi, I'm aware that -delete and -diff options are mutually exclusive. I'm thinking this patch makes the existing applications (or scripts) that work just fine with both -delete and -diff options stop performing because of the java.lang.IllegalArgumentException: Diff is valid only with update options exception. That's said, this change is backward incompatible. Do you think it makes sense to ignore the -delete option, given -diff option, instead of exiting the program? Along with that, we can print a warning message saying that "Diff is valid only with update options, and -delete option is ignored". Ping Jitendra Nath Pandey for more input.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2221 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2221/)
          HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #2221 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2221/ ) HDFS-8828 . Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #283 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/283/)
          HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #283 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/283/ ) HDFS-8828 . Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2240 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2240/)
          HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2240 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2240/ ) HDFS-8828 . Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #294 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/294/)
          HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #294 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/294/ ) HDFS-8828 . Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1024 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1024/)
          HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)

          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #1024 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1024/ ) HDFS-8828 . Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #291 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/291/)
          HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)

          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #291 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/291/ ) HDFS-8828 . Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          Hide
          yufeigu Yufei Gu added a comment -

          Thank you very much for committing, Yongjun Zhang.
          Thank you very much for review, Jing Zhao.

          Show
          yufeigu Yufei Gu added a comment - Thank you very much for committing, Yongjun Zhang . Thank you very much for review, Jing Zhao .
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8328 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8328/)
          HDFS-8828. Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926)

          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java
          • hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #8328 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8328/ ) HDFS-8828 . Utilize Snapshot diff report to build diff copy list in distcp. (Yufei Gu via Yongjun Zhang) (yzhang: rev 0bc15cb6e60dc60885234e01dec1c7cb4557a926) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyListing.java hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          Hide
          yzhangal Yongjun Zhang added a comment -

          Committed to trunk and branch-2. Many thanks to Yufei Gu for the contribution and Jing Zhao for the review!

          Show
          yzhangal Yongjun Zhang added a comment - Committed to trunk and branch-2. Many thanks to Yufei Gu for the contribution and Jing Zhao for the review!
          Hide
          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 31s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 9m 55s There were no new javac warning messages.
          +1 javadoc 12m 6s There were no new javadoc warning messages.
          +1 release audit 0m 26s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 29s There were no new checkstyle issues.
          +1 whitespace 0m 8s The patch has no lines that end in whitespace.
          +1 install 1m 42s mvn install still works.
          +1 eclipse:eclipse 0m 41s The patch built with eclipse:eclipse.
          +1 findbugs 0m 54s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 7m 34s Tests passed in hadoop-distcp.
              50m 31s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751374/HDFS-8828.011.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 80a2990
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12057/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12057/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12057/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 31s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 9m 55s There were no new javac warning messages. +1 javadoc 12m 6s There were no new javadoc warning messages. +1 release audit 0m 26s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 29s There were no new checkstyle issues. +1 whitespace 0m 8s The patch has no lines that end in whitespace. +1 install 1m 42s mvn install still works. +1 eclipse:eclipse 0m 41s The patch built with eclipse:eclipse. +1 findbugs 0m 54s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 7m 34s Tests passed in hadoop-distcp.     50m 31s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12751374/HDFS-8828.011.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 80a2990 hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12057/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12057/testReport/ Java 1.7.0_55 uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12057/console This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          +1 on rev 011. Will commit tomorrow morning.

          Thanks Yufei Gu and Jing Zhao!

          Show
          yzhangal Yongjun Zhang added a comment - +1 on rev 011. Will commit tomorrow morning. Thanks Yufei Gu and Jing Zhao !
          Hide
          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 56s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 38s There were no new javac warning messages.
          +1 javadoc 9m 40s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 26s There were no new checkstyle issues.
          +1 whitespace 0m 10s The patch has no lines that end in whitespace.
          +1 install 1m 28s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 26s Tests passed in hadoop-distcp.
              43m 32s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751374/HDFS-8828.011.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 36b1a1e
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12053/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12053/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12053/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 56s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 38s There were no new javac warning messages. +1 javadoc 9m 40s There were no new javadoc warning messages. +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 26s There were no new checkstyle issues. +1 whitespace 0m 10s The patch has no lines that end in whitespace. +1 install 1m 28s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 26s Tests passed in hadoop-distcp.     43m 32s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12751374/HDFS-8828.011.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 36b1a1e hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12053/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12053/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12053/console This message was automatically generated.
          Hide
          yufeigu Yufei Gu added a comment -

          Thank you very much, Jing Zhao!
          Thank you very much, Yongjun Zhang!

          I've uploaded a new patch 011 for all your comments. I will create a follow-up jira for the document.

          Thanks.

          Show
          yufeigu Yufei Gu added a comment - Thank you very much, Jing Zhao ! Thank you very much, Yongjun Zhang ! I've uploaded a new patch 011 for all your comments. I will create a follow-up jira for the document. Thanks.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks a lot Jing Zhao!

          Hi Yufei Gu,

          Would you please work out a new rev to address our latest comments? I will commit it after jenkins.

          And would you please create a follow-up jira for document update afterwards?

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks a lot Jing Zhao ! Hi Yufei Gu , Would you please work out a new rev to address our latest comments? I will commit it after jenkins. And would you please create a follow-up jira for document update afterwards? Thanks.
          Hide
          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 43s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 47s There were no new javac warning messages.
          +1 javadoc 9m 53s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 27s There were no new checkstyle issues.
          +1 whitespace 0m 6s The patch has no lines that end in whitespace.
          +1 install 1m 29s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 24s Tests passed in hadoop-distcp.
              43m 40s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751292/HDFS-8828.010.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 4e14f79
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12049/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12049/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12049/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 43s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 47s There were no new javac warning messages. +1 javadoc 9m 53s There were no new javadoc warning messages. +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 27s There were no new checkstyle issues. +1 whitespace 0m 6s The patch has no lines that end in whitespace. +1 install 1m 29s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 24s Tests passed in hadoop-distcp.     43m 40s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12751292/HDFS-8828.010.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 4e14f79 hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12049/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12049/testReport/ Java 1.7.0_55 uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12049/console This message was automatically generated.
          Hide
          jingzhao Jing Zhao added a comment -

          Thanks for updating the patch, Yufei! The latest patch looks good to me. One nit is the two "=null" initialization can be skipped.

            DistCpSync(DistCpOptions options, Configuration conf) {
              this.inputOptions = options;
              this.conf = conf;
              this.diffMap = null;
              this.renameDiffs = null;
            }
          

          +1 after addressing this and Yongjun's comments. Also looks like we need to update the distcp doc for this functionality? Please feel free to do it in a separate jira.

          Show
          jingzhao Jing Zhao added a comment - Thanks for updating the patch, Yufei! The latest patch looks good to me. One nit is the two "=null" initialization can be skipped. DistCpSync(DistCpOptions options, Configuration conf) { this .inputOptions = options; this .conf = conf; this .diffMap = null ; this .renameDiffs = null ; } +1 after addressing this and Yongjun's comments. Also looks like we need to update the distcp doc for this functionality? Please feel free to do it in a separate jira.
          Hide
          yzhangal Yongjun Zhang added a comment -

          The build failure may be an intermittent one. I manually kicked off a new run at

          https://builds.apache.org/job/PreCommit-HDFS-Build/12049/

          Show
          yzhangal Yongjun Zhang added a comment - The build failure may be an intermittent one. I manually kicked off a new run at https://builds.apache.org/job/PreCommit-HDFS-Build/12049/
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao,

          Thanks a lot for your review and comments, I discussed with Yufei Gu and he worked out the new revs to address your comments.

          HI Yufei Gu, thanks for the new rev, some nits:

          • put the following code into its own method, like createInputFileListingWithDiff
            180	          Path fileListingPath = getFileListingPath();
            181	          CopyListing copyListing =
            182	              new SimpleCopyListing(job.getConfiguration(),
            183	                  job.getCredentials(), distCpSync);
            184	          copyListing.buildListing(fileListingPath, inputOptions);
            

            so this can be in parallel with the existing method createInputFileListing(Job job)

          • you accidentally changed * http://www.apache.org/licenses/LICENSE-2.0, please revert this change
          • In comments, "//xyz" should be "// xyz", notice the space between "//" and the text

          Please consider addressing them together with what Jing might have,

          Hi Jing Zhao, it looks good to me after the above nits addressed. Would you mind take another look so Yufei can address altogether if you have any more comments?

          Thanks,

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao , Thanks a lot for your review and comments, I discussed with Yufei Gu and he worked out the new revs to address your comments. HI Yufei Gu , thanks for the new rev, some nits: put the following code into its own method, like createInputFileListingWithDiff 180 Path fileListingPath = getFileListingPath(); 181 CopyListing copyListing = 182 new SimpleCopyListing(job.getConfiguration(), 183 job.getCredentials(), distCpSync); 184 copyListing.buildListing(fileListingPath, inputOptions); so this can be in parallel with the existing method createInputFileListing(Job job) you accidentally changed * http://www.apache.org/licenses/LICENSE-2.0 , please revert this change In comments, "//xyz" should be "// xyz", notice the space between "//" and the text Please consider addressing them together with what Jing might have, Hi Jing Zhao , it looks good to me after the above nits addressed. Would you mind take another look so Yufei can address altogether if you have any more comments? Thanks,
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 javac 0m 6s The patch appears to cause the build to fail.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751292/HDFS-8828.010.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / f61120d
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12044/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 javac 0m 6s The patch appears to cause the build to fail. Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12751292/HDFS-8828.010.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / f61120d Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12044/console This message was automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 49s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 44s There were no new javac warning messages.
          +1 javadoc 9m 35s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 25s The applied patch generated 1 new checkstyle issues (total was 145, now 144).
          +1 whitespace 0m 7s The patch has no lines that end in whitespace.
          +1 install 1m 23s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 0m 46s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 33s Tests passed in hadoop-distcp.
              43m 29s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751197/HDFS-8828.010.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 7ecbfd4
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12039/artifact/patchprocess/diffcheckstylehadoop-distcp.txt
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12039/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12039/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12039/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 49s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 44s There were no new javac warning messages. +1 javadoc 9m 35s There were no new javadoc warning messages. +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 0m 25s The applied patch generated 1 new checkstyle issues (total was 145, now 144). +1 whitespace 0m 7s The patch has no lines that end in whitespace. +1 install 1m 23s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 0m 46s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 33s Tests passed in hadoop-distcp.     43m 29s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12751197/HDFS-8828.010.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 7ecbfd4 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12039/artifact/patchprocess/diffcheckstylehadoop-distcp.txt hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12039/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12039/testReport/ Java 1.7.0_55 uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12039/console This message was automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 48s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          -1 javac 7m 50s The applied patch generated 2 additional warning messages.
          +1 javadoc 9m 59s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 25s The applied patch generated 7 new checkstyle issues (total was 145, now 150).
          +1 whitespace 0m 7s The patch has no lines that end in whitespace.
          +1 install 1m 26s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 0m 49s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 32s Tests passed in hadoop-distcp.
              43m 58s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751164/HDFS-8828.009.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 30e342a
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/12030/artifact/patchprocess/diffJavacWarnings.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12030/artifact/patchprocess/diffcheckstylehadoop-distcp.txt
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12030/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12030/testReport/
          Java 1.7.0_55
          uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12030/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 48s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. -1 javac 7m 50s The applied patch generated 2 additional warning messages. +1 javadoc 9m 59s There were no new javadoc warning messages. +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 0m 25s The applied patch generated 7 new checkstyle issues (total was 145, now 150). +1 whitespace 0m 7s The patch has no lines that end in whitespace. +1 install 1m 26s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 0m 49s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 32s Tests passed in hadoop-distcp.     43m 58s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12751164/HDFS-8828.009.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 30e342a javac https://builds.apache.org/job/PreCommit-HDFS-Build/12030/artifact/patchprocess/diffJavacWarnings.txt checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12030/artifact/patchprocess/diffcheckstylehadoop-distcp.txt hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12030/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12030/testReport/ Java 1.7.0_55 uname Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12030/console This message was automatically generated.
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Jing Zhao,

          Thank you very much for great comments. I've uploaded a new patch 009.
          1. The new patch moves the diff report processing code and the list building code all into the DistCpSync class, and wraps the DiffInfo EnumMap in the class DistCpSync.
          2. The new patch removes one sort operation, and modifies code to optimize function getTraverseExcludeList with a sorted array.

          Show
          yufeigu Yufei Gu added a comment - Hi Jing Zhao , Thank you very much for great comments. I've uploaded a new patch 009. 1. The new patch moves the diff report processing code and the list building code all into the DistCpSync class, and wraps the DiffInfo EnumMap in the class DistCpSync. 2. The new patch removes one sort operation, and modifies code to optimize function getTraverseExcludeList with a sorted array.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 pre-patch 16m 9s Findbugs (version ) appears to be broken on trunk.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          -1 javac 8m 18s The applied patch generated 2 additional warning messages.
          +1 javadoc 10m 48s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 15s There were no new checkstyle issues.
          +1 whitespace 0m 7s The patch has no lines that end in whitespace.
          +1 install 1m 28s mvn install still works.
          +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse.
          +1 findbugs 0m 53s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 7m 9s Tests passed in hadoop-distcp.
              46m 21s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12751058/HDFS-8828.008.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 3a76a01
          javac https://builds.apache.org/job/PreCommit-HDFS-Build/12024/artifact/patchprocess/diffJavacWarnings.txt
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12024/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12024/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12024/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment -1 pre-patch 16m 9s Findbugs (version ) appears to be broken on trunk. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. -1 javac 8m 18s The applied patch generated 2 additional warning messages. +1 javadoc 10m 48s There were no new javadoc warning messages. +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 15s There were no new checkstyle issues. +1 whitespace 0m 7s The patch has no lines that end in whitespace. +1 install 1m 28s mvn install still works. +1 eclipse:eclipse 0m 35s The patch built with eclipse:eclipse. +1 findbugs 0m 53s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 7m 9s Tests passed in hadoop-distcp.     46m 21s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12751058/HDFS-8828.008.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 3a76a01 javac https://builds.apache.org/job/PreCommit-HDFS-Build/12024/artifact/patchprocess/diffJavacWarnings.txt hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/12024/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12024/testReport/ Java 1.7.0_55 uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12024/console This message was automatically generated.
          Hide
          jingzhao Jing Zhao added a comment - - edited

          Thanks for updating the patch, Yufei! Some further comments on the latest patch:

          1. Now we have many static methods in DiffInfo and DistCpSync, and we're passing the diffs between them. It's better to leave DiffInfo still as a simple data structure class, move the diff report processing code and the list building code all into the DistCpSync class. We then can make the DiffInfo EnumMap as a class member. In this way we can simplify a lot of code and make the logic more clear.
          2. I notice there are multiple places we retrieve the DiffInfo list (e.g., with Rename type), convert it into an array, and sort it afterwards. I'm not sure if a sorted array is necessary for all the scenarios (e.g., getTraverseExcludeList and getRenameItem). Maybe we can directly go through the corresponding list inside of the DiffInfo map.
          3. Or alternatively, when updating/excluding the created file/dir based on renamed items, we only need to scan the renamed directories, and the search can be a binary search based on path strings. Thus as an optional optimization, maybe we can prepare a sorted renamed directory array for this use case?
          4. Should be "!syncAndBuildListWithSnapshot".
                   if (inputOptions.shouldUseDiff()) {
            -        if (!DistCpSync.sync(inputOptions, getConf())) {
            +        if (syncAndBuildListWithSnapshot(job)) {
                       inputOptions.disableUsingDiff();
                     }
                   }
            
          Show
          jingzhao Jing Zhao added a comment - - edited Thanks for updating the patch, Yufei! Some further comments on the latest patch: Now we have many static methods in DiffInfo and DistCpSync, and we're passing the diffs between them. It's better to leave DiffInfo still as a simple data structure class, move the diff report processing code and the list building code all into the DistCpSync class. We then can make the DiffInfo EnumMap as a class member. In this way we can simplify a lot of code and make the logic more clear. I notice there are multiple places we retrieve the DiffInfo list (e.g., with Rename type), convert it into an array, and sort it afterwards. I'm not sure if a sorted array is necessary for all the scenarios (e.g., getTraverseExcludeList and getRenameItem ). Maybe we can directly go through the corresponding list inside of the DiffInfo map. Or alternatively, when updating/excluding the created file/dir based on renamed items, we only need to scan the renamed directories, and the search can be a binary search based on path strings. Thus as an optional optimization, maybe we can prepare a sorted renamed directory array for this use case? Should be "!syncAndBuildListWithSnapshot". if (inputOptions.shouldUseDiff()) { - if (!DistCpSync.sync(inputOptions, getConf())) { + if (syncAndBuildListWithSnapshot(job)) { inputOptions.disableUsingDiff(); } }
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Jing Zhao,

          Thank you very much for review and nice suggestion. I've uploaded the patch 008 for your comments.

          Show
          yufeigu Yufei Gu added a comment - Hi Jing Zhao , Thank you very much for review and nice suggestion. I've uploaded the patch 008 for your comments.
          Hide
          jingzhao Jing Zhao added a comment -

          Yes, we can try to use DistCpSync or a new class to carry the information. The third way can be: handle diff related logic together, i.e., to have a separate building list function for diff report based mechanism and call it immediately after doing the sync. The code may look like:

          if (use diff) {
             get diff report
             parse diff report
             do sync based on rename/delete diff info
             build copy list
          } else {
             all the original implementation
          }
          
          Show
          jingzhao Jing Zhao added a comment - Yes, we can try to use DistCpSync or a new class to carry the information. The third way can be: handle diff related logic together, i.e., to have a separate building list function for diff report based mechanism and call it immediately after doing the sync. The code may look like: if (use diff) { get diff report parse diff report do sync based on rename/delete diff info build copy list } else { all the original implementation }
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Jing Zhao,

          Thank you very much for the review and nice comments!

          And thanks Yongjun Zhang for good suggestions of comments 1 & 2. I agree that we need a more reasonable place to carry information between different distcp stages if we don't use the DistCpOptions.

          One of my thoughts is that we utilize a new class(or reuse the class DistCpSync) to carry these information as Yongjun Zhang suggested. Then add one object of the class as a member of class SimpleCopyListing. This will avoid connecting Snapshot diff report with DistCpOptions, but some function signatures should be changed, e.g. createInputFileListing, CopyListing#getCopyListing, and so on. We also need to create a new class or add some members in class DistCpSync.

          How do you guys think?

          Thanks.

          Show
          yufeigu Yufei Gu added a comment - Hi Jing Zhao , Thank you very much for the review and nice comments! And thanks Yongjun Zhang for good suggestions of comments 1 & 2. I agree that we need a more reasonable place to carry information between different distcp stages if we don't use the DistCpOptions. One of my thoughts is that we utilize a new class(or reuse the class DistCpSync) to carry these information as Yongjun Zhang suggested. Then add one object of the class as a member of class SimpleCopyListing. This will avoid connecting Snapshot diff report with DistCpOptions, but some function signatures should be changed, e.g. createInputFileListing, CopyListing#getCopyListing, and so on. We also need to create a new class or add some members in class DistCpSync. How do you guys think? Thanks.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao,

          Thanks for the review and good comments!

          Some thoughts:

          Currently DistCpOptions is currently the only vehicle to pass info between different stages of distcp. To address your comment 1 & 2, we need to add something new to pass additional info (which is derived data, and can be hold by a new class) to pass between sync and copyListing stages. To do this, there are two choices:

          1. Pass an object of this new class as a standalone parameter between stages, which require changing quite some method signatures, most of the places don't use this new parameter.

          2. Put the object of this class as a member of DistCpOptions, so there is no need to change method signatures. We can create another new class DistCpDerivedInput to hold derived input, and put all derived data as members of DistCpDerivedInput. If we define DistCpOptions as only holding command line options, then then this choice is not perfect; however, if we define DistcpOptions as may-contain derived input too, then it's ok.

          Which choice you like better? or additional thoughts?

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao , Thanks for the review and good comments! Some thoughts: Currently DistCpOptions is currently the only vehicle to pass info between different stages of distcp. To address your comment 1 & 2, we need to add something new to pass additional info (which is derived data, and can be hold by a new class) to pass between sync and copyListing stages. To do this, there are two choices: 1. Pass an object of this new class as a standalone parameter between stages, which require changing quite some method signatures, most of the places don't use this new parameter. 2. Put the object of this class as a member of DistCpOptions, so there is no need to change method signatures. We can create another new class DistCpDerivedInput to hold derived input, and put all derived data as members of DistCpDerivedInput . If we define DistCpOptions as only holding command line options, then then this choice is not perfect; however, if we define DistcpOptions as may-contain derived input too, then it's ok. Which choice you like better? or additional thoughts? Thanks.
          Hide
          jingzhao Jing Zhao added a comment -

          Thanks again for working on this, Yufei. The patch looks good to me overall. Some minor comments:

          1. DistCpOptions may not be a good place to put SnapshotDiffReport. The diff report is an intermediate result of the distcp, while DistCpOptions is only used for holding all the options. Let's find another way to pass the diff report to the list building function.
          2. It's better to combine getDiffs and getDiffsForListBuilding so that we do not need to scan the diff report twice. Maybe we can let getDiffs return an EnumMap<DiffType, List<DiffInfo>>?
          3. We can remove empty @param description from javadoc
          4. DiffInfo#type can be declared as final.
          Show
          jingzhao Jing Zhao added a comment - Thanks again for working on this, Yufei. The patch looks good to me overall. Some minor comments: DistCpOptions may not be a good place to put SnapshotDiffReport. The diff report is an intermediate result of the distcp, while DistCpOptions is only used for holding all the options. Let's find another way to pass the diff report to the list building function. It's better to combine getDiffs and getDiffsForListBuilding so that we do not need to scan the diff report twice. Maybe we can let getDiffs return an EnumMap<DiffType, List<DiffInfo>>? We can remove empty @param description from javadoc DiffInfo#type can be declared as final.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks Yufei Gu, +1 on rev 007.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks Yufei Gu , +1 on rev 007.
          Hide
          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 43s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 41s There were no new javac warning messages.
          +1 javadoc 9m 40s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 26s There were no new checkstyle issues.
          +1 whitespace 0m 4s The patch has no lines that end in whitespace.
          +1 install 1m 24s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 0m 49s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 26s Tests passed in hadoop-distcp.
              43m 13s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12750404/HDFS-8828.007.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 0a03054
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11990/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11990/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11990/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 43s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 41s There were no new javac warning messages. +1 javadoc 9m 40s There were no new javadoc warning messages. +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 26s There were no new checkstyle issues. +1 whitespace 0m 4s The patch has no lines that end in whitespace. +1 install 1m 24s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 0m 49s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 26s Tests passed in hadoop-distcp.     43m 13s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12750404/HDFS-8828.007.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 0a03054 hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11990/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11990/testReport/ Java 1.7.0_55 uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11990/console This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thank you Yufei Gu and Jing Zhao!

          Show
          yzhangal Yongjun Zhang added a comment - Thank you Yufei Gu and Jing Zhao !
          Hide
          yufeigu Yufei Gu added a comment -

          Thank you, Jing Zhao. Glad to have you review the code.

          Show
          yufeigu Yufei Gu added a comment - Thank you, Jing Zhao . Glad to have you review the code.
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Yongjun Zhang,

          Thanks very much for code review. I've done the modification and uploaded the new patch.

          Show
          yufeigu Yufei Gu added a comment - Hi Yongjun Zhang , Thanks very much for code review. I've done the modification and uploaded the new patch.
          Hide
          yufeigu Yufei Gu added a comment -

          No. Just one CREATE item in snapshot diff report in this case.

          Show
          yufeigu Yufei Gu added a comment - No. Just one CREATE item in snapshot diff report in this case.
          Hide
          jingzhao Jing Zhao added a comment -

          Sure. I will review the patch. Thanks for the work, Yufei and Yongjun!

          Show
          jingzhao Jing Zhao added a comment - Sure. I will review the patch. Thanks for the work, Yufei and Yongjun!
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Yufei Gu,

          Thanks for answering my question in person. So for newly created dir, there is indeed one entry "CREATE" in the snapshot diff report, and no entries for new elements created below this dir.

          So please take care of my comment 1, 2 in my previous review, plus:

          3. Suggest to change the getExcludeList method to getTraverseExcludeList (hopefully a better name) and with the following javadoc as we agreed.

          This method returns a list of items to be excluded when recursively traversing newDir to build the copy list.
          
          Specifically, given a newly created directory newDir (a CREATE entry in the snapshot diff), if a previously copied file/directory itemX is moved (a RENAME entry in the snapshot diff) into newDir, itemX should be excluded when recursively traversing newDir in #traverseDirectory,  so that it will not to be copied again.
          
          If the same itemX also has a MODIFY entry in the snapshot diff report, meaning it was modified after it was previously copied, it will still be added to the copy list (handled in the main loop of doBuildListingWithSnapshotDiff).
          

          4. Do refactoring to consolidate duplicated code in test code that we discussed.

          Hi Jing Zhao, I had quite some side discussion with Yufei, I am +1 on the change after the above comments are addressed. Would you please take a look at it if you wish? I'm targeting at committing it next Monday.

          Thanks much.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Yufei Gu , Thanks for answering my question in person. So for newly created dir, there is indeed one entry "CREATE" in the snapshot diff report, and no entries for new elements created below this dir. So please take care of my comment 1, 2 in my previous review, plus: 3. Suggest to change the getExcludeList method to getTraverseExcludeList (hopefully a better name) and with the following javadoc as we agreed. This method returns a list of items to be excluded when recursively traversing newDir to build the copy list. Specifically, given a newly created directory newDir (a CREATE entry in the snapshot diff), if a previously copied file/directory itemX is moved (a RENAME entry in the snapshot diff) into newDir, itemX should be excluded when recursively traversing newDir in #traverseDirectory, so that it will not to be copied again. If the same itemX also has a MODIFY entry in the snapshot diff report, meaning it was modified after it was previously copied, it will still be added to the copy list (handled in the main loop of doBuildListingWithSnapshotDiff). 4. Do refactoring to consolidate duplicated code in test code that we discussed. Hi Jing Zhao , I had quite some side discussion with Yufei, I am +1 on the change after the above comments are addressed. Would you please take a look at it if you wish? I'm targeting at committing it next Monday. Thanks much.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hello Yufei Gu,

          I expect every new CREATE/MODIFICATION below the newly created dir would also have an entry in the snapshot diff report (maybe except the first level children case described in my last comment), is this not the case?

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Hello Yufei Gu , I expect every new CREATE/MODIFICATION below the newly created dir would also have an entry in the snapshot diff report (maybe except the first level children case described in my last comment), is this not the case? Thanks.
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Yongjun Zhang,

          Thank you for detailed review.

          For 3, we do need recursively traverse because a created directory item in a snapshot diff report could have multiple levels of subdirectories.

          Show
          yufeigu Yufei Gu added a comment - Hi Yongjun Zhang , Thank you for detailed review. For 3, we do need recursively traverse because a created directory item in a snapshot diff report could have multiple levels of subdirectories.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Yufei Gu,

          Thanks for the new rev 006 which tries to address the issue we discussed (to avoid re-copying an already copied dir/file which is moved to a newly created dir since last snapshot/distcp).

          I have some more comments:

          1. Change Number of path in the copy list to Number of paths in the copy list
          2. change
                   if (LOG.isDebugEnabled()) {
                      LOG.debug("Path in the copy list: " + lastFileStatus.getPath().toUri().getPath());
                    }
            

            To (add an idx and only print in usediff && debug mode):
            if (options.shouldUseDiff() && LOG.isDebugEnabled())

            { LOG.debug("Copy list entry " + idx + ": " + lastFileStatus.getPath().toUri().getPath()); } ++idx; {code}
          3. Add some more explanation to the javadoc of static HashSet<String> getExcludeList(Path dir, DiffInfo[] renameDiffs, Path prefix}, such as:
            Given a newly created directory newDir in the snapshot diff, if a previously copied file/dirctory itemX is moved (renamed) to below newDir, itemX should be excluded so it will not to be copied again.         
            
          4. the goal of this jira is to only copy modified/created files, all of which would have entries in the snapshot diff report, why we have to call traverseDirectory to recursively traverse everything in doBuildListingWithSnapshotDiff(..} in this mode? Sounds to me that we only need to look at each snapshot diff item, and its direct children. ("mv ./x/y ./p/q" would make two entries in the snapshot diff: ./x and ./y, so do need to care about the first level children of snapshot diff entry). Right?
            If so, to reuse the code in traverseDirectory, we can modify traverseDirectory to support a mode that only cares about the current source and and it's first level children, but not recursively.

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Yufei Gu , Thanks for the new rev 006 which tries to address the issue we discussed (to avoid re-copying an already copied dir/file which is moved to a newly created dir since last snapshot/distcp). I have some more comments: Change Number of path in the copy list to Number of paths in the copy list change if (LOG.isDebugEnabled()) { LOG.debug( "Path in the copy list: " + lastFileStatus.getPath().toUri().getPath()); } To (add an idx and only print in usediff && debug mode): if (options.shouldUseDiff() && LOG.isDebugEnabled()) { LOG.debug("Copy list entry " + idx + ": " + lastFileStatus.getPath().toUri().getPath()); } ++idx; {code} Add some more explanation to the javadoc of static HashSet<String> getExcludeList(Path dir, DiffInfo[] renameDiffs, Path prefix }, such as: Given a newly created directory newDir in the snapshot diff, if a previously copied file/dirctory itemX is moved (renamed) to below newDir, itemX should be excluded so it will not to be copied again. the goal of this jira is to only copy modified/created files, all of which would have entries in the snapshot diff report, why we have to call traverseDirectory to recursively traverse everything in doBuildListingWithSnapshotDiff(.. } in this mode? Sounds to me that we only need to look at each snapshot diff item, and its direct children. ("mv ./x/y ./p/q" would make two entries in the snapshot diff: ./x and ./y, so do need to care about the first level children of snapshot diff entry). Right? If so, to reuse the code in traverseDirectory , we can modify traverseDirectory to support a mode that only cares about the current source and and it's first level children, but not recursively. Thanks.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 37s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 8m 20s There were no new javac warning messages.
          +1 javadoc 10m 20s There were no new javadoc warning messages.
          +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 28s The applied patch generated 1 new checkstyle issues (total was 120, now 121).
          +1 whitespace 0m 4s The patch has no lines that end in whitespace.
          +1 install 1m 29s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 0m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 29s Tests passed in hadoop-distcp.
              45m 36s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12749947/HDFS-8828.006.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 7c796fd
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11967/artifact/patchprocess/diffcheckstylehadoop-distcp.txt
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11967/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11967/testReport/
          Java 1.7.0_55
          uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11967/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 37s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 8m 20s There were no new javac warning messages. +1 javadoc 10m 20s There were no new javadoc warning messages. +1 release audit 0m 21s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 0m 28s The applied patch generated 1 new checkstyle issues (total was 120, now 121). +1 whitespace 0m 4s The patch has no lines that end in whitespace. +1 install 1m 29s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 0m 50s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 29s Tests passed in hadoop-distcp.     45m 36s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12749947/HDFS-8828.006.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 7c796fd checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11967/artifact/patchprocess/diffcheckstylehadoop-distcp.txt hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11967/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11967/testReport/ Java 1.7.0_55 uname Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11967/console This message was automatically generated.
          Hide
          yufeigu Yufei Gu added a comment -

          Add exclude list while recursively traverse the created directories in snapshot diff report.

          Show
          yufeigu Yufei Gu added a comment - Add exclude list while recursively traverse the created directories in snapshot diff report.
          Hide
          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 28s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 41s There were no new javac warning messages.
          +1 javadoc 9m 40s There were no new javadoc warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 26s There were no new checkstyle issues.
          +1 whitespace 0m 3s The patch has no lines that end in whitespace.
          +1 install 1m 22s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 23s Tests passed in hadoop-distcp.
              42m 51s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12749426/HDFS-8828.005.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 8f73bdd
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11946/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11946/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11946/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 28s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 41s There were no new javac warning messages. +1 javadoc 9m 40s There were no new javadoc warning messages. +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 26s There were no new checkstyle issues. +1 whitespace 0m 3s The patch has no lines that end in whitespace. +1 install 1m 22s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 23s Tests passed in hadoop-distcp.     42m 51s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12749426/HDFS-8828.005.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 8f73bdd hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11946/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11946/testReport/ Java 1.7.0_55 uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11946/console This message was automatically generated.
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Yongjun Zhang,

          Thank you very much for detailed code reviews and nice suggestions. I've upload a new path for your latest code review.

          Show
          yufeigu Yufei Gu added a comment - Hi Yongjun Zhang , Thank you very much for detailed code reviews and nice suggestions. I've upload a new path for your latest code review.
          Hide
          yzhangal Yongjun Zhang added a comment -
              return childPath.length() > parentPath.length() &&
                  childPath.startsWith(parentPath);
          
          Show
          yzhangal Yongjun Zhang added a comment - return childPath.length() > parentPath.length() && childPath.startsWith(parentPath);
          Hide
          yzhangal Yongjun Zhang added a comment -

          Looked at the checkstyle now, and realized that I missed in my last review, so one more comment:

          Change:

              if (childPath.startsWith(parentPath)) {
                if (childPath.length() == parentPath.length()) {
                  return false;
                } else {
                  return true;
                }
              }
              return false;
          

          to:

              return childPath.length() > parentPath.length() &&
                    childPath.startsWith(parentPath);
          
          Show
          yzhangal Yongjun Zhang added a comment - Looked at the checkstyle now, and realized that I missed in my last review, so one more comment: Change: if (childPath.startsWith(parentPath)) { if (childPath.length() == parentPath.length()) { return false ; } else { return true ; } } return false ; to: return childPath.length() > parentPath.length() && childPath.startsWith(parentPath);
          Hide
          yzhangal Yongjun Zhang added a comment -

          HI Yufei Gu,

          Thanks for addressing my above comments and verbal one about adding assertion in the testing code in the new rev. I did one more round of review, and have couple of nits and an improvement comments:

          1. move the following two statements to next to each other.
                path = getPathWithSchemeAndAuthority(path);
                path = makeQualified(path);
            
          2. change
               for (Map.Entry<Text, CopyListingFileStatus> entry : copyListing.entrySet())
               {
            

            back to

                for (Map.Entry<Text, CopyListingFileStatus> entry : copyListing.entrySet()) {
            

            I know that this means 81 chars one line, I guess we favor the coding guideline of not to have "{" at a new line. This will possibly solve the checkstyle warn reported by jenkins (I can't see it now because the link is not accessible right now, please look into anyways).

          3. In TestDistCpSync.java, suggest to modify all changeData<XYZ> method to return an integer of number of created/modified items,
            as below:
            int changeData<xyz>() {
              int numCreatedModified = 0;
              ...
              // after creating/modifying a file
              numCreatedModified += 1;
              ...
              return numCreatedModified;
            }
            

            And replace the hardcoded number in the added assertion in each test with the returned value.
            This way, it's easier to maintain the test code.

          Hi Jing Zhao, it largely looks good to me after addressing the above comments. Would you please help taking a look from your perspective?

          Many thanks.

          Show
          yzhangal Yongjun Zhang added a comment - HI Yufei Gu , Thanks for addressing my above comments and verbal one about adding assertion in the testing code in the new rev. I did one more round of review, and have couple of nits and an improvement comments: move the following two statements to next to each other. path = getPathWithSchemeAndAuthority(path); path = makeQualified(path); change for (Map.Entry<Text, CopyListingFileStatus> entry : copyListing.entrySet()) { back to for (Map.Entry<Text, CopyListingFileStatus> entry : copyListing.entrySet()) { I know that this means 81 chars one line, I guess we favor the coding guideline of not to have "{" at a new line. This will possibly solve the checkstyle warn reported by jenkins (I can't see it now because the link is not accessible right now, please look into anyways). In TestDistCpSync.java, suggest to modify all changeData<XYZ> method to return an integer of number of created/modified items, as below: int changeData<xyz>() { int numCreatedModified = 0; ... // after creating/modifying a file numCreatedModified += 1; ... return numCreatedModified; } And replace the hardcoded number in the added assertion in each test with the returned value. This way, it's easier to maintain the test code. Hi Jing Zhao , it largely looks good to me after addressing the above comments. Would you please help taking a look from your perspective? Many thanks.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 35s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 40s There were no new javac warning messages.
          +1 javadoc 9m 38s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 27s The applied patch generated 1 new checkstyle issues (total was 120, now 121).
          +1 whitespace 0m 3s The patch has no lines that end in whitespace.
          +1 install 1m 21s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 23s Tests passed in hadoop-distcp.
              42m 54s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12749397/HDFS-8828.004.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 8f73bdd
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11941/artifact/patchprocess/diffcheckstylehadoop-distcp.txt
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11941/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11941/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11941/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 35s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 40s There were no new javac warning messages. +1 javadoc 9m 38s There were no new javadoc warning messages. +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 0m 27s The applied patch generated 1 new checkstyle issues (total was 120, now 121). +1 whitespace 0m 3s The patch has no lines that end in whitespace. +1 install 1m 21s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 23s Tests passed in hadoop-distcp.     42m 54s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12749397/HDFS-8828.004.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 8f73bdd checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11941/artifact/patchprocess/diffcheckstylehadoop-distcp.txt hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11941/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11941/testReport/ Java 1.7.0_55 uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11941/console This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks Yufei Gu for the new rev and Jing Zhao for looking into.

          I did more review of rev 3, below are my comments:

          1. {{static DiffInfo[] getDiffsForListBuilding(SnapshotDiffReport report)}
            1. Please add one line in javadoc to state that DELETE is handled in DistCpSync#sync
            2. Add a comment for if(entry.getSourcePath().length <= 0), what it means when the source path is <= 0.
          2. {{isParentOf(Path parent, Path child) }}
            1. it need to check whether the matching point is a path delimiter; in addition,
            2. if (childPath.equals(parentPath)) can be optimized with checking length instead of content, once the startWith matches.
          3. In getRenameItem}}:
            1. The comment in the body: suggest to change
              // The same path string may appear in:
              // 1. both renamed and modified snapshot diff entries,
              // 2. both renamed and created snapshot diff entries
              // Case 1 is the about same file/directory, whereas case 2 is about two different files/directories.
              // We are finding case 1 here, thus we check against DiffType.MODIFY.
            2. Add a space in }else if
            3. Add a comment in the else if branch saying that this item can be either modified or created.
          4. In static Path getTargetPath(Path sourcePath, DiffInfo renameItem)
            1. change "by the rename item" to "based on the rename item"
            2. StringBuffer sb = new StringBuffer(sourcePath.toString()); to after the first return statement.
          5. In getDiffList(DistCpOptions options),
            1. The two calls to finalListWithTarget.add(diff); in getDiffList(DistCpOptions options) can be consolidated.
            2. The check of if (diff.target != null) meant to check whether the entry is not rename, can we change it to check the type instead of target to be more clear?
          6. Need some improvement in javadoc for public void doBuildListingWithSnapshotDiff. Suggest to replace "We need to handle rename item here since some created/modified items could be renamed as well." with "An item can be created/modified and renamed, in which case, the target path is put into the list".

          I did not review the test code yet, hope you can revisit and can catch something yourself. I will review after this round.

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks Yufei Gu for the new rev and Jing Zhao for looking into. I did more review of rev 3, below are my comments: {{static DiffInfo[] getDiffsForListBuilding(SnapshotDiffReport report)} Please add one line in javadoc to state that DELETE is handled in DistCpSync#sync Add a comment for if(entry.getSourcePath().length <= 0) , what it means when the source path is <= 0. {{isParentOf(Path parent, Path child) }} it need to check whether the matching point is a path delimiter; in addition, if (childPath.equals(parentPath)) can be optimized with checking length instead of content, once the startWith matches. In getRenameItem }}: The comment in the body: suggest to change // The same path string may appear in: // 1. both renamed and modified snapshot diff entries, // 2. both renamed and created snapshot diff entries // Case 1 is the about same file/directory, whereas case 2 is about two different files/directories. // We are finding case 1 here, thus we check against DiffType.MODIFY. Add a space in }else if Add a comment in the else if branch saying that this item can be either modified or created. In static Path getTargetPath(Path sourcePath, DiffInfo renameItem) change "by the rename item" to "based on the rename item" StringBuffer sb = new StringBuffer(sourcePath.toString()); to after the first return statement. In getDiffList(DistCpOptions options) , The two calls to finalListWithTarget.add(diff); in getDiffList(DistCpOptions options) can be consolidated. The check of if (diff.target != null) meant to check whether the entry is not rename, can we change it to check the type instead of target to be more clear? Need some improvement in javadoc for public void doBuildListingWithSnapshotDiff . Suggest to replace "We need to handle rename item here since some created/modified items could be renamed as well." with "An item can be created/modified and renamed, in which case, the target path is put into the list". I did not review the test code yet, hope you can revisit and can catch something yourself. I will review after this round. Thanks.
          Hide
          jingzhao Jing Zhao added a comment -

          Thanks for the explanation, Yufei! Yes, you're right that our current code uses the file list to check if a file is in the source. In that sense excluding "-delete" may be our only option here. But we may need to provide more details in the documentation about the behavior, as also suggested by Yongjun.

          Show
          jingzhao Jing Zhao added a comment - Thanks for the explanation, Yufei! Yes, you're right that our current code uses the file list to check if a file is in the source. In that sense excluding "-delete" may be our only option here. But we may need to provide more details in the documentation about the behavior, as also suggested by Yongjun.
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Jing Zhao,

          Thank you for reviewing the code. We changed the option here for the following reason. This patch is to build the diff file list instead of complete file list. In the other words, only files/directories changed/created will be in the copy file list. With the "-delete" option on, the MR jobs will delete every files/directories in the target which are not in the copy file list. So it will delete files we intend to keep.

          Show
          yufeigu Yufei Gu added a comment - Hi Jing Zhao, Thank you for reviewing the code. We changed the option here for the following reason. This patch is to build the diff file list instead of complete file list. In the other words, only files/directories changed/created will be in the copy file list. With the "-delete" option on, the MR jobs will delete every files/directories in the target which are not in the copy file list. So it will delete files we intend to keep.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 22s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 33s There were no new javac warning messages.
          -1 javadoc 9m 37s The applied patch generated 2 additional warning messages.
          +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 25s There were no new checkstyle issues.
          +1 whitespace 0m 4s The patch has no lines that end in whitespace.
          +1 install 1m 22s mvn install still works.
          +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse.
          +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 19s Tests passed in hadoop-distcp.
              42m 32s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12748535/HDFS-8828.003.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 469cfcd
          javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/diffJavadocWarnings.txt
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11891/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11891/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 22s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 33s There were no new javac warning messages. -1 javadoc 9m 37s The applied patch generated 2 additional warning messages. +1 release audit 0m 24s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 25s There were no new checkstyle issues. +1 whitespace 0m 4s The patch has no lines that end in whitespace. +1 install 1m 22s mvn install still works. +1 eclipse:eclipse 0m 34s The patch built with eclipse:eclipse. +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 19s Tests passed in hadoop-distcp.     42m 32s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12748535/HDFS-8828.003.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 469cfcd javadoc https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/diffJavadocWarnings.txt hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11891/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11891/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11891/console This message was automatically generated.
          Hide
          jingzhao Jing Zhao added a comment -

          Thanks for working on this, Yufei! One quick comment is about the following change:

          -    if ((!syncFolder || !deleteMissing) && useDiff) {
          +    if ((!syncFolder || deleteMissing) && useDiff) {
                 throw new IllegalArgumentException(
          -          "Diff is valid only with update and delete options");
          +          "Diff is valid only with update options");
               }
          

          Currently we delete files/directories according to DELETE diff already. Looks to me this is consistent with the "deleteMissing" option actually. Any specific reason we want to change the semantic here?

          Show
          jingzhao Jing Zhao added a comment - Thanks for working on this, Yufei! One quick comment is about the following change: - if ((!syncFolder || !deleteMissing) && useDiff) { + if ((!syncFolder || deleteMissing) && useDiff) { throw new IllegalArgumentException( - "Diff is valid only with update and delete options" ); + "Diff is valid only with update options" ); } Currently we delete files/directories according to DELETE diff already. Looks to me this is consistent with the "deleteMissing" option actually. Any specific reason we want to change the semantic here?
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Yongjun,

          Thank you very much for detailed code review and all nice suggestion. I've upload a new patch(HDFS-8828.003.path) for above comments.

          Show
          yufeigu Yufei Gu added a comment - Hi Yongjun, Thank you very much for detailed code review and all nice suggestion. I've upload a new patch( HDFS-8828 .003.path) for above comments.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Yufei Gu,

          Thanks for reporting the issue and the patch! It would be a very good performance improvement for distcp if we can skip files copied previously.

          I browsed the patch and have some comments:

          1. Would you please add java doc to the new methods you introduced including the test area (to describe what the test does)?
          2. There is coding guideline of 80 chars per line. I saw some in your change that exceeded 80.
          3. With the type field introduced, constructor DiffInfo(Path source, Path target) would leave type undefined. sounds this constructor should be totally replaced with DiffInfo(Path source, Path target, SnapshotDiffReport.DiffType type).
          4. javadoc is needed here: static DiffInfo[] getDiffsForListBuilding(SnapshotDiffReport report) check RENAME but doesn't check DELETE operation; DistCpSync#sync checks RENAME and DELETE operation. I can see that these two methods work together: deleted files are handled by the sync method only; but renamed file may be modfied too, thus need to be handled by both methods.
          5. Before we see it fully functional, it might be a good idea to introduce a new distcp switch to enable this improvement. When disabled, it behave like before. However, the new switch might make user interface and internal implemenatation more complex. If everything works nicely, we might not need the new switch. We can address this comment later.
          6. Add space between ")" and "throws" in private Path getFullPath(Path path)throws IOException
          7. Generally, add a space between "//" and the text following it
          8. private Path getFullPath(Path path)throws IOException is not clear to me:
            1. why it need a loop if it just need to use the last entry in inputs?
            2. path = root; return path can be simplified as return root
            3. is root the right variable name? The method is getFullPath, seems not an accurate name. I guess you meant getFullPathOfRoot?
          9. The following comment is not very clear to me:
              // if two paths are the same, just modify the target path of
              // "MODIFY" actions, ignore "CREATE" action.
            

            Suggest to try to make it more clear.

          10. When the patch gets to a shape, we need to update distcp document to describe the new improvemnt. Some tricky scenario in the snapshot diff report need to be explained in the document.

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Yufei Gu , Thanks for reporting the issue and the patch! It would be a very good performance improvement for distcp if we can skip files copied previously. I browsed the patch and have some comments: Would you please add java doc to the new methods you introduced including the test area (to describe what the test does)? There is coding guideline of 80 chars per line. I saw some in your change that exceeded 80. With the type field introduced, constructor DiffInfo(Path source, Path target) would leave type undefined. sounds this constructor should be totally replaced with DiffInfo(Path source, Path target, SnapshotDiffReport.DiffType type) . javadoc is needed here: static DiffInfo[] getDiffsForListBuilding(SnapshotDiffReport report) check RENAME but doesn't check DELETE operation; DistCpSync#sync checks RENAME and DELETE operation. I can see that these two methods work together: deleted files are handled by the sync method only; but renamed file may be modfied too, thus need to be handled by both methods. Before we see it fully functional, it might be a good idea to introduce a new distcp switch to enable this improvement. When disabled, it behave like before. However, the new switch might make user interface and internal implemenatation more complex. If everything works nicely, we might not need the new switch. We can address this comment later. Add space between ")" and "throws" in private Path getFullPath(Path path)throws IOException Generally, add a space between "//" and the text following it private Path getFullPath(Path path)throws IOException is not clear to me: why it need a loop if it just need to use the last entry in inputs ? path = root; return path can be simplified as return root is root the right variable name? The method is getFullPath , seems not an accurate name. I guess you meant getFullPathOfRoot ? The following comment is not very clear to me: // if two paths are the same, just modify the target path of // "MODIFY" actions, ignore "CREATE" action. Suggest to try to make it more clear. When the patch gets to a shape, we need to update distcp document to describe the new improvemnt. Some tricky scenario in the snapshot diff report need to be explained in the document. Thanks.
          Hide
          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 15m 31s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 34s There were no new javac warning messages.
          +1 javadoc 9m 39s There were no new javadoc warning messages.
          +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings.
          +1 checkstyle 0m 27s There were no new checkstyle issues.
          +1 whitespace 0m 4s The patch has no lines that end in whitespace.
          +1 install 1m 21s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 6m 27s Tests passed in hadoop-distcp.
              42m 49s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12748294/HDFS-8828.002.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 90b5104
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11884/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11884/testReport/
          Java 1.7.0_55
          uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11884/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 pre-patch 15m 31s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 34s There were no new javac warning messages. +1 javadoc 9m 39s There were no new javadoc warning messages. +1 release audit 0m 23s The applied patch does not increase the total number of release audit warnings. +1 checkstyle 0m 27s There were no new checkstyle issues. +1 whitespace 0m 4s The patch has no lines that end in whitespace. +1 install 1m 21s mvn install still works. +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse. +1 findbugs 0m 47s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 6m 27s Tests passed in hadoop-distcp.     42m 49s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12748294/HDFS-8828.002.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / 90b5104 hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11884/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11884/testReport/ Java 1.7.0_55 uname Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11884/console This message was automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 16m 0s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 2 new or modified test files.
          +1 javac 7m 57s There were no new javac warning messages.
          +1 javadoc 11m 58s There were no new javadoc warning messages.
          +1 release audit 0m 29s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 0m 29s The applied patch generated 19 new checkstyle issues (total was 120, now 139).
          +1 whitespace 0m 2s The patch has no lines that end in whitespace.
          +1 install 1m 52s mvn install still works.
          +1 eclipse:eclipse 0m 45s The patch built with eclipse:eclipse.
          +1 findbugs 1m 13s The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          +1 tools/hadoop tests 7m 18s Tests passed in hadoop-distcp.
              48m 8s  



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12747636/HDFS-8828.001.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / d311a38
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/diffcheckstylehadoop-distcp.txt
          hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/testrun_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11881/testReport/
          Java 1.7.0_55
          uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11881/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 pre-patch 16m 0s Pre-patch trunk compilation is healthy. +1 @author 0m 0s The patch does not contain any @author tags. +1 tests included 0m 0s The patch appears to include 2 new or modified test files. +1 javac 7m 57s There were no new javac warning messages. +1 javadoc 11m 58s There were no new javadoc warning messages. +1 release audit 0m 29s The applied patch does not increase the total number of release audit warnings. -1 checkstyle 0m 29s The applied patch generated 19 new checkstyle issues (total was 120, now 139). +1 whitespace 0m 2s The patch has no lines that end in whitespace. +1 install 1m 52s mvn install still works. +1 eclipse:eclipse 0m 45s The patch built with eclipse:eclipse. +1 findbugs 1m 13s The patch does not introduce any new Findbugs (version 3.0.0) warnings. +1 tools/hadoop tests 7m 18s Tests passed in hadoop-distcp.     48m 8s   Subsystem Report/Notes Patch URL http://issues.apache.org/jira/secure/attachment/12747636/HDFS-8828.001.patch Optional Tests javadoc javac unit findbugs checkstyle git revision trunk / d311a38 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/diffcheckstylehadoop-distcp.txt hadoop-distcp test log https://builds.apache.org/job/PreCommit-HDFS-Build/11881/artifact/patchprocess/testrun_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/11881/testReport/ Java 1.7.0_55 uname Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Console output https://builds.apache.org/job/PreCommit-HDFS-Build/11881/console This message was automatically generated.
          Hide
          yufeigu Yufei Gu added a comment -

          Submitted patch rev 001.

          Show
          yufeigu Yufei Gu added a comment - Submitted patch rev 001.

            People

            • Assignee:
              yufeigu Yufei Gu
              Reporter:
              yufeigu Yufei Gu
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development