Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9820

Improve distcp to support efficient restore to an earlier snapshot

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.4
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: distcp
    • Labels:
      None

      Description

      A common use scenario (scenaio 1):

      1. create snapshot sx in clusterX,
      2. do some experiemnts in clusterX, which creates some files.
      3. throw away the files changed and go back to sx.

      Another scenario (scenario 2) is, there is a production cluster and a backup cluster, we periodically sync up the data from production cluster to the backup cluster with distcp.

      The cluster in scenario 1 could be the backup cluster in scenario 2.

      For scenario 1:

      HDFS-4167 intends to restore HDFS to the most recent snapshot, and there are some complexity and challenges. Before that jira is implemented, we count on distcp to copy from snapshot to the current state. However, the performance of this operation could be very bad because we have to go through all files even if we only changed a few files.

      For scenario 2:

      HDFS-7535 improved distcp performance by avoiding copying files that changed name since last backup.

      On top of HDFS-7535, HDFS-8828 improved distcp performance when copying data from source to target cluster, by only copying changed files since last backup. The way it works is use snapshot diff to find out all files changed, and copy the changed files only.

      See https://blog.cloudera.com/blog/2015/12/distcp-performance-improvements-in-apache-hadoop/

      This jira is to propose a variation of HDFS-8828, to find out the files changed in target cluster since last snapshot sx, and copy these from snapshot sx of either the source or the target cluster, to restore target cluster's current state to sx.

      Specifically,

      If a file/dir is

      • renamed, rename it back
      • created in target cluster, delete it
      • modified, put it to the copy list
      • run distcp with the copy list, copy from the source cluster's corresponding snapshot

      This could be a new command line switch -rdiff in distcp.

      As a native restore feature, HDFS-4167 would still be ideal to have. However, HDFS-9820 would hopefully be easier to implement, before HDFS-4167 is in place.

      1. HDFS-9820.001.patch
        51 kB
        Yongjun Zhang
      2. HDFS-9820.002.patch
        56 kB
        Yongjun Zhang
      3. HDFS-9820.003.patch
        51 kB
        Yongjun Zhang
      4. HDFS-9820.004.patch
        59 kB
        Yongjun Zhang
      5. HDFS-9820.005.patch
        80 kB
        Yongjun Zhang
      6. HDFS-9820.006.patch
        76 kB
        Yongjun Zhang
      7. HDFS-9820.007.patch
        77 kB
        Yongjun Zhang
      8. HDFS-9820.008.patch
        77 kB
        Yongjun Zhang
      9. HDFS-9820.009.patch
        77 kB
        Yongjun Zhang
      10. HDFS-9820.branch-2.002.patch
        82 kB
        Yongjun Zhang
      11. HDFS-9820.branch-2.patch
        82 kB
        Yongjun Zhang

        Issue Links

          Activity

          Hide
          jingzhao Jing Zhao added a comment -

          Thanks for creating the jira, Yongjun Zhang. I think this jira is not very tightly related to snapshot restoring or HDFS-4167. Instead, this is a nice feature to have for snapshot-diff-based distcp, which help us get rid of the assumption/requirement that no changes can be made on the target FS after the last copy. With this feature we now have an option to detect and revert the change and continue using snapshot diff for distcp.

          Show
          jingzhao Jing Zhao added a comment - Thanks for creating the jira, Yongjun Zhang . I think this jira is not very tightly related to snapshot restoring or HDFS-4167 . Instead, this is a nice feature to have for snapshot-diff-based distcp, which help us get rid of the assumption/requirement that no changes can be made on the target FS after the last copy. With this feature we now have an option to detect and revert the change and continue using snapshot diff for distcp.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks Jing Zhao!

          Agree. HDFS-4167 also has the restriction that it can only revert back to the most recent snapshot, not any earlier ones, per your comment in the jira:

          Note that the snapshot must be the most recent one. We throw exceptions if there are intermediate snapshot. This is the same behavior with ZFS.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks Jing Zhao ! Agree. HDFS-4167 also has the restriction that it can only revert back to the most recent snapshot, not any earlier ones, per your comment in the jira: Note that the snapshot must be the most recent one. We throw exceptions if there are intermediate snapshot. This is the same behavior with ZFS.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hmm, in theory, I think we could make HDFS-4167 to relax the restriction, such that it can revert to any snapshot. Right Jing Zhao? Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Hmm, in theory, I think we could make HDFS-4167 to relax the restriction, such that it can revert to any snapshot. Right Jing Zhao ? Thanks.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Attaching patch rev 001.

          A summary:

          • Built on top of HDFS-8828, HDFS-7535, introduced command line switch -rdiff to enable this feature.
          • Assume there is snapshot sx on both distcp source and target, then there is change made on target, and this feature when used will revert the change done on target by applying the snapshot diff in a reverted way of how HDFS-7535 is implemented. Then it will build the copy list to copy data from distcp source' sx to distcp target.
          • In theory, the distcp source can be snapshot sx at either a remote cluster, or at the distcp target, some further test need to be done.

          HI Jing Zhao and Yufei Gu, would you please take a look when you get chance? There are places to improve, but I hope to get some initial input. Thanks a lot.

          Show
          yzhangal Yongjun Zhang added a comment - Attaching patch rev 001. A summary: Built on top of HDFS-8828 , HDFS-7535 , introduced command line switch -rdiff to enable this feature. Assume there is snapshot sx on both distcp source and target, then there is change made on target, and this feature when used will revert the change done on target by applying the snapshot diff in a reverted way of how HDFS-7535 is implemented. Then it will build the copy list to copy data from distcp source' sx to distcp target. In theory, the distcp source can be snapshot sx at either a remote cluster, or at the distcp target, some further test need to be done. HI Jing Zhao and Yufei Gu , would you please take a look when you get chance? There are places to improve, but I hope to get some initial input. Thanks a lot.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 25s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 9m 46s trunk passed
          +1 compile 0m 25s trunk passed with JDK v1.8.0_74
          +1 compile 0m 22s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 29s trunk passed
          +1 mvneclipse 0m 20s trunk passed
          +1 findbugs 0m 38s trunk passed
          +1 javadoc 0m 21s trunk passed with JDK v1.8.0_74
          +1 javadoc 0m 21s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 24s the patch passed
          +1 compile 0m 22s the patch passed with JDK v1.8.0_74
          +1 javac 0m 22s the patch passed
          +1 compile 0m 21s the patch passed with JDK v1.7.0_95
          +1 javac 0m 21s the patch passed
          -1 checkstyle 0m 17s hadoop-tools/hadoop-distcp: patch generated 6 new + 142 unchanged - 2 fixed = 148 total (was 144)
          +1 mvnsite 0m 26s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          -1 whitespace 0m 0s The patch has 23 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 findbugs 0m 52s the patch passed
          +1 javadoc 0m 18s the patch passed with JDK v1.8.0_74
          +1 javadoc 0m 16s the patch passed with JDK v1.7.0_95
          -1 unit 10m 32s hadoop-distcp in the patch failed with JDK v1.8.0_74.
          -1 unit 9m 16s hadoop-distcp in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          38m 35s



          Reason Tests
          JDK v1.8.0_74 Failed junit tests hadoop.tools.TestOptionsParser
          JDK v1.7.0_95 Failed junit tests hadoop.tools.TestOptionsParser



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794260/HDFS-9820.001.patch
          JIRA Issue HDFS-9820
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e1db69ba4aa5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 33239c9
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/whitespace-eol.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14869/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14869/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 25s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 9m 46s trunk passed +1 compile 0m 25s trunk passed with JDK v1.8.0_74 +1 compile 0m 22s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 24s trunk passed +1 mvnsite 0m 29s trunk passed +1 mvneclipse 0m 20s trunk passed +1 findbugs 0m 38s trunk passed +1 javadoc 0m 21s trunk passed with JDK v1.8.0_74 +1 javadoc 0m 21s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 24s the patch passed +1 compile 0m 22s the patch passed with JDK v1.8.0_74 +1 javac 0m 22s the patch passed +1 compile 0m 21s the patch passed with JDK v1.7.0_95 +1 javac 0m 21s the patch passed -1 checkstyle 0m 17s hadoop-tools/hadoop-distcp: patch generated 6 new + 142 unchanged - 2 fixed = 148 total (was 144) +1 mvnsite 0m 26s the patch passed +1 mvneclipse 0m 15s the patch passed -1 whitespace 0m 0s The patch has 23 line(s) that end in whitespace. Use git apply --whitespace=fix. +1 findbugs 0m 52s the patch passed +1 javadoc 0m 18s the patch passed with JDK v1.8.0_74 +1 javadoc 0m 16s the patch passed with JDK v1.7.0_95 -1 unit 10m 32s hadoop-distcp in the patch failed with JDK v1.8.0_74. -1 unit 9m 16s hadoop-distcp in the patch failed with JDK v1.7.0_95. +1 asflicense 0m 20s Patch does not generate ASF License warnings. 38m 35s Reason Tests JDK v1.8.0_74 Failed junit tests hadoop.tools.TestOptionsParser JDK v1.7.0_95 Failed junit tests hadoop.tools.TestOptionsParser Subsystem Report/Notes Docker Image:yetus/hadoop:fbe3e86 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794260/HDFS-9820.001.patch JIRA Issue HDFS-9820 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux e1db69ba4aa5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 33239c9 Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/whitespace-eol.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14869/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14869/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14869/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          A hopefully improved version is on the way ...

          Show
          yzhangal Yongjun Zhang added a comment - A hopefully improved version is on the way ...
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao and Yufei Gu,

          I just attached patch rev 002. I did basic cluster testing of copying from either source path's snapshot, or target's patch snapshot, it can revert the changes done at the target and copy from the snapshot (source's or target's). I do believe this version is cleaner than rev 001.

          Would you please help taking a look?

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao and Yufei Gu , I just attached patch rev 002. I did basic cluster testing of copying from either source path's snapshot, or target's patch snapshot, it can revert the changes done at the target and copy from the snapshot (source's or target's). I do believe this version is cleaner than rev 001. Would you please help taking a look? Thanks.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 0s trunk passed
          +1 compile 0m 15s trunk passed with JDK v1.8.0_74
          +1 compile 0m 18s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 23s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 27s trunk passed
          +1 javadoc 0m 13s trunk passed with JDK v1.8.0_74
          +1 javadoc 0m 15s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 12s the patch passed with JDK v1.8.0_74
          +1 javac 0m 12s the patch passed
          +1 compile 0m 14s the patch passed with JDK v1.7.0_95
          +1 javac 0m 14s the patch passed
          -1 checkstyle 0m 13s hadoop-tools/hadoop-distcp: patch generated 9 new + 133 unchanged - 11 fixed = 142 total (was 144)
          +1 mvnsite 0m 20s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 0m 38s the patch passed
          +1 javadoc 0m 10s the patch passed with JDK v1.8.0_74
          +1 javadoc 0m 12s the patch passed with JDK v1.7.0_95
          -1 unit 8m 50s hadoop-distcp in the patch failed with JDK v1.8.0_74.
          -1 unit 8m 8s hadoop-distcp in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 22s Patch does not generate ASF License warnings.
          30m 15s



          Reason Tests
          JDK v1.8.0_74 Failed junit tests hadoop.tools.TestOptionsParser
          JDK v1.7.0_95 Failed junit tests hadoop.tools.TestOptionsParser



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794948/HDFS-9820.002.patch
          JIRA Issue HDFS-9820
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 7e359f0a9fc0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / a107cee
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14906/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14906/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 11s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 7m 0s trunk passed +1 compile 0m 15s trunk passed with JDK v1.8.0_74 +1 compile 0m 18s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 15s trunk passed +1 mvnsite 0m 23s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 0m 27s trunk passed +1 javadoc 0m 13s trunk passed with JDK v1.8.0_74 +1 javadoc 0m 15s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 18s the patch passed +1 compile 0m 12s the patch passed with JDK v1.8.0_74 +1 javac 0m 12s the patch passed +1 compile 0m 14s the patch passed with JDK v1.7.0_95 +1 javac 0m 14s the patch passed -1 checkstyle 0m 13s hadoop-tools/hadoop-distcp: patch generated 9 new + 133 unchanged - 11 fixed = 142 total (was 144) +1 mvnsite 0m 20s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 0m 38s the patch passed +1 javadoc 0m 10s the patch passed with JDK v1.8.0_74 +1 javadoc 0m 12s the patch passed with JDK v1.7.0_95 -1 unit 8m 50s hadoop-distcp in the patch failed with JDK v1.8.0_74. -1 unit 8m 8s hadoop-distcp in the patch failed with JDK v1.7.0_95. +1 asflicense 0m 22s Patch does not generate ASF License warnings. 30m 15s Reason Tests JDK v1.8.0_74 Failed junit tests hadoop.tools.TestOptionsParser JDK v1.7.0_95 Failed junit tests hadoop.tools.TestOptionsParser Subsystem Report/Notes Docker Image:yetus/hadoop:fbe3e86 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794948/HDFS-9820.002.patch JIRA Issue HDFS-9820 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 7e359f0a9fc0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / a107cee Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14906/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14906/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14906/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
          Hide
          jingzhao Jing Zhao added a comment -

          Thanks for working on this, Yongjun Zhang! Some comments on the current patch:

          1. For the rdiff, We can switch the order of from-snapshot and to-snapshot before computing the diff report on the target cluster. In this way we can reuse the original sync code.
          2. Currently rdiff is a standalone option for distcp. This means we're using distcp to do the restore. To restore a directory back to a snapshot, this may not be the most efficient way compared with a local restoring solution (HDFS-4167), which can avoid most of the unnecessary data copying and can provide a copy-on-write semantic when supporting restoring appended/truncated files.
          3. But before we finish the work in HDFS-4167, maybe we can augment the current diff-based distcp by allowing the admin to choose to restore the target back to the latest snapshot. We can still use the implementation in the current patch, but instead of adding a new rdiff option for distcp, we add a "--force" option to the current diff-based distcp. What do you think?
          Show
          jingzhao Jing Zhao added a comment - Thanks for working on this, Yongjun Zhang ! Some comments on the current patch: For the rdiff, We can switch the order of from-snapshot and to-snapshot before computing the diff report on the target cluster. In this way we can reuse the original sync code. Currently rdiff is a standalone option for distcp. This means we're using distcp to do the restore. To restore a directory back to a snapshot, this may not be the most efficient way compared with a local restoring solution ( HDFS-4167 ), which can avoid most of the unnecessary data copying and can provide a copy-on-write semantic when supporting restoring appended/truncated files. But before we finish the work in HDFS-4167 , maybe we can augment the current diff-based distcp by allowing the admin to choose to restore the target back to the latest snapshot. We can still use the implementation in the current patch, but instead of adding a new rdiff option for distcp, we add a "--force" option to the current diff-based distcp. What do you think?
          Hide
          yufeigu Yufei Gu added a comment -

          Hi Yongjun,

          Thanks a lot for working on it. I have some comments here.
          1. Why not use getSnapshotDiffReport(target, "", s1) to get the diff report instead manually twiddle between delete/create?
          2. Since this one is also trying to restore HDFS to previous snapshot, should test cases cover it? e.g. providing another set of test
          case which source and target are the same.
          3. It will be nice to delete the line // syncAndVerify(); in function testSync5.

          Show
          yufeigu Yufei Gu added a comment - Hi Yongjun, Thanks a lot for working on it. I have some comments here. 1. Why not use getSnapshotDiffReport(target, "", s1) to get the diff report instead manually twiddle between delete/create? 2. Since this one is also trying to restore HDFS to previous snapshot, should test cases cover it? e.g. providing another set of test case which source and target are the same. 3. It will be nice to delete the line // syncAndVerify(); in function testSync5 .
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao and Yufei Gu,

          Thanks much for your review! sorry for getting back late since I was out last week.

          For the comment #1 from both of you, I did an attempt, and found a bug HDFS-10263.

          Jing, would you please confirm if it's a bug, and wonder if you have a quick solution for this bug I found?

          Thanks a lot.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao and Yufei Gu , Thanks much for your review! sorry for getting back late since I was out last week. For the comment #1 from both of you, I did an attempt, and found a bug HDFS-10263 . Jing, would you please confirm if it's a bug, and wonder if you have a quick solution for this bug I found? Thanks a lot.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao,

          About your other comments:

          2. Currently rdiff is a standalone option for distcp. This means we're using distcp to do the restore. To restore a directory back to a snapshot, this may not be the most efficient way compared with a local restoring solution (HDFS-4167), which can avoid most of the unnecessary data copying and can provide a copy-on-write semantic when supporting restoring appended/truncated files.

          I agree that HDFS-4167 is more efficient because it's native. However, there is quite some complexity there. Before we have HDFS-4167, I'm hoping HDFS-9820 can be a simpler solution and reasonably fast, especially with the ground work in HDFS-7535 and HDFS-8828. For appended/truncated data, MAPREDUCE-6572 would help when implemented. The idea there is to remember the truncated length of file, and copy only changed data.

          3. But before we finish the work in HDFS-4167, maybe we can augment the current diff-based distcp by allowing the admin to choose to restore the target back to the latest snapshot. We can still use the implementation in the current patch, but instead of adding a new rdiff option for distcp, we add a "--force" option to the current diff-based distcp. What do you think?

          In my opinion, the functions of -diff and -rdiff are quite different, the former is to copy changed diff from source cluster to target; the latter is to revert changes made in target (or any) cluster to a snapshot point. So I personally think it's more clear to use two different command options, thus less error-prone, from user's perspective. In addition, -diff requires two snapshot names as parameters, and -rdiff just need one.

          Wonder if you agree?

          Thanks a lot.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao , About your other comments: 2. Currently rdiff is a standalone option for distcp. This means we're using distcp to do the restore. To restore a directory back to a snapshot, this may not be the most efficient way compared with a local restoring solution ( HDFS-4167 ), which can avoid most of the unnecessary data copying and can provide a copy-on-write semantic when supporting restoring appended/truncated files. I agree that HDFS-4167 is more efficient because it's native. However, there is quite some complexity there. Before we have HDFS-4167 , I'm hoping HDFS-9820 can be a simpler solution and reasonably fast, especially with the ground work in HDFS-7535 and HDFS-8828 . For appended/truncated data, MAPREDUCE-6572 would help when implemented. The idea there is to remember the truncated length of file, and copy only changed data. 3. But before we finish the work in HDFS-4167 , maybe we can augment the current diff-based distcp by allowing the admin to choose to restore the target back to the latest snapshot. We can still use the implementation in the current patch, but instead of adding a new rdiff option for distcp, we add a "--force" option to the current diff-based distcp. What do you think? In my opinion, the functions of -diff and -rdiff are quite different, the former is to copy changed diff from source cluster to target; the latter is to revert changes made in target (or any) cluster to a snapshot point. So I personally think it's more clear to use two different command options, thus less error-prone, from user's perspective. In addition, -diff requires two snapshot names as parameters, and -rdiff just need one. Wonder if you agree? Thanks a lot.
          Hide
          yzhangal Yongjun Zhang added a comment -

          HI Jing Zhao,

          I'm uploading rev 003 to use -diff "" <ss> as command line to indicate that we are reverting changes made at target cluster.

          Would you please help taking a look?

          Thanks a lot!

          Show
          yzhangal Yongjun Zhang added a comment - HI Jing Zhao , I'm uploading rev 003 to use -diff "" <ss> as command line to indicate that we are reverting changes made at target cluster. Would you please help taking a look? Thanks a lot!
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 9m 29s trunk passed
          +1 compile 0m 28s trunk passed with JDK v1.8.0_77
          +1 compile 0m 25s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 32s trunk passed
          +1 mvneclipse 0m 22s trunk passed
          +1 findbugs 0m 39s trunk passed
          +1 javadoc 0m 23s trunk passed with JDK v1.8.0_77
          +1 javadoc 0m 22s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 25s the patch passed
          +1 compile 0m 24s the patch passed with JDK v1.8.0_77
          +1 javac 0m 24s the patch passed
          +1 compile 0m 20s the patch passed with JDK v1.7.0_95
          +1 javac 0m 20s the patch passed
          -1 checkstyle 0m 17s hadoop-tools/hadoop-distcp: patch generated 8 new + 90 unchanged - 11 fixed = 98 total (was 101)
          +1 mvnsite 0m 26s the patch passed
          +1 mvneclipse 0m 14s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          -1 findbugs 0m 51s hadoop-tools/hadoop-distcp generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 0m 19s the patch passed with JDK v1.8.0_77
          +1 javadoc 0m 18s the patch passed with JDK v1.7.0_95
          -1 unit 10m 59s hadoop-distcp in the patch failed with JDK v1.8.0_77.
          -1 unit 10m 19s hadoop-distcp in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 24s Patch does not generate ASF License warnings.
          40m 4s



          Reason Tests
          FindBugs module:hadoop-tools/hadoop-distcp
            Comparison of String parameter using == or != in org.apache.hadoop.tools.DistCpOptions.setUseDiff(boolean, String, String) At DistCpOptions.java:== or != in org.apache.hadoop.tools.DistCpOptions.setUseDiff(boolean, String, String) At DistCpOptions.java:[line 308]
          JDK v1.8.0_77 Failed junit tests hadoop.tools.TestOptionsParser
          JDK v1.7.0_95 Failed junit tests hadoop.tools.TestOptionsParser



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798162/HDFS-9820.003.patch
          JIRA Issue HDFS-9820
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux b5d042103fd2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 44bbc50
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-distcp.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15138/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15138/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 1 new or modified test files. +1 mvninstall 9m 29s trunk passed +1 compile 0m 28s trunk passed with JDK v1.8.0_77 +1 compile 0m 25s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 22s trunk passed +1 mvnsite 0m 32s trunk passed +1 mvneclipse 0m 22s trunk passed +1 findbugs 0m 39s trunk passed +1 javadoc 0m 23s trunk passed with JDK v1.8.0_77 +1 javadoc 0m 22s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 25s the patch passed +1 compile 0m 24s the patch passed with JDK v1.8.0_77 +1 javac 0m 24s the patch passed +1 compile 0m 20s the patch passed with JDK v1.7.0_95 +1 javac 0m 20s the patch passed -1 checkstyle 0m 17s hadoop-tools/hadoop-distcp: patch generated 8 new + 90 unchanged - 11 fixed = 98 total (was 101) +1 mvnsite 0m 26s the patch passed +1 mvneclipse 0m 14s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. -1 findbugs 0m 51s hadoop-tools/hadoop-distcp generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 javadoc 0m 19s the patch passed with JDK v1.8.0_77 +1 javadoc 0m 18s the patch passed with JDK v1.7.0_95 -1 unit 10m 59s hadoop-distcp in the patch failed with JDK v1.8.0_77. -1 unit 10m 19s hadoop-distcp in the patch failed with JDK v1.7.0_95. +1 asflicense 0m 24s Patch does not generate ASF License warnings. 40m 4s Reason Tests FindBugs module:hadoop-tools/hadoop-distcp   Comparison of String parameter using == or != in org.apache.hadoop.tools.DistCpOptions.setUseDiff(boolean, String, String) At DistCpOptions.java:== or != in org.apache.hadoop.tools.DistCpOptions.setUseDiff(boolean, String, String) At DistCpOptions.java: [line 308] JDK v1.8.0_77 Failed junit tests hadoop.tools.TestOptionsParser JDK v1.7.0_95 Failed junit tests hadoop.tools.TestOptionsParser Subsystem Report/Notes Docker Image:yetus/hadoop:fbe3e86 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798162/HDFS-9820.003.patch JIRA Issue HDFS-9820 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux b5d042103fd2 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 44bbc50 Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/new-findbugs-hadoop-tools_hadoop-distcp.html unit https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_77.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15138/artifact/patchprocess/patch-unit-hadoop-tools_hadoop-distcp-jdk1.7.0_95.txt JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15138/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15138/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Uploading rev 004 to

          • fix checkstyle issue and failed test
          • address review comments from Yufei about adding test to copy from target's own snapshot, which is what I planned to do originally too.

          Thanks Yufei Gu.

          Show
          yzhangal Yongjun Zhang added a comment - Uploading rev 004 to fix checkstyle issue and failed test address review comments from Yufei about adding test to copy from target's own snapshot, which is what I planned to do originally too. Thanks Yufei Gu .
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao and Yufei Gu,

          If you could help taking a look at the latest rev, I'd really appreciate it.

          Thank you!

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao and Yufei Gu , If you could help taking a look at the latest rev, I'd really appreciate it. Thank you!
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 6m 38s trunk passed
          +1 compile 0m 16s trunk passed with JDK v1.8.0_77
          +1 compile 0m 17s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 22s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 0m 29s trunk passed
          +1 javadoc 0m 12s trunk passed with JDK v1.8.0_77
          +1 javadoc 0m 15s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 17s the patch passed
          +1 compile 0m 12s the patch passed with JDK v1.8.0_77
          +1 javac 0m 12s the patch passed
          +1 compile 0m 15s the patch passed with JDK v1.7.0_95
          +1 javac 0m 15s the patch passed
          +1 checkstyle 0m 12s hadoop-tools/hadoop-distcp: patch generated 0 new + 84 unchanged - 17 fixed = 84 total (was 101)
          +1 mvnsite 0m 19s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 1s Patch has no whitespace issues.
          +1 findbugs 0m 37s the patch passed
          +1 javadoc 0m 10s the patch passed with JDK v1.8.0_77
          +1 javadoc 0m 12s the patch passed with JDK v1.7.0_95
          +1 unit 8m 31s hadoop-distcp in the patch passed with JDK v1.8.0_77.
          +1 unit 7m 33s hadoop-distcp in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 18s Patch does not generate ASF License warnings.
          28m 53s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798411/HDFS-9820.004.patch
          JIRA Issue HDFS-9820
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c7d52c056d71 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 35f0770
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15144/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15144/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 10s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 6m 38s trunk passed +1 compile 0m 16s trunk passed with JDK v1.8.0_77 +1 compile 0m 17s trunk passed with JDK v1.7.0_95 +1 checkstyle 0m 15s trunk passed +1 mvnsite 0m 22s trunk passed +1 mvneclipse 0m 16s trunk passed +1 findbugs 0m 29s trunk passed +1 javadoc 0m 12s trunk passed with JDK v1.8.0_77 +1 javadoc 0m 15s trunk passed with JDK v1.7.0_95 +1 mvninstall 0m 17s the patch passed +1 compile 0m 12s the patch passed with JDK v1.8.0_77 +1 javac 0m 12s the patch passed +1 compile 0m 15s the patch passed with JDK v1.7.0_95 +1 javac 0m 15s the patch passed +1 checkstyle 0m 12s hadoop-tools/hadoop-distcp: patch generated 0 new + 84 unchanged - 17 fixed = 84 total (was 101) +1 mvnsite 0m 19s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 1s Patch has no whitespace issues. +1 findbugs 0m 37s the patch passed +1 javadoc 0m 10s the patch passed with JDK v1.8.0_77 +1 javadoc 0m 12s the patch passed with JDK v1.7.0_95 +1 unit 8m 31s hadoop-distcp in the patch passed with JDK v1.8.0_77. +1 unit 7m 33s hadoop-distcp in the patch passed with JDK v1.7.0_95. +1 asflicense 0m 18s Patch does not generate ASF License warnings. 28m 53s Subsystem Report/Notes Docker Image:yetus/hadoop:fbe3e86 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12798411/HDFS-9820.004.patch JIRA Issue HDFS-9820 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux c7d52c056d71 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 35f0770 Default Java 1.7.0_95 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 findbugs v3.0.0 JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15144/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15144/console Powered by Apache Yetus 0.2.0 http://yetus.apache.org This message was automatically generated.
          Hide
          jingzhao Jing Zhao added a comment -

          Sure, I will review the patch today/tomorrow.

          Show
          jingzhao Jing Zhao added a comment - Sure, I will review the patch today/tomorrow.
          Hide
          jingzhao Jing Zhao added a comment -

          Thanks again for working on this, Yongjun. Some comments:

          1. Even without HDFS-10263, looks like we can still achieve the goal by computing the diff with reversed snapshot names. I guess we only need to skip DistCpSync#prepareDiffList here. Can you confirm? Or do you see any other issue?
          2. When using snapshot diff for distcp, our basic assumption is "there is no change between snapshot from and the current status in target
            file system." This is our best effort to avoid applying the diff on a wrong state of the target. Now with the patch this check is disabled for rdiff. However, in the rdiff scenario, the current status of target is always mapped to a snapshot in both source and target, thus looks to me we do not need to make any changes in command options.
          3. Because of HDFS-10263, we only need to understand which snapshot is earlier, the from or to. This information can be retrieved from the modification time of the two snapshots (i.e., to call getListing} against {{path-of-snapshottable-dir/.snapshot}).
          Show
          jingzhao Jing Zhao added a comment - Thanks again for working on this, Yongjun. Some comments: Even without HDFS-10263 , looks like we can still achieve the goal by computing the diff with reversed snapshot names. I guess we only need to skip DistCpSync#prepareDiffList here. Can you confirm? Or do you see any other issue? When using snapshot diff for distcp, our basic assumption is "there is no change between snapshot from and the current status in target file system." This is our best effort to avoid applying the diff on a wrong state of the target. Now with the patch this check is disabled for rdiff. However, in the rdiff scenario, the current status of target is always mapped to a snapshot in both source and target, thus looks to me we do not need to make any changes in command options. Because of HDFS-10263 , we only need to understand which snapshot is earlier, the from or to . This information can be retrieved from the modification time of the two snapshots (i.e., to call getListing} against {{path-of-snapshottable-dir/.snapshot }).
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao,

          Thanks a lot for your review and comments!

          Here is my reply in the same order of your questions, hope they make sense to you:

          1. Without HDFS-10263 fix, internally I always use forward snapshot diff, and do transformation from there. Not sure if your first question implies you suggest we still use reversed diff that doesn't have HDFS-10263 fix, and translate the result to be symmetric as forward snapshot diff (same as what HDFS-10263 would have achieved). If so, because the result still need another (existing) transformation as we currently do, that would cause the complexity I referred to in HDFS-10263.
          2. We now use -diff "" <ss> at command line to do the same behavior as -rdiff <ss> as in last patch rev. Due to lack of HDFS-10263, I swapped the source and target internally (and added the useRdiff flag to indicate the swapping), and always use forward snapshot diff.
          3. Seems you mean we should allow user to pass snapshot names in any order, either -diff s1 s2 or -diff s2 s1, and let the program to order s1 s2? What I was thinking was, we need to use the order user passed to indicate whether we are doing forward diff (HDFS-8828) or reverse diff (HDFS-9820). Thus -diff s1 s2 and -diff s2 s1 means different thing to me. I may have misunderstood you though.

          In addition, after HDFS-10263 is in place, we can make the implementation more symmetric (HDFS-8828 vs HDFS-9820).

          Thanks much.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao , Thanks a lot for your review and comments! Here is my reply in the same order of your questions, hope they make sense to you: Without HDFS-10263 fix, internally I always use forward snapshot diff, and do transformation from there. Not sure if your first question implies you suggest we still use reversed diff that doesn't have HDFS-10263 fix, and translate the result to be symmetric as forward snapshot diff (same as what HDFS-10263 would have achieved). If so, because the result still need another (existing) transformation as we currently do, that would cause the complexity I referred to in HDFS-10263 . We now use -diff "" <ss> at command line to do the same behavior as -rdiff <ss> as in last patch rev. Due to lack of HDFS-10263 , I swapped the source and target internally (and added the useRdiff flag to indicate the swapping), and always use forward snapshot diff. Seems you mean we should allow user to pass snapshot names in any order, either -diff s1 s2 or -diff s2 s1 , and let the program to order s1 s2? What I was thinking was, we need to use the order user passed to indicate whether we are doing forward diff ( HDFS-8828 ) or reverse diff ( HDFS-9820 ). Thus -diff s1 s2 and -diff s2 s1 means different thing to me. I may have misunderstood you though. In addition, after HDFS-10263 is in place, we can make the implementation more symmetric ( HDFS-8828 vs HDFS-9820 ). Thanks much.
          Hide
          jingzhao Jing Zhao added a comment -

          Let's say we first have snapshot s1 both both source and target (and the source and the target have been synced). Then we make some changes on the source, do a forward incremental distcp copy to apply the changes to the target. Based on our assumption, before the next incremental copy, we will create a snapshot s2 on both the source and the target.

          Let's say at this time you want to restore the target back to s1. Theoretically we only need to do "distcp -diff s2 s1", where s2 is the from snapshot and s1 is the to snapshot. Note there is no diff between the current states and s2 on the target. Only after verifying this can we continue the incremental dictcp. Because of the lack of HDFS-10263, we need to make slight changes when applying the reversed diff, i.e., to bypass DistCpSync#prepareDiffList. This requires the distcp tool to understand s2 is actually after s1, and we can call getListing against path-of-snapshottable-dir/.snapshot to achieve this.

          We should allow user to pass in two snapshots in any order. The only restriction here is the from snapshot must also exist in the target cluster and there is no difference between this snapshot and the current status in the target cluster.

          Show
          jingzhao Jing Zhao added a comment - Let's say we first have snapshot s1 both both source and target (and the source and the target have been synced). Then we make some changes on the source, do a forward incremental distcp copy to apply the changes to the target. Based on our assumption, before the next incremental copy, we will create a snapshot s2 on both the source and the target. Let's say at this time you want to restore the target back to s1. Theoretically we only need to do "distcp -diff s2 s1", where s2 is the from snapshot and s1 is the to snapshot. Note there is no diff between the current states and s2 on the target. Only after verifying this can we continue the incremental dictcp. Because of the lack of HDFS-10263 , we need to make slight changes when applying the reversed diff, i.e., to bypass DistCpSync#prepareDiffList . This requires the distcp tool to understand s2 is actually after s1, and we can call getListing against path-of-snapshottable-dir/.snapshot to achieve this. We should allow user to pass in two snapshots in any order. The only restriction here is the from snapshot must also exist in the target cluster and there is no difference between this snapshot and the current status in the target cluster.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks a lot Jing Zhao!

          My thoughts to share:

          1.

          Let's say we first have snapshot s1 both both source and target (and the source and the target have been synced). Then we make some changes on the source, do a forward incremental distcp copy to apply the changes to the target. Based on our assumption, before the next incremental copy, we will create a snapshot s2 on both the source and the target.

          This is HDFS-7535/HDFS-8828. One small correction: before we do the incremental copy, we create a snapshot s2 on source cluster first, find snapshot diff between s1 and s2, and apply this diff to target cluster, then finally create s2 on target cluster. We assume that no changes have been made at target cluster after s1 was created before we do incremental copy in this case (assumption I).

          2. Do you mean if "" ever appear as one parameter of -diff, then it's a revert operation, otherwise it's forward operation?

          In theory, we could copy incremental changes from source cluster to destination cluster without creating a new snapshot (s2 in our example). Say, after s1 is made in source cluster, and s1 is sync-ed to target cluster, and s1 is also created in target cluster, we could interpret

          distcp -diff s1 "" source target.

          as to incrementally copy changes made after s1 in source cluster to target, right? Because "" is just an alias of current state "snapshot",

          I personally feel it's more intuitive to count on the parameter order, and let (-diff s1 s2 mean the forward change from s1 to s2, -diff s2 s1 mean the revert change from s2 to s1. Say, assume a cluster is already at state s2, and we do -diff s1 s2, it would be a non-op; If we do -diff s2 s1, it means to go back to s1. In other words, -diff <fromState> <toState> is what I feel more intuitive.

          But if this is what you prefer, we can relax the order requirement, and let "" means revert operation. Would you please confirm?

          And would you please let me know whether my comment #1 in my previous reply makes sense to you?

          3. Not quite follow what you meant by "bypass DistCpSync#prepareDiffList. ". Some more details would help.

          Many thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks a lot Jing Zhao ! My thoughts to share: 1. Let's say we first have snapshot s1 both both source and target (and the source and the target have been synced). Then we make some changes on the source, do a forward incremental distcp copy to apply the changes to the target. Based on our assumption, before the next incremental copy, we will create a snapshot s2 on both the source and the target. This is HDFS-7535 / HDFS-8828 . One small correction: before we do the incremental copy, we create a snapshot s2 on source cluster first, find snapshot diff between s1 and s2, and apply this diff to target cluster, then finally create s2 on target cluster. We assume that no changes have been made at target cluster after s1 was created before we do incremental copy in this case ( assumption I ). 2. Do you mean if "" ever appear as one parameter of -diff , then it's a revert operation, otherwise it's forward operation? In theory, we could copy incremental changes from source cluster to destination cluster without creating a new snapshot (s2 in our example). Say, after s1 is made in source cluster, and s1 is sync-ed to target cluster, and s1 is also created in target cluster, we could interpret distcp -diff s1 "" source target . as to incrementally copy changes made after s1 in source cluster to target, right? Because "" is just an alias of current state "snapshot", I personally feel it's more intuitive to count on the parameter order, and let ( -diff s1 s2 mean the forward change from s1 to s2, -diff s2 s1 mean the revert change from s2 to s1. Say, assume a cluster is already at state s2, and we do -diff s1 s2 , it would be a non-op; If we do -diff s2 s1 , it means to go back to s1. In other words, -diff <fromState> <toState> is what I feel more intuitive. But if this is what you prefer, we can relax the order requirement, and let "" means revert operation. Would you please confirm? And would you please let me know whether my comment #1 in my previous reply makes sense to you? 3. Not quite follow what you meant by "bypass DistCpSync#prepareDiffList. ". Some more details would help. Many thanks.
          Hide
          jingzhao Jing Zhao added a comment -

          One small correction: before we do the incremental copy, we create a snapshot s2 on source cluster first

          No. This is incorrect. We allow distcp -diff s1 .. "s2" can be done after the copy. See TestDistCpSync#testSyncWithCurrent as an example.

          We assume that no changes have been made at target cluster after s1 was created before we do incremental copy in this case (assumption I)

          This assumption must be verified before the new distcp. Currently we do a snapshot diff report on target (between from and ".") to check. This check cannot be dropped as in your current patch.

          Do you mean if "" ever appear as one parameter of -diff

          I mean "" or "." should never be used as the fromState in distcp -diff command, otherwise we have no way to verify there is no change happening on target. So we actually should use "s2" here.

          Because "" is just an alias of current state "snapshot"

          This is also wrong. In command line "." is the alias of the current state.

          -diff <fromState> <toState> is what I feel more intuitive

          But if this is what you prefer, we can relax the order requirement, and let "" means revert operation. Would you please confirm?

          What I mean is: we should always use "-diff <fromState> <toState>", but instead of using ".", we should use "s2". No change is necessary on DistCpOptions.

          bypass DistCpSync#prepareDiffList

          For any modification/creation happening under a renamed directory, the diff report always uses the paths before the rename (as reported by HDFS-10263). prepareDiffList changes these paths to new paths after the rename, but when applying the reverse diff, we do not need to do this.

          Show
          jingzhao Jing Zhao added a comment - One small correction: before we do the incremental copy, we create a snapshot s2 on source cluster first No. This is incorrect. We allow distcp -diff s1 . . "s2" can be done after the copy. See TestDistCpSync#testSyncWithCurrent as an example. We assume that no changes have been made at target cluster after s1 was created before we do incremental copy in this case (assumption I) This assumption must be verified before the new distcp. Currently we do a snapshot diff report on target (between from and ".") to check. This check cannot be dropped as in your current patch. Do you mean if "" ever appear as one parameter of -diff I mean "" or "." should never be used as the fromState in distcp -diff command, otherwise we have no way to verify there is no change happening on target. So we actually should use "s2" here. Because "" is just an alias of current state "snapshot" This is also wrong. In command line "." is the alias of the current state. -diff <fromState> <toState> is what I feel more intuitive But if this is what you prefer, we can relax the order requirement, and let "" means revert operation. Would you please confirm? What I mean is: we should always use "-diff <fromState> <toState>", but instead of using ".", we should use "s2". No change is necessary on DistCpOptions . bypass DistCpSync#prepareDiffList For any modification/creation happening under a renamed directory, the diff report always uses the paths before the rename (as reported by HDFS-10263 ). prepareDiffList changes these paths to new paths after the rename, but when applying the reverse diff, we do not need to do this.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Many thanks Jing Zhao. Good discussion!

          1.

          No. This is incorrect. We allow distcp -diff s1 .. "s2" can be done after the copy. See TestDistCpSync#testSyncWithCurrent as an example

          if some changes is made while we were running distcp or after, but before s2 is created, then the stuff copied is not exact the content of s2. Right?

          2.

          This assumption must be verified before the new distcp. Currently we do a snapshot diff report on target (between from and ".") to check. This check cannot be dropped as in your current patch.

          I certainly agree that we should do the checking. I emphasized the assumption I and II in my last comment. However, since the checking can only be done in the beginning of distcp, if some changes are made before s2 is created, they will be missed in the checking. So I think we need to document that no change should be made when we do this operation.

          3.

          I mean "" or "." should never be used as the fromState in distcp -diff command, otherwise we have no way to verify there is no change happening on target. So we actually should use "s2" here.

          Then in the case HDFS-9820 tries to solve, are you suggesting to create a snapshot s2 first (for the sake of doing a check), before reverting it back to s1? The issue described in #2 above also applies.

          4.

          This is also wrong. In command line "." is the alias of the current state.

          I saw distcp was using "", maybe we should change to stick to using ".".

          5.

          For any modification/creation happening under a renamed directory, the diff report always uses the paths before the rename (as reported by HDFS-10263). prepareDiffList changes these paths to new paths after the rename, but when applying the reverse diff, we do not need to do this.

          Renaming x in s1 to y in s2 means that x is the original name before the rename, as reported in snapShotDiff(s1, s2), where s1 is fromSS, s2 is toSS;
          When we look at the reversion, the rename operation become renaming y in s2 to x in s1, so y should be the original name before the rename.
          as I expect to see from the reports of in snapshotDiff(s2, s1), where s2 is fromSS, s1 is toSS.

          However, snapshotDiff(s2, s1) still uses the names in s1 as the original name (x in this case, I really expect it to be y), though It does change the order operands, comparing with snapshotDiff(s1,s2), This is the issue I reported in HDFS-10263. You can see some example there.

          Basically I expect snapshotDiff(fromSS, toSS) to use names in fromSS. In the reversion case, it's the "." state. This is the symmetry I was referring to.

          Does this explanation make sense?

          Thanks again!

          Show
          yzhangal Yongjun Zhang added a comment - Many thanks Jing Zhao . Good discussion! 1. No. This is incorrect. We allow distcp -diff s1 .. "s2" can be done after the copy. See TestDistCpSync#testSyncWithCurrent as an example if some changes is made while we were running distcp or after, but before s2 is created, then the stuff copied is not exact the content of s2. Right? 2. This assumption must be verified before the new distcp. Currently we do a snapshot diff report on target (between from and ".") to check. This check cannot be dropped as in your current patch. I certainly agree that we should do the checking. I emphasized the assumption I and II in my last comment. However, since the checking can only be done in the beginning of distcp, if some changes are made before s2 is created, they will be missed in the checking. So I think we need to document that no change should be made when we do this operation. 3. I mean "" or "." should never be used as the fromState in distcp -diff command, otherwise we have no way to verify there is no change happening on target. So we actually should use "s2" here. Then in the case HDFS-9820 tries to solve, are you suggesting to create a snapshot s2 first (for the sake of doing a check), before reverting it back to s1? The issue described in #2 above also applies. 4. This is also wrong. In command line "." is the alias of the current state. I saw distcp was using "" , maybe we should change to stick to using "." . 5. For any modification/creation happening under a renamed directory, the diff report always uses the paths before the rename (as reported by HDFS-10263 ). prepareDiffList changes these paths to new paths after the rename, but when applying the reverse diff, we do not need to do this. Renaming x in s1 to y in s2 means that x is the original name before the rename, as reported in snapShotDiff(s1, s2), where s1 is fromSS, s2 is toSS; When we look at the reversion, the rename operation become renaming y in s2 to x in s1, so y should be the original name before the rename. as I expect to see from the reports of in snapshotDiff(s2, s1), where s2 is fromSS, s1 is toSS. However, snapshotDiff(s2, s1) still uses the names in s1 as the original name (x in this case, I really expect it to be y), though It does change the order operands, comparing with snapshotDiff(s1,s2), This is the issue I reported in HDFS-10263 . You can see some example there. Basically I expect snapshotDiff(fromSS, toSS) to use names in fromSS. In the reversion case, it's the "." state. This is the symmetry I was referring to. Does this explanation make sense? Thanks again!
          Hide
          jingzhao Jing Zhao added a comment -

          Yongjun Zhang, I have a very good understanding about HDFS-10263.... But I'm not sure if you understand my point about why this issue can by solved in a much easier way... Please let me know if you want an offline discussion.

          Show
          jingzhao Jing Zhao added a comment - Yongjun Zhang , I have a very good understanding about HDFS-10263 .... But I'm not sure if you understand my point about why this issue can by solved in a much easier way... Please let me know if you want an offline discussion.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Jing Zhao,

          Thanks for proposing offline discussion, I was thinking about the same Just shared contact info.

          Because of the similarity to HDFS-7535/HDFS-8828, the change indeed can be small (I have tried). In latest patch, some changes tries to address the in-symmetric output (HDFS-10263) by always going with forward snapshot diff; some other changes are intended to reorg the code for better readability.

          For completeness' sake, if you could comment back to the comments I made in my prior update, it would be appreciated.

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Jing Zhao , Thanks for proposing offline discussion, I was thinking about the same Just shared contact info. Because of the similarity to HDFS-7535 / HDFS-8828 , the change indeed can be small (I have tried). In latest patch, some changes tries to address the in-symmetric output ( HDFS-10263 ) by always going with forward snapshot diff; some other changes are intended to reorg the code for better readability. For completeness' sake, if you could comment back to the comments I made in my prior update, it would be appreciated. Thanks.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Many thanks to Jing Zhao for the offline discussion. I created HDFS-10313 and HDFS-10314 as a result.

          Show
          yzhangal Yongjun Zhang added a comment - Many thanks to Jing Zhao for the offline discussion. I created HDFS-10313 and HDFS-10314 as a result.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Per discussion in HDFS-10314, we are going to continue to work on HDFS-9820. Uploaded rev005.

          Show
          yzhangal Yongjun Zhang added a comment - Per discussion in HDFS-10314 , we are going to continue to work on HDFS-9820 . Uploaded rev005.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 7m 5s trunk passed
          +1 compile 0m 16s trunk passed
          +1 checkstyle 0m 16s trunk passed
          +1 mvnsite 0m 20s trunk passed
          +1 mvneclipse 0m 17s trunk passed
          +1 findbugs 0m 24s trunk passed
          +1 javadoc 0m 12s trunk passed
          +1 mvninstall 0m 15s the patch passed
          +1 compile 0m 14s the patch passed
          +1 javac 0m 14s the patch passed
          -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 44 new + 172 unchanged - 12 fixed = 216 total (was 184)
          +1 mvnsite 0m 17s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 27s the patch passed
          +1 javadoc 0m 10s the patch passed
          +1 unit 10m 55s hadoop-distcp in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          22m 39s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-9820
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831237/HDFS-9820.005.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 113c4e3bd32b 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fe9ebe2
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16969/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16969/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16969/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 14s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 7m 5s trunk passed +1 compile 0m 16s trunk passed +1 checkstyle 0m 16s trunk passed +1 mvnsite 0m 20s trunk passed +1 mvneclipse 0m 17s trunk passed +1 findbugs 0m 24s trunk passed +1 javadoc 0m 12s trunk passed +1 mvninstall 0m 15s the patch passed +1 compile 0m 14s the patch passed +1 javac 0m 14s the patch passed -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 44 new + 172 unchanged - 12 fixed = 216 total (was 184) +1 mvnsite 0m 17s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 27s the patch passed +1 javadoc 0m 10s the patch passed +1 unit 10m 55s hadoop-distcp in the patch passed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 22m 39s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-9820 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831237/HDFS-9820.005.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 113c4e3bd32b 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / fe9ebe2 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/16969/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/16969/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16969/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Copied from
          https://issues.apache.org/jira/browse/HDFS-10314?focusedCommentId=15248778&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15248778

          The idea is the wrap around distcp as a tool to achieve the functionality of distcp's switch -rdiff (if we will do the same for -diff, it will be a different jira). Here is a description and comparison of the -diff and unimplemented -rdiff switches.

          Definition: Assuming we have two snapshots, s1 and s2, where s1 is created earlier, and s2 is newer.
          
          - SnapshotDiff(s1, s2): represents the delta between s1 and s2; That is, if we apply 
            snapshotDiff(s1, s2)  on top of s1, we can go to the state of s2.
          - SnapshotDiff(s2, s1) represents the reversed delta between s1 and s2. That is, if
            we apply SnapshotDiff(s2, s1) on top of s2, we can go back to the state of s1.
          
          Note: When we talk about source and target, we mean distcp source and distcp target.
          
          A. -diff allows distcp to efficiently copy incremental changes made (on top of previously copied
              snapshot s1) in source cluster to target cluster   Assuming snapshot s2 is created at the source to
              capture s1 + incremental changes, snapshotDiff(s1,s2) is the incremental changes, the output of this
              operation is that the target will be at s2 sate. this operation involves three steps:
          
            A.1 calculate snapshotDiff(s1, s2) at the source
            A.2 apply the rename and delete portion of the snapshotDiff at the target. this step is called "sync"
            A.3 copy created/modified files from source's s2 to target 
          
          B. -rdiff allows distcp to efficiently copy data from snapshot s1 to overwrite changes made in target
              after snapshot s1 was created in target. Assuming snapshot s2 is created at the target to capture
              the changes that need to be overwritten, snapshotDiff(s2, s1) is what we want to apply to target. 
              The output of this operation is that the target is at s1 state. Similar to -diff, but with some 
              differences, this operation involves three steps too:
          
            B.1 calculate snapshotDiff(s2, s1) at the target,
            B.2 apply the rename and delete portion of the snapshot diff at the target. this step is called "sync"
            B.3 copy created/modified files from source's s1 to target. (the source here can be a different
                  cluster, or the target itself. When it's a different cluster, the cluster has to have snapshot s1 
                  that's has exact same name and content as the s1 at the target)
          
          A tablularized comparison:
          
                            required snapshots      DiffCalc       Output After Operation
                            --------------------------
                            source        target        
                            ------------------------------------------
          -diff             s1, s2   ->  s1             source         target is at s2
          -rdiff            s1       ->   s1,s2        target          target is at  s1  
          
          (note, for -rdiff, the source could be the same as target)
          
          So the "r" (reversed) in the -rdiff means the following and is very symmetric to -diff:
          
          - swap the snapshot requirement of source and target in -diff 
            (from "s1, s2   ->   s1 "  to  "s1  ->   s1,s2")
          - swap the result snapshot after operation (from s2 to s1)
          - swap the snapshot diff calculation place  (from source to target)
          
          We require source and target to have same snapshot s1 (same snapshot name, same content).
          
          Show
          yzhangal Yongjun Zhang added a comment - Copied from https://issues.apache.org/jira/browse/HDFS-10314?focusedCommentId=15248778&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15248778 The idea is the wrap around distcp as a tool to achieve the functionality of distcp's switch -rdiff (if we will do the same for -diff, it will be a different jira). Here is a description and comparison of the -diff and unimplemented -rdiff switches. Definition: Assuming we have two snapshots, s1 and s2, where s1 is created earlier, and s2 is newer. - SnapshotDiff(s1, s2): represents the delta between s1 and s2; That is, if we apply snapshotDiff(s1, s2) on top of s1, we can go to the state of s2. - SnapshotDiff(s2, s1) represents the reversed delta between s1 and s2. That is, if we apply SnapshotDiff(s2, s1) on top of s2, we can go back to the state of s1. Note: When we talk about source and target, we mean distcp source and distcp target. A. -diff allows distcp to efficiently copy incremental changes made (on top of previously copied snapshot s1) in source cluster to target cluster Assuming snapshot s2 is created at the source to capture s1 + incremental changes, snapshotDiff(s1,s2) is the incremental changes, the output of this operation is that the target will be at s2 sate. this operation involves three steps: A.1 calculate snapshotDiff(s1, s2) at the source A.2 apply the rename and delete portion of the snapshotDiff at the target. this step is called "sync" A.3 copy created/modified files from source's s2 to target B. -rdiff allows distcp to efficiently copy data from snapshot s1 to overwrite changes made in target after snapshot s1 was created in target. Assuming snapshot s2 is created at the target to capture the changes that need to be overwritten, snapshotDiff(s2, s1) is what we want to apply to target. The output of this operation is that the target is at s1 state. Similar to -diff, but with some differences, this operation involves three steps too: B.1 calculate snapshotDiff(s2, s1) at the target, B.2 apply the rename and delete portion of the snapshot diff at the target. this step is called "sync" B.3 copy created/modified files from source's s1 to target. (the source here can be a different cluster, or the target itself. When it's a different cluster, the cluster has to have snapshot s1 that's has exact same name and content as the s1 at the target) A tablularized comparison: required snapshots DiffCalc Output After Operation -------------------------- source target ------------------------------------------ -diff s1, s2 -> s1 source target is at s2 -rdiff s1 -> s1,s2 target target is at s1 (note, for -rdiff, the source could be the same as target) So the "r" (reversed) in the -rdiff means the following and is very symmetric to -diff: - swap the snapshot requirement of source and target in -diff (from "s1, s2 -> s1 " to "s1 -> s1,s2" ) - swap the result snapshot after operation (from s2 to s1) - swap the snapshot diff calculation place (from source to target) We require source and target to have same snapshot s1 (same snapshot name, same content).
          Hide
          yzhangal Yongjun Zhang added a comment -

          Copied from
          https://issues.apache.org/jira/browse/HDFS-10314?focusedCommentId=15510391&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15510391

          For clarity, and as a recap, here is a comparison table between -diff and the proposed -rdiff, which shows the symmetricity:

          Comparison -diff s1 s2 <src> <tgt> -rdiff s2 s1 <src> <tgt>
          Current feature state Existing in distcp Proposed Addition
          Functionality Given <tgt>'s current state is s1, make <tgt>'s current state the same as newer snapshot s2 Given <tgt>'s current state is s2, make <tgt>'s current state the same as older snapshot s1
          Requirements
          1. <src> and <tgt> need to be different paths
          2. both <src> and <tgt> have snapshot s1 with exact same content
          3. <src> has snapshot s2
          4. s2 is newer than s1
          5. <tgt>'s current state is the same as s1
          6. <tgt> doesn't have snapshot s2
          1. <src> and <tgt> can be the same or different paths
          2. both <src> and <tgt> have snapshot s1 with exact same content
          3. <tgt> has snapshot s2
          4. s2 is newer than s1
          5. <tgt>'s current state is the same as s2
          6. <src> may or may not have snapshot s2
          Steps
          1. calculate snapshotDiff<s1,s2> at <src>
          2. apply rename/delete part of snapshotDiff on <tgt>
          3. copy modified part of snapshotDiff from s2 of <src> to <tgt>
          1. calculate snapshotDiff<s2,s1> at <tgt>
          2. apply rename/delete part of snapshotDiff on <tgt>
          3. copy modified part of snapshotDiff from s1 of <src> to <tgt>
          Show
          yzhangal Yongjun Zhang added a comment - Copied from https://issues.apache.org/jira/browse/HDFS-10314?focusedCommentId=15510391&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15510391 For clarity, and as a recap, here is a comparison table between -diff and the proposed -rdiff, which shows the symmetricity: Comparison -diff s1 s2 <src> <tgt> -rdiff s2 s1 <src> <tgt> Current feature state Existing in distcp Proposed Addition Functionality Given <tgt>'s current state is s1, make <tgt>'s current state the same as newer snapshot s2 Given <tgt>'s current state is s2, make <tgt>'s current state the same as older snapshot s1 Requirements <src> and <tgt> need to be different paths both <src> and <tgt> have snapshot s1 with exact same content <src> has snapshot s2 s2 is newer than s1 <tgt>'s current state is the same as s1 <tgt> doesn't have snapshot s2 <src> and <tgt> can be the same or different paths both <src> and <tgt> have snapshot s1 with exact same content <tgt> has snapshot s2 s2 is newer than s1 <tgt>'s current state is the same as s2 <src> may or may not have snapshot s2 Steps calculate snapshotDiff<s1,s2> at <src> apply rename/delete part of snapshotDiff on <tgt> copy modified part of snapshotDiff from s2 of <src> to <tgt> calculate snapshotDiff<s2,s1> at <tgt> apply rename/delete part of snapshotDiff on <tgt> copy modified part of snapshotDiff from s1 of <src> to <tgt>
          Hide
          andrew.wang Andrew Wang added a comment -

          Hi Yongjun, thanks for sticking with this one. I had a few meta comments, and some code comments:

          Meta comments:

          • I see a TODO about HDFS-10263. How much would it simplify the code here if we implemented that first? There's a lot of logic about source vs. target due to how we flip it for the rdiff, and overall not much code sharing with the existing snapshot diff mechanism.
          • Is there a situation where we'd want to pass two different clusters for "src" and "tgt" to rdiff? They need to both have identical "s1" base states anyway.

          Code comments:

          • Comment in DistCpOptions says that forward diff is referred to as "Fdiff" but it's still just "diff" in multiple places. Maybe change to say "referred to as diff or Fdiff"?
          • Unused Preconditions import in DistCpOptions

          DistCpSync:

          • Extra "//" line on diffMap. I'd also prefer if we made this a javadoc block comment (/** */).
          • preSyncCheck could use some explanatory comments about how we flip the diff for rdiff. I don't know what the "c" in "cfs" stands for also, so could use a comment as well. Also, is there a way to additionally dedupe the rdiff/diff checking on the modtime? If not, cfs/cdir aren't saving us much and might as well just put them in the corresponding part of the if/else block.
          • We lost this comment in sync:
                // TODO: since we have tmp directory, we can support "undo" with failures
                // set the source path using the snapshot path
          
          • getRenameAndDeleteDiffsRdiff, I think we mean "reversal" rather than "reversion", and "Reversed" rather than "Reverted" in renameDiffsListReverted
          • TestDistCpSync: what happened to testFallback? New syncAndFail also isn't used.

          I didn't go through all the new tests, will get to that later.

          Show
          andrew.wang Andrew Wang added a comment - Hi Yongjun, thanks for sticking with this one. I had a few meta comments, and some code comments: Meta comments: I see a TODO about HDFS-10263 . How much would it simplify the code here if we implemented that first? There's a lot of logic about source vs. target due to how we flip it for the rdiff, and overall not much code sharing with the existing snapshot diff mechanism. Is there a situation where we'd want to pass two different clusters for "src" and "tgt" to rdiff? They need to both have identical "s1" base states anyway. Code comments: Comment in DistCpOptions says that forward diff is referred to as "Fdiff" but it's still just "diff" in multiple places. Maybe change to say "referred to as diff or Fdiff"? Unused Preconditions import in DistCpOptions DistCpSync: Extra "//" line on diffMap. I'd also prefer if we made this a javadoc block comment ( /** */ ). preSyncCheck could use some explanatory comments about how we flip the diff for rdiff. I don't know what the "c" in "cfs" stands for also, so could use a comment as well. Also, is there a way to additionally dedupe the rdiff/diff checking on the modtime? If not, cfs/cdir aren't saving us much and might as well just put them in the corresponding part of the if/else block. We lost this comment in sync: // TODO: since we have tmp directory, we can support "undo" with failures // set the source path using the snapshot path getRenameAndDeleteDiffsRdiff, I think we mean "reversal" rather than "reversion", and "Reversed" rather than "Reverted" in renameDiffsListReverted TestDistCpSync: what happened to testFallback? New syncAndFail also isn't used. I didn't go through all the new tests, will get to that later.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks a lot for your review and comments Andrew Wang!

          Good comments!

          Indeed, the difference between -diff and -rdiff would be very minimum if the snapshot diff were symmetric. For example, getRenameAndDeleteDiffsRdiff would look the same as getRenameAndDeleteDiffsFdiff when HDFS-10263 is fixed.

          One main sharing between -diff and -rdiff done in the patch is the infrastructure that handle snapshot diff, that is, where to do the following steps and to handle the difference between forward and backward snapshot diff report accordingly:

          • calculate snapshot diff
          • find out operations to do sync
          • do sync (rename/delete)
          • create copy listing based on changed items

          Will address your other comments soon.

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks a lot for your review and comments Andrew Wang ! Good comments! Indeed, the difference between -diff and -rdiff would be very minimum if the snapshot diff were symmetric. For example, getRenameAndDeleteDiffsRdiff would look the same as getRenameAndDeleteDiffsFdiff when HDFS-10263 is fixed. One main sharing between -diff and -rdiff done in the patch is the infrastructure that handle snapshot diff, that is, where to do the following steps and to handle the difference between forward and backward snapshot diff report accordingly: calculate snapshot diff find out operations to do sync do sync (rename/delete) create copy listing based on changed items Will address your other comments soon. Thanks.
          Hide
          yzhangal Yongjun Zhang added a comment -

          HI Andrew Wang,

          Many thanks again for the review. I uploaded rev 006 hopefully with all of your comments addressed.

          I added back the fallback feature (if sync fails, go back to the default distcp) of -diff (so not to change -diff's behavior), and added the fallback to -rdiff (so to be consistent with -diff). So this patch once committed, we can safely backport to 2.x if we like.

          I will create a jira for 3.0 to drop the fallback feature.

          Would you please help taking a look?

          Thanks much.

          Show
          yzhangal Yongjun Zhang added a comment - HI Andrew Wang , Many thanks again for the review. I uploaded rev 006 hopefully with all of your comments addressed. I added back the fallback feature (if sync fails, go back to the default distcp) of -diff (so not to change -diff's behavior), and added the fallback to -rdiff (so to be consistent with -diff). So this patch once committed, we can safely backport to 2.x if we like. I will create a jira for 3.0 to drop the fallback feature. Would you please help taking a look? Thanks much.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 7m 6s trunk passed
          +1 compile 0m 17s trunk passed
          +1 checkstyle 0m 14s trunk passed
          +1 mvnsite 0m 20s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 23s trunk passed
          +1 javadoc 0m 13s trunk passed
          +1 mvninstall 0m 16s the patch passed
          +1 compile 0m 14s the patch passed
          +1 javac 0m 14s the patch passed
          -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 26 new + 172 unchanged - 12 fixed = 198 total (was 184)
          +1 mvnsite 0m 18s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 30s the patch passed
          +1 javadoc 0m 11s the patch passed
          +1 unit 11m 12s hadoop-distcp in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          22m 58s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-9820
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831905/HDFS-9820.006.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 281742fcd5b1 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 272a217
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17042/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17042/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17042/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 7m 6s trunk passed +1 compile 0m 17s trunk passed +1 checkstyle 0m 14s trunk passed +1 mvnsite 0m 20s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 0m 23s trunk passed +1 javadoc 0m 13s trunk passed +1 mvninstall 0m 16s the patch passed +1 compile 0m 14s the patch passed +1 javac 0m 14s the patch passed -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 26 new + 172 unchanged - 12 fixed = 198 total (was 184) +1 mvnsite 0m 18s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 30s the patch passed +1 javadoc 0m 11s the patch passed +1 unit 11m 12s hadoop-distcp in the patch passed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 22m 58s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-9820 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12831905/HDFS-9820.006.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 281742fcd5b1 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 272a217 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17042/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17042/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17042/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment - - edited

          Hi Andrew Wang,

          As I verbally addressed your question

          Is there a situation where we'd want to pass two different clusters for "src" and "tgt" to rdiff? They need to both have identical "s1" base states anyway.

          The reason is that a user reported copying from a different cluster is faster, so I hope we can support the flexibility to copy from either the same cluster's snapshot or a different cluster's snapshot. It's common for user to have a mirroring cluster, one is production, one is a backup.

          I will address most of the checkstyle issue in next rev, but one thing I followed the existing test is to use "_" in file name variables to indicate the hierarchy, which seems easier to read, so I'd like to keep that.

          Thanks.

          Show
          yzhangal Yongjun Zhang added a comment - - edited Hi Andrew Wang , As I verbally addressed your question Is there a situation where we'd want to pass two different clusters for "src" and "tgt" to rdiff? They need to both have identical "s1" base states anyway. The reason is that a user reported copying from a different cluster is faster, so I hope we can support the flexibility to copy from either the same cluster's snapshot or a different cluster's snapshot. It's common for user to have a mirroring cluster, one is production, one is a backup. I will address most of the checkstyle issue in next rev, but one thing I followed the existing test is to use "_" in file name variables to indicate the hierarchy, which seems easier to read, so I'd like to keep that. Thanks.
          Hide
          andrew.wang Andrew Wang added a comment -

          Thanks for the replies Yongjun, sounds good. Took another look, some small comments:

          • I still see some references to "revert" and "reversion" in reference to the snapshot diff. I think "reverse" and "reversal" and so on are more accurate.
          • Need to update the DistCp documentation to describe the new option. Some of this information about specifying different vs. same clusters would be good to work in there also.
          • Should we add a test for restoring across multiple intermediate snapshots? e.g. s1, s2, s3, go from s3 back to s1?
          • Also I might have missed this, but is there a test for restoring from "." to a snapshot?
          Show
          andrew.wang Andrew Wang added a comment - Thanks for the replies Yongjun, sounds good. Took another look, some small comments: I still see some references to "revert" and "reversion" in reference to the snapshot diff. I think "reverse" and "reversal" and so on are more accurate. Need to update the DistCp documentation to describe the new option. Some of this information about specifying different vs. same clusters would be good to work in there also. Should we add a test for restoring across multiple intermediate snapshots? e.g. s1, s2, s3, go from s3 back to s1? Also I might have missed this, but is there a test for restoring from "." to a snapshot?
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Andrew Wang,

          Many thanks again for the review. I uploaded rev 007 which addressed all of your comments, except for the documentation.

          • replaced all revert with reverse
          • added testSync9() by adding a snapshot in between s1 and s2
          • modified testSyncWithCurrent() to test against "."

          About the documentation, can we do it in a follow-up jira after committing this one?

          I also would like to propose dropping the fallback feature and possibly the support of "." in hadoop 3, as follow-up, since they are going to be incompatible changes for 2.x.

          Thanks a lot.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Andrew Wang , Many thanks again for the review. I uploaded rev 007 which addressed all of your comments, except for the documentation. replaced all revert with reverse added testSync9() by adding a snapshot in between s1 and s2 modified testSyncWithCurrent() to test against "." About the documentation, can we do it in a follow-up jira after committing this one? I also would like to propose dropping the fallback feature and possibly the support of "." in hadoop 3, as follow-up, since they are going to be incompatible changes for 2.x. Thanks a lot.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 24s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 7m 37s trunk passed
          +1 compile 0m 16s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 20s trunk passed
          +1 mvneclipse 0m 18s trunk passed
          +1 findbugs 0m 24s trunk passed
          +1 javadoc 0m 15s trunk passed
          +1 mvninstall 0m 17s the patch passed
          +1 compile 0m 16s the patch passed
          +1 javac 0m 16s the patch passed
          -0 checkstyle 0m 13s hadoop-tools/hadoop-distcp: The patch generated 24 new + 174 unchanged - 12 fixed = 198 total (was 186)
          +1 mvnsite 0m 19s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 29s the patch passed
          +1 javadoc 0m 11s the patch passed
          +1 unit 12m 3s hadoop-distcp in the patch passed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          24m 49s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-9820
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833464/HDFS-9820.007.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 109024dc6c24 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 76cc84e
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17170/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17170/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17170/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 24s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 7m 37s trunk passed +1 compile 0m 16s trunk passed +1 checkstyle 0m 15s trunk passed +1 mvnsite 0m 20s trunk passed +1 mvneclipse 0m 18s trunk passed +1 findbugs 0m 24s trunk passed +1 javadoc 0m 15s trunk passed +1 mvninstall 0m 17s the patch passed +1 compile 0m 16s the patch passed +1 javac 0m 16s the patch passed -0 checkstyle 0m 13s hadoop-tools/hadoop-distcp: The patch generated 24 new + 174 unchanged - 12 fixed = 198 total (was 186) +1 mvnsite 0m 19s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 29s the patch passed +1 javadoc 0m 11s the patch passed +1 unit 12m 3s hadoop-distcp in the patch passed. +1 asflicense 0m 18s The patch does not generate ASF License warnings. 24m 49s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-9820 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833464/HDFS-9820.007.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 109024dc6c24 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 76cc84e Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17170/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17170/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17170/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          andrew.wang Andrew Wang added a comment -

          Great work Yongjun, I'm +1. New tests look good. I'm okay with doing the additional fixes as follow-on.

          Show
          andrew.wang Andrew Wang added a comment - Great work Yongjun, I'm +1. New tests look good. I'm okay with doing the additional fixes as follow-on.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Many thanks Andrew Wang!

          Show
          yzhangal Yongjun Zhang added a comment - Many thanks Andrew Wang !
          Hide
          yzhangal Yongjun Zhang added a comment -

          Done real cluster testing. Will be committing this by tomorrow.

          Show
          yzhangal Yongjun Zhang added a comment - Done real cluster testing. Will be committing this by tomorrow.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10624 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10624/)
          HDFS-9820. Improve distcp to support efficient restore to an earlier (yzhang: rev 412c4c9a342b73bf1c1a7f43ea91245cbf94d02d)

          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          • (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromSource.java
          • (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromTarget.java
          • (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseBase.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10624 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10624/ ) HDFS-9820 . Improve distcp to support efficient restore to an earlier (yzhang: rev 412c4c9a342b73bf1c1a7f43ea91245cbf94d02d) (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromSource.java (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromTarget.java (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseBase.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          Hide
          andrew.wang Andrew Wang added a comment -

          Hi Yongjun, looks like this was committed to trunk but not the branch-2's. Is this intentional? Wondering if we should set fix versions and resolve this JIRA or not.

          Show
          andrew.wang Andrew Wang added a comment - Hi Yongjun, looks like this was committed to trunk but not the branch-2's. Is this intentional? Wondering if we should set fix versions and resolve this JIRA or not.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Andrew Wang,

          Thanks for asking, I'm actually in the middle of backporting to branch-2. Planned to update here when I'm done. There are quite some conflicts to resolve it turned out.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Andrew Wang , Thanks for asking, I'm actually in the middle of backporting to branch-2. Planned to update here when I'm done. There are quite some conflicts to resolve it turned out.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Hi Andrew Wang,

          I had to cherry-pick HADOOP-13034 from trunk to branch-2, and resolve conflicts with HDFS-10556 (which was only applied to branch-2). Updated branch-2 patch here, would you please help take a look?

          Thanks a lot.

          Show
          yzhangal Yongjun Zhang added a comment - Hi Andrew Wang , I had to cherry-pick HADOOP-13034 from trunk to branch-2, and resolve conflicts with HDFS-10556 (which was only applied to branch-2). Updated branch-2 patch here, would you please help take a look? Thanks a lot.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 docker 7m 14s Docker failed to build yetus/hadoop:b59b8b7.



          Subsystem Report/Notes
          JIRA Issue HDFS-9820
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833844/HDFS-9820.branch-2.patch
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17193/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 docker 7m 14s Docker failed to build yetus/hadoop:b59b8b7. Subsystem Report/Notes JIRA Issue HDFS-9820 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833844/HDFS-9820.branch-2.patch Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17193/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Found an issue with the fallback logic, that is, even if sync() failed (return false), it still proceeds to create copyListing with snapshot diff and caused failure. Thus I reverted my earlier commit.

          I kind of forgot that the fallback logic was already disabled per an earlier jira, however, the fallback code was still in place, and caused some confusion. See

          https://issues.apache.org/jira/browse/HDFS-10313?focusedCommentId=15250582&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15250582

          The updated patch rev00 simply quits with error message when sync() fails, same as the behaviour before this patch.

          Would you please help taking a look Andrew Wang?

          I plan to update the branch-2 patch with the same change after your +1.

          Thanks a lot.

          Show
          yzhangal Yongjun Zhang added a comment - Found an issue with the fallback logic, that is, even if sync() failed (return false), it still proceeds to create copyListing with snapshot diff and caused failure. Thus I reverted my earlier commit. I kind of forgot that the fallback logic was already disabled per an earlier jira, however, the fallback code was still in place, and caused some confusion. See https://issues.apache.org/jira/browse/HDFS-10313?focusedCommentId=15250582&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15250582 The updated patch rev00 simply quits with error message when sync() fails, same as the behaviour before this patch. Would you please help taking a look Andrew Wang ? I plan to update the branch-2 patch with the same change after your +1. Thanks a lot.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10629 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10629/)
          Revert "HDFS-9820. Improve distcp to support efficient restore to an (yzhang: rev 0bc6d37f3c1e7c2a8682dffa95461a884bd6ba17)

          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • (delete) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromTarget.java
          • (delete) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseBase.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • (delete) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromSource.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10629 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10629/ ) Revert " HDFS-9820 . Improve distcp to support efficient restore to an (yzhang: rev 0bc6d37f3c1e7c2a8682dffa95461a884bd6ba17) (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java (delete) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromTarget.java (delete) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseBase.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java (delete) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromSource.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 28s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 7m 53s trunk passed
          +1 compile 0m 20s trunk passed
          +1 checkstyle 0m 18s trunk passed
          +1 mvnsite 0m 22s trunk passed
          +1 mvneclipse 0m 15s trunk passed
          +1 findbugs 0m 33s trunk passed
          +1 javadoc 0m 16s trunk passed
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 14s the patch passed
          +1 javac 0m 14s the patch passed
          -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 24 new + 174 unchanged - 12 fixed = 198 total (was 186)
          +1 mvnsite 0m 18s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 32s the patch passed
          +1 javadoc 0m 13s the patch passed
          +1 unit 11m 41s hadoop-distcp in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          25m 7s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-9820
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833897/HDFS-9820.008.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e05f337300d9 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / c023c74
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17199/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17199/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17199/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 28s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 7m 53s trunk passed +1 compile 0m 20s trunk passed +1 checkstyle 0m 18s trunk passed +1 mvnsite 0m 22s trunk passed +1 mvneclipse 0m 15s trunk passed +1 findbugs 0m 33s trunk passed +1 javadoc 0m 16s trunk passed +1 mvninstall 0m 18s the patch passed +1 compile 0m 14s the patch passed +1 javac 0m 14s the patch passed -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 24 new + 174 unchanged - 12 fixed = 198 total (was 186) +1 mvnsite 0m 18s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 32s the patch passed +1 javadoc 0m 13s the patch passed +1 unit 11m 41s hadoop-distcp in the patch passed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 25m 7s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-9820 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12833897/HDFS-9820.008.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux e05f337300d9 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / c023c74 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17199/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17199/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17199/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          andrew.wang Andrew Wang added a comment -

          One little nit then, this code block:

          80	  private void prepareFileListing(Job job) throws Exception {
          81	    if (inputOptions.shouldUseSnapshotDiff()) {
          82	      DistCpSync distCpSync = new DistCpSync(inputOptions, getConf());
          83	      if (distCpSync.sync()) {
          84	        createInputFileListingWithDiff(job, distCpSync);
          85	      } else {
          86	        throw new Exception("DistCp sync failed, input options: "
          87	            + inputOptions);
          88	      }
          89	    }
          90	
          91	    // Fallback to default DistCp if without "diff" option or sync failed.
          92	    if (!inputOptions.shouldUseSnapshotDiff()) {
          93	      createInputFileListing(job);
          94	    }
          95	  }
          

          I know this was copy pasted, but the comment seems wrong since there is no fallback. This could also be structured as a simple if/else for clarity.

          +1 pending though, this is a stylistic rather than functional issue. Thanks Yongjun!

          Show
          andrew.wang Andrew Wang added a comment - One little nit then, this code block: 80 private void prepareFileListing(Job job) throws Exception { 81 if (inputOptions.shouldUseSnapshotDiff()) { 82 DistCpSync distCpSync = new DistCpSync(inputOptions, getConf()); 83 if (distCpSync.sync()) { 84 createInputFileListingWithDiff(job, distCpSync); 85 } else { 86 throw new Exception( "DistCp sync failed, input options: " 87 + inputOptions); 88 } 89 } 90 91 // Fallback to default DistCp if without "diff" option or sync failed. 92 if (!inputOptions.shouldUseSnapshotDiff()) { 93 createInputFileListing(job); 94 } 95 } I know this was copy pasted, but the comment seems wrong since there is no fallback. This could also be structured as a simple if/else for clarity. +1 pending though, this is a stylistic rather than functional issue. Thanks Yongjun!
          Hide
          yzhangal Yongjun Zhang added a comment -

          Thanks Andrew Wang.

          New rev 009 to fix the comments.

          Show
          yzhangal Yongjun Zhang added a comment - Thanks Andrew Wang . New rev 009 to fix the comments.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 11s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 6m 34s trunk passed
          +1 compile 0m 16s trunk passed
          +1 checkstyle 0m 15s trunk passed
          +1 mvnsite 0m 20s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 0m 22s trunk passed
          +1 javadoc 0m 12s trunk passed
          +1 mvninstall 0m 15s the patch passed
          +1 compile 0m 14s the patch passed
          +1 javac 0m 14s the patch passed
          -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 24 new + 174 unchanged - 12 fixed = 198 total (was 186)
          +1 mvnsite 0m 18s the patch passed
          +1 mvneclipse 0m 9s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 28s the patch passed
          +1 javadoc 0m 10s the patch passed
          +1 unit 11m 3s hadoop-distcp in the patch passed.
          +1 asflicense 0m 14s The patch does not generate ASF License warnings.
          22m 5s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue HDFS-9820
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834276/HDFS-9820.009.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5580be8829de 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / e9c4616
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17228/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17228/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17228/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 11s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 6m 34s trunk passed +1 compile 0m 16s trunk passed +1 checkstyle 0m 15s trunk passed +1 mvnsite 0m 20s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 0m 22s trunk passed +1 javadoc 0m 12s trunk passed +1 mvninstall 0m 15s the patch passed +1 compile 0m 14s the patch passed +1 javac 0m 14s the patch passed -0 checkstyle 0m 12s hadoop-tools/hadoop-distcp: The patch generated 24 new + 174 unchanged - 12 fixed = 198 total (was 186) +1 mvnsite 0m 18s the patch passed +1 mvneclipse 0m 9s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 28s the patch passed +1 javadoc 0m 10s the patch passed +1 unit 11m 3s hadoop-distcp in the patch passed. +1 asflicense 0m 14s The patch does not generate ASF License warnings. 22m 5s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue HDFS-9820 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834276/HDFS-9820.009.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 5580be8829de 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / e9c4616 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17228/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17228/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17228/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10640 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10640/)
          HDFS-9820. Improve distcp to support efficient restore to an earlier (yzhang: rev 8650cc84f20e7d8c32dcdcd91c94372d476e2276)

          • (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromSource.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java
          • (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseBase.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java
          • (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
          • (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromTarget.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
          • (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java
          • (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10640 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10640/ ) HDFS-9820 . Improve distcp to support efficient restore to an earlier (yzhang: rev 8650cc84f20e7d8c32dcdcd91c94372d476e2276) (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromSource.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DiffInfo.java (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseBase.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptions.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/OptionsParser.java (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSync.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestDistCpSyncReverseFromTarget.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpOptionSwitch.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java (edit) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestOptionsParser.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java
          Hide
          yzhangal Yongjun Zhang added a comment -

          Many thanks again Andrew Wang, I have committed to trunk.

          Revised branch-2 patch accordingly with the same change, and uploaded HDFS-9820.branch-2.002.patch, would you please help taking a look?

          Show
          yzhangal Yongjun Zhang added a comment - Many thanks again Andrew Wang , I have committed to trunk. Revised branch-2 patch accordingly with the same change, and uploaded HDFS-9820 .branch-2.002.patch, would you please help taking a look?
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 25s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 6 new or modified test files.
          +1 mvninstall 7m 31s branch-2 passed
          +1 compile 0m 23s branch-2 passed with JDK v1.8.0_101
          +1 compile 0m 21s branch-2 passed with JDK v1.7.0_111
          +1 checkstyle 0m 22s branch-2 passed
          +1 mvnsite 0m 27s branch-2 passed
          +1 mvneclipse 1m 3s branch-2 passed
          +1 findbugs 0m 35s branch-2 passed
          +1 javadoc 0m 15s branch-2 passed with JDK v1.8.0_101
          +1 javadoc 0m 14s branch-2 passed with JDK v1.7.0_111
          +1 mvninstall 0m 18s the patch passed
          +1 compile 0m 13s the patch passed with JDK v1.8.0_101
          +1 javac 0m 13s the patch passed
          +1 compile 0m 16s the patch passed with JDK v1.7.0_111
          +1 javac 0m 16s the patch passed
          -0 checkstyle 0m 15s hadoop-tools/hadoop-distcp: The patch generated 27 new + 208 unchanged - 12 fixed = 235 total (was 220)
          +1 mvnsite 0m 23s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 40s the patch passed
          +1 javadoc 0m 11s the patch passed with JDK v1.8.0_101
          +1 javadoc 0m 12s the patch passed with JDK v1.7.0_111
          +1 unit 8m 34s hadoop-distcp in the patch passed with JDK v1.7.0_111.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          33m 25s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:b59b8b7
          JIRA Issue HDFS-9820
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834294/HDFS-9820.branch-2.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5032699e3769 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / a3cbaf0
          Default Java 1.7.0_111
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17230/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt
          JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17230/testReport/
          modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17230/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 25s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 6 new or modified test files. +1 mvninstall 7m 31s branch-2 passed +1 compile 0m 23s branch-2 passed with JDK v1.8.0_101 +1 compile 0m 21s branch-2 passed with JDK v1.7.0_111 +1 checkstyle 0m 22s branch-2 passed +1 mvnsite 0m 27s branch-2 passed +1 mvneclipse 1m 3s branch-2 passed +1 findbugs 0m 35s branch-2 passed +1 javadoc 0m 15s branch-2 passed with JDK v1.8.0_101 +1 javadoc 0m 14s branch-2 passed with JDK v1.7.0_111 +1 mvninstall 0m 18s the patch passed +1 compile 0m 13s the patch passed with JDK v1.8.0_101 +1 javac 0m 13s the patch passed +1 compile 0m 16s the patch passed with JDK v1.7.0_111 +1 javac 0m 16s the patch passed -0 checkstyle 0m 15s hadoop-tools/hadoop-distcp: The patch generated 27 new + 208 unchanged - 12 fixed = 235 total (was 220) +1 mvnsite 0m 23s the patch passed +1 mvneclipse 0m 13s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 40s the patch passed +1 javadoc 0m 11s the patch passed with JDK v1.8.0_101 +1 javadoc 0m 12s the patch passed with JDK v1.7.0_111 +1 unit 8m 34s hadoop-distcp in the patch passed with JDK v1.7.0_111. +1 asflicense 0m 18s The patch does not generate ASF License warnings. 33m 25s Subsystem Report/Notes Docker Image:yetus/hadoop:b59b8b7 JIRA Issue HDFS-9820 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834294/HDFS-9820.branch-2.002.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 5032699e3769 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision branch-2 / a3cbaf0 Default Java 1.7.0_111 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17230/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-distcp.txt JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17230/testReport/ modules C: hadoop-tools/hadoop-distcp U: hadoop-tools/hadoop-distcp Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17230/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          yzhangal Yongjun Zhang added a comment -

          Many thanks to Jing Zhao for earlier review and discussion, and to Andrew Wang for the review and commit.

          I'm marking it fixed, and will add branch-2 when we are done.

          Show
          yzhangal Yongjun Zhang added a comment - Many thanks to Jing Zhao for earlier review and discussion, and to Andrew Wang for the review and commit. I'm marking it fixed, and will add branch-2 when we are done.
          Hide
          andrew.wang Andrew Wang added a comment -

          branch-2.002 LGTM, thanks Yongjun

          Show
          andrew.wang Andrew Wang added a comment - branch-2.002 LGTM, thanks Yongjun
          Hide
          yzhangal Yongjun Zhang added a comment -

          Many thanks Andrew Wang, committed to branch-2.

          Show
          yzhangal Yongjun Zhang added a comment - Many thanks Andrew Wang , committed to branch-2.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10676 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10676/)
          HDFS-11040. Add documentation for HDFS-9820 distcp improvement. (yzhang: rev 0f0c15f7a5ea33ced781978bea971f3750883f41)

          • (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10676 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10676/ ) HDFS-11040 . Add documentation for HDFS-9820 distcp improvement. (yzhang: rev 0f0c15f7a5ea33ced781978bea971f3750883f41) (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm

            People

            • Assignee:
              yzhangal Yongjun Zhang
              Reporter:
              yzhangal Yongjun Zhang
            • Votes:
              0 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development