Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: distcp
    • Labels:
    • Hadoop Flags:
      Reviewed

      Description

      MAPREDUCE-5899 added distcp support for incremental copy with a new append flag.

      It should be documented.
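
      As a rough illustration of how the flag is meant to be combined with the existing options, the sketch below assembles a `hadoop distcp` command line. `build_distcp_args` is a hypothetical helper written for this note, not part of Hadoop, and the paths are placeholders. It encodes one assumption: `-append` is only valid together with `-update`.

```python
def build_distcp_args(sources, target, *, update=False, append=False):
    """Assemble an argv list for `hadoop distcp` (hypothetical helper)."""
    # Assumption: -append is only accepted when -update is also given.
    if append and not update:
        raise ValueError("-append is only valid together with -update")
    if not sources:
        raise ValueError("at least one source path is required")
    args = ["hadoop", "distcp"]
    if update:
        args.append("-update")
    if append:
        args.append("-append")
    args.extend(sources)   # one or more source paths
    args.append(target)    # the target path always comes last
    return args


# Placeholder paths, for illustration only.
argv = build_distcp_args(
    ["hdfs://nn1:8020/source/first"],
    "hdfs://nn2:8020/target",
    update=True,
    append=True,
)
print(" ".join(argv))
```

      The resulting list could be handed to `subprocess.run` on a machine with a Hadoop client installed; the sketch only builds it and prints the command.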

      1. MAPREDUCE-6471.004.patch
        3 kB
        Neelesh Srinivas Salian
      2. MAPREDUCE-6471.003.patch
        3 kB
        Neelesh Srinivas Salian
      3. MAPREDUCE-6471.002.patch
        2 kB
        Neelesh Srinivas Salian
      4. MAPREDUCE-6471.001.patch
        0.9 kB
        Neelesh Srinivas Salian

          Activity

          nijel nijel added a comment -

          Please feel free to reassign if the work has already been started.

          neelesh77 Neelesh Srinivas Salian added a comment -

          I would like to work on this JIRA.
          Could you please assign it to me?

          Thank you.

          nijel nijel added a comment -

          Yes, assigned to you. Please go ahead.

          neelesh77 Neelesh Srinivas Salian added a comment -

          Thanks nijel. Will start on this.

          Shall I post here if I have questions?

          neelesh77 Neelesh Srinivas Salian added a comment -

          Added the 1st version of the patch.
          It includes text describing the incremental copy feature.

          Requesting review.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 3m 1s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings.
          +1 | site | 3m 2s | Site still builds.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
             |  | 6m 26s |

          Subsystem | Report/Notes
          Patch URL | http://issues.apache.org/jira/secure/attachment/12761313/MAPREDUCE-6471.001.patch
          Optional Tests | site
          git revision | trunk / 3a9c707
          Java | 1.7.0_55
          uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6008/console

          This message was automatically generated.

          neelesh77 Neelesh Srinivas Salian added a comment -

          Added a -append option in the list of "Command line options"
          and mentioned the support for incremental copy below updates.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 2m 59s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings.
          +1 | site | 2m 59s | Site still builds.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
             |  | 6m 21s |

          Subsystem | Report/Notes
          Patch URL | http://issues.apache.org/jira/secure/attachment/12761354/MAPREDUCE-6471.002.patch
          Optional Tests | site
          git revision | trunk / 3a9c707
          Java | 1.7.0_55
          uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6009/console

          This message was automatically generated.

          qwertymaniac Harsh J added a comment -

          The changes in the options section look good to me, but the note added in the -update and -overwrite behaviour got me reading it entirely, and somehow the last example block looks incorrect. It's not directly related to your change, but there are a few problems in the text below, unless I am wrong:

          Now, consider the following copy operation:
          
          distcp hdfs://nn1:8020/source/first hdfs://nn1:8020/source/second hdfs://nn2:8020/target
          
          With sources/sizes:
          
          hdfs://nn1:8020/source/first/1 32
          hdfs://nn1:8020/source/first/2 32
          hdfs://nn1:8020/source/second/10 64
          hdfs://nn1:8020/source/second/20 32
          
          And destination/sizes:
          
          hdfs://nn2:8020/target/1 32
          hdfs://nn2:8020/target/10 32
          hdfs://nn2:8020/target/20 64
          
          Will effect:
          
          hdfs://nn2:8020/target/1 32
          hdfs://nn2:8020/target/2 32
          hdfs://nn2:8020/target/10 64
          hdfs://nn2:8020/target/20 32
          
          1 is skipped because the file-length and contents match. 2 is copied because it doesn’t exist at the target. 10 and 20 are overwritten since the contents don’t match the source.
          
          If -update is used, 1 is overwritten as well.
          

          Those last two lines, I think, should instead read:

          If `-update` is used, 1 is skipped because the file-length and contents match. 2 is copied because it doesn’t exist at the target. 10 and 20 are overwritten since the contents don’t match the source.
          
          If `-overwrite` is used, 1 is overwritten as well.
          

          Or with the -append change added:

          If `-update` is used, 1 is skipped because the file-length and contents match. 2 is copied because it doesn’t exist at the target. 10 and 20 are overwritten since the contents don’t match the source. However, if `-append` is additionally used, then only 20 is overwritten (source length less than destination) and 10 is appended with the change in file (if the files match up to the destination's original length).
          
          If `-overwrite` is used, 1 is overwritten as well.
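
          The rules in this wording can be sketched as a small decision function. This is an illustration only, not DistCp's implementation: `Action`, `plan`, `plan_all`, and the `same_contents` / `prefix_matches` flags are hypothetical names, and the two flags stand in for the checksum comparisons. The sketch assumes that the first 32 bytes of source `10` match the existing target file, which the example does not state. Going by the sizes listed above (source `10` is 64 against a 32 target; source `20` is 32 against a 64 target), the length rule makes `10` the append candidate and `20` the file that is overwritten.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"
    COPY = "copy"
    OVERWRITE = "overwrite"
    APPEND = "append"


def plan(src_len, dst_len, same_contents=False, prefix_matches=False,
         mode="update", append=False):
    """Decide what happens to one file; dst_len is None if it is absent."""
    if dst_len is None:
        return Action.COPY              # nothing at the target yet
    if mode == "overwrite":
        return Action.OVERWRITE         # -overwrite replaces existing files
    # From here on: -update semantics.
    if src_len == dst_len and same_contents:
        return Action.SKIP              # length and contents match
    if append and src_len > dst_len and prefix_matches:
        return Action.APPEND            # target is a prefix of the source
    return Action.OVERWRITE


# name -> (source length, target length or None, same_contents, prefix_matches)
# The two boolean columns are assumptions made for this sketch.
EXAMPLE = {
    "1": (32, 32, True, True),
    "2": (32, None, False, False),
    "10": (64, 32, False, True),
    "20": (32, 64, False, False),
}


def plan_all(mode, append=False):
    """Apply plan() to every file in the example listing."""
    return {
        name: plan(s, d, same, prefix, mode=mode, append=append).value
        for name, (s, d, same, prefix) in EXAMPLE.items()
    }


for label, kwargs in [
    ("-update", {"mode": "update"}),
    ("-update -append", {"mode": "update", "append": True}),
    ("-overwrite", {"mode": "overwrite"}),
]:
    print(label, plan_all(**kwargs))
```

          Keeping the content checks as plain booleans keeps the sketch independent of any particular checksum scheme; a real comparison would have to compute them from the files.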
          

          Thoughts?

          neelesh77 Neelesh Srinivas Salian added a comment -

          Harsh J, that looks good to add to the `-update` information.
          Uploaded new patch with the additions.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 3m 6s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings.
          +1 | site | 3m 5s | Site still builds.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
             |  | 6m 36s |

          Subsystem | Report/Notes
          Patch URL | http://issues.apache.org/jira/secure/attachment/12762425/MAPREDUCE-6471.003.patch
          Optional Tests | site
          git revision | trunk / 83e65c5
          Java | 1.7.0_55
          uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6018/console

          This message was automatically generated.

          qwertymaniac Harsh J added a comment -

          Thanks for the new revision - text looks good to me, but could you post another version with the filenames back-tick quoted like in the original line? Sorry I didn't place it that way in my comment earlier.

          +1 with the backticks added.

          hadoopqa Hadoop QA added a comment -



          +1 overall



          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 3m 13s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          +1 | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings.
          +1 | site | 3m 4s | Site still builds.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
             |  | 6m 41s |

          Subsystem | Report/Notes
          Patch URL | http://issues.apache.org/jira/secure/attachment/12762613/MAPREDUCE-6471.004.patch
          Optional Tests | site
          git revision | trunk / 1c030c6
          Java | 1.7.0_55
          uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6022/console

          This message was automatically generated.

          qwertymaniac Harsh J added a comment -

          Committed to branch-2 and trunk. Thank you Neelesh!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8529 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8529/)
          MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian. (harsh: rev 66dad854c0aea8c137017fcf198b165cc1bd8bdd)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #1189 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1189/)
          MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian. (harsh: rev 66dad854c0aea8c137017fcf198b165cc1bd8bdd)

          • hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #450 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/450/)
          MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian. (harsh: rev 66dad854c0aea8c137017fcf198b165cc1bd8bdd)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2394 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2394/)
          MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian. (harsh: rev 66dad854c0aea8c137017fcf198b165cc1bd8bdd)

          • hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #456 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/456/)
          MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian. (harsh: rev 66dad854c0aea8c137017fcf198b165cc1bd8bdd)

          • hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2367 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2367/)
          MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian. (harsh: rev 66dad854c0aea8c137017fcf198b165cc1bd8bdd)

          • hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
          • hadoop-mapreduce-project/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #427 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/427/)
          MAPREDUCE-6471. Document distcp incremental copy. Contributed by Neelesh Srinivas Salian. (harsh: rev 66dad854c0aea8c137017fcf198b165cc1bd8bdd)

          • hadoop-mapreduce-project/CHANGES.txt
          • hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm

            People

            • Assignee:
              neelesh77 Neelesh Srinivas Salian
              Reporter:
              arpitagarwal Arpit Agarwal
            • Votes:
              0
              Watchers:
              6

              Dates

              • Created:
                Updated:
                Resolved:

                Development