  1. Hadoop Common
  2. HADOOP-2984

Distcp should have forrest documentation

Details

    • Type: Task
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: util
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We really should have a page on how to use distcp.

      Attachments

        1. 2984-0.patch
          97 kB
          Christopher Douglas
        2. 2984-1.patch
          63 kB
          Christopher Douglas
        3. 2984-2.patch
          64 kB
          Christopher Douglas
        4. 2984-3.patch
          70 kB
          Christopher Douglas

        Activity

          cdouglas Christopher Douglas added a comment - First draft
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12383727/2984-0.patch
          against trunk revision 665937.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 203 release audit warnings (more than the trunk's current 201 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2627/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2627/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2627/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2627/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2627/console

          This message is automatically generated.

          cdouglas Christopher Douglas added a comment - edited

          I changed the name of the file, and might have missed the license somehow... trying again.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12383764/2984-1.patch
          against trunk revision 666056.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 202 release audit warnings (more than the trunk's current 201 warnings).

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2632/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2632/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2632/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2632/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2632/console

          This message is automatically generated.

          szetszwo Tsz-wo Sze added a comment - edited
          • Provide the full class name of DistCp at the beginning of the document.
          • Add a line to explain the name distcp.
          • Some examples use long single-line commands, which get wrapped in the rendered document, especially in the PDF. It is better to use \ to break a long command across several lines, e.g.
            Use
            hadoop distcp               \
                hdfs://nn1:8020/foo/a   \
                hdfs://nn1:8020/foo/b   \
                hdfs://nn2:8020/bar/foo
            

            instead of

            hadoop distcp hdfs://nn1:8020/foo/a hdfs://nn1:8020/foo/b hdfs://nn2:8020/bar/foo
            
          • I think it is clearer to add a command prompt to shell commands,
            e.g.
            bash$ hadoop distcp ...
            
          szetszwo Tsz-wo Sze added a comment -

          BTW, there are some unrelated changes to docs/hadoop-default.html in the patch.


          cdouglas Christopher Douglas added a comment -

          "Provide the full class name of DistCp in the beginning of the document."

          Since this is a guide for users of distcp and not developers, I left this out.

          "Add a line to explain the name distcp"

          "Some examples have long-single-line-commands. These long-single-line-commands got wrapped up in the document, especially in the pdf. It is good to use \ to break the long command into several lines"

          "I think it is more clear to add a command prompt for shell commands"

          Good ideas; done.

          Thanks for the review.

          knoguchi Koji Noguchi added a comment -

          +1.

          It's also worth noting that if another client is still writing to a source file, the copy will likely fail.

          Maybe also mention,

          • If any source files are deleted after distcp has started, the mappers will fail (with file not found).
          • If speculative execution is turned on as 'final', behavior of distcp is undefined.

          It's worth giving some examples of -update and -overwrite.

          I always had trouble with these options.
          Could you show what the target directory structure looks like after the distcp
          (with and without the -update/-overwrite options)?
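
The directory-structure question can be illustrated with a small model. This is only a sketch, assuming the behavior described in the DistCp guide for a copy from several source directories: without -update or -overwrite, each source directory is created under the target; with either option, the contents of each source directory are copied directly into the target. The helper `map_target` and the file names `aa`, `ab`, and `ba` are hypothetical and exist only for illustration; nothing here invokes Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): prints the path a source file
# would map to under the target, according to the assumed rules above.
#   mode: "plain"  = neither -update nor -overwrite
#         "update" = either -update or -overwrite
map_target() {
  local mode=$1 src_dir=$2 file=$3 target=$4
  local rel=${file#"$src_dir"/}          # path relative to the source dir
  if [ "$mode" = "plain" ]; then
    # The source directory itself is recreated under the target.
    printf '%s/%s/%s\n' "$target" "$(basename "$src_dir")" "$rel"
  else
    # Only the contents of the source directory land in the target.
    printf '%s/%s\n' "$target" "$rel"
  fi
}

# Assumed source layout: /foo/a/{aa,ab} and /foo/b/{ba,ab}; target /bar/foo.
for mode in plain update; do
  echo "mode=$mode"
  map_target "$mode" /foo/a /foo/a/aa /bar/foo
  map_target "$mode" /foo/a /foo/a/ab /bar/foo
  map_target "$mode" /foo/b /foo/b/ba /bar/foo
  map_target "$mode" /foo/b /foo/b/ab /bar/foo
done
```

In "plain" mode the two `ab` files map to different paths (`/bar/foo/a/ab` and `/bar/foo/b/ab`). In "update" mode both map to `/bar/foo/ab`, which is the kind of conflict a page on these options would need to call out.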


          cdouglas Christopher Douglas added a comment - Incorporated Koji's feedback

          cdouglas Christopher Douglas added a comment - I just committed this.
          hudson Hudson added a comment - Integrated in Hadoop-trunk #520 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/520/ )

          People

            Assignee: cdouglas Christopher Douglas
            Reporter: omalley Owen O'Malley