Hadoop Map/Reduce / MAPREDUCE-654

Add an option -count to distcp for displaying some info about the src files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: distcp
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Add an option -count to distcp that displays metadata about the src files, such as the number of files to be copied and the total size of the src files to be copied.
      With -count, distcp doesn't do any copy; it just displays the info and exits.
      This is especially useful when combined with -update:
      distcp -update -count <src>* <dst>
      would display the number of files to be updated and the total size of the copy that needs to be done (determined by comparing the file sizes and checksums at src and dst). Based on this info, users could allocate the number of nodes needed for the actual update job.
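
      As a rough illustration of the comparison described above, here is a minimal Python sketch. It is not the distcp implementation (which is Java); the `FileInfo` and `Summary` types, the `needs_copy` and `dry_run_summary` functions, and the in-memory listings are all hypothetical stand-ins for real file system metadata.

```python
from typing import Dict, NamedTuple, Optional


class FileInfo(NamedTuple):
    """Hypothetical stand-in for file metadata: size in bytes and a checksum."""
    size: int
    checksum: str


class Summary(NamedTuple):
    """What a count-only run would report."""
    files_to_copy: int
    bytes_to_copy: int


def needs_copy(src: FileInfo, dst: Optional[FileInfo]) -> bool:
    """A file needs copying if it is missing at dst or differs in size or checksum."""
    if dst is None:
        return True
    return src.size != dst.size or src.checksum != dst.checksum


def dry_run_summary(src: Dict[str, FileInfo], dst: Dict[str, FileInfo]) -> Summary:
    """Count the files and bytes an update would copy, without copying anything."""
    files = 0
    total = 0
    for path, info in src.items():
        if needs_copy(info, dst.get(path)):
            files += 1
            total += info.size
    return Summary(files, total)


# Assumed example listings, keyed by relative path.
src = {
    "a.txt": FileInfo(100, "c1"),
    "b.txt": FileInfo(250, "c2"),
    "c.txt": FileInfo(40, "c3"),
}
dst = {
    "a.txt": FileInfo(100, "c1"),     # identical: would not be copied
    "b.txt": FileInfo(250, "stale"),  # same size, different checksum
}
summary = dry_run_summary(src, dst)
print(summary.files_to_copy, summary.bytes_to_copy)
```

      In this sketch only `b.txt` and `c.txt` count toward the totals, which is the kind of figure a user could use to size the actual update job.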

      1. d_count_v1.patch
        7 kB
        Ravi Gummadi
      2. d_count.patch
        6 kB
        Ravi Gummadi
      3. d_count654.patch
        8 kB
        Ravi Gummadi
      4. M654-2.patch
        7 kB
        Chris Douglas

        Activity

        Ravi Gummadi created issue -
        Ravi Gummadi added a comment -

        Attaching patch that adds -count option to distcp.

        Please review and provide your comments.

        Ravi Gummadi made changes -
        Field Original Value New Value
        Attachment d_count.patch [ 12411436 ]
        Ravi Gummadi added a comment -

        With the -count option, the last argument is treated as the destination only when -update is given. Otherwise, all of the paths are treated as source paths.
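
        The argument rule above can be sketched as follows. This is an assumed illustration in Python, not code from the patch; the `split_paths` name and its error handling are hypothetical.

```python
from typing import List, Optional, Tuple


def split_paths(paths: List[str], update: bool) -> Tuple[List[str], Optional[str]]:
    """Split positional paths into (sources, destination) for a count-only run.

    Only when update is True is the last path treated as the destination;
    otherwise every path is a source and there is no destination.
    """
    if not paths:
        raise ValueError("at least one path is required")
    if update:
        # Assumption: an update needs at least one source plus a destination.
        if len(paths) < 2:
            raise ValueError("update needs at least one source and a destination")
        return paths[:-1], paths[-1]
    return list(paths), None


print(split_paths(["/a", "/b", "/dst"], update=True))
print(split_paths(["/a", "/b"], update=False))
```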

        Ravi Gummadi added a comment -

        Attaching a new patch which also displays
        (1) the number of files that distcp with -update would skip and
        (2) the number of bytes that distcp with -update would skip.

        Please review and provide your comments.

        Ravi Gummadi made changes -
        Attachment d_count_v1.patch [ 12411513 ]
        Ravi Gummadi made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12411513/d_count_v1.patch
        against trunk revision 808320.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/525/console

        This message is automatically generated.

        Ravi Gummadi made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Ravi Gummadi added a comment -

        Attaching patch that applies after MAPREDUCE-649 is committed.

        Please review and provide your comments.

        Ravi Gummadi made changes -
        Attachment d_count654.patch [ 12418918 ]
        Ravi Gummadi made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12418918/d_count654.patch
        against trunk revision 812546.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/16/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/16/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/16/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/16/console

        This message is automatically generated.

        Tom White added a comment -

        How about calling the new option -dryrun, to make it clearer that it doesn't do a copy?

        Ravi Gummadi added a comment -

        Yes. Venkatesh is working on the new -dryrun option, which also displays the files to be copied by distcp. This option will be renamed to -dryrun.

        Chris Douglas added a comment -

        Spoke with Ravi and Venkatesh. Will call this option -dryrun for now, to be upgraded with filename listings later.

        The patch is identical to the last one posted, except that the countOnly variable is renamed to dryrun, the command line is changed as discussed, and it is merged with trunk.

        Chris Douglas made changes -
        Attachment M654-2.patch [ 12419974 ]
        Chris Douglas added a comment -

        +1

        Test results from the earlier Hudson run are good, as the patch only changes the name of the command line param and a variable.

        Verified correctness of the param manually on my dev box and ran TestCopyFiles to confirm that distcp works as expected.

        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Chris Douglas made changes -
        Issue Type New Feature [ 2 ] Improvement [ 4 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #49 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/49/)
        . Add a -dryrun option to distcp printing a summary of the
        file data to be copied, without actually performing the copy. Contributed by Ravi Gummadi

        Nigel Daley added a comment -

        Why no automated unit test for this? CLI test could have been used, no?

        Chris Douglas added a comment -

        We don't have any CLI tests for distcp, as far as I know. Adding some would be a good idea, since this tool is often included in automated environments.

        Filed MAPREDUCE-1008

        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Ravi Gummadi
            Reporter:
            Ravi Gummadi
          • Votes:
            0
            Watchers:
            1
