Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: distcp
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      distcp should validate the files copied by checking the checksums, if the filesystem supports checksums.

      1. d_verify.patch
        10 kB
        Ravi Gummadi
      2. d_verify649.patch
        10 kB
        Ravi Gummadi
      3. d_verify649.v1.patch
        9 kB
        Ravi Gummadi

        Activity

        Ravi Gummadi created issue -
        Hide
        Ravi Gummadi added a comment -

        Attaching patch that makes distcp validate copy of files within map task just after copy of each file. Validation is done by comparing file sizes and checksums, if both the file systems support checksums. In case of validation failure, distcp retries to copy the file again within the same map task(max number of tries can be configured using distcp.file.retries(with default value of 3)).

        Please review and provide your comments.

        Show
        Ravi Gummadi added a comment - Attaching patch that makes distcp validate copy of files within map task just after copy of each file. Validation is done by comparing file sizes and checksums, if both the file systems support checksums. In case of validation failure, distcp retries to copy the file again within the same map task(max number of tries can be configured using distcp.file.retries(with default value of 3)). Please review and provide your comments.
        Ravi Gummadi made changes -
        Field Original Value New Value
        Attachment d_verify.patch [ 12410779 ]
        Owen O'Malley made changes -
        Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
        Key HADOOP-6054 MAPREDUCE-649
        Affects Version/s 0.21.0 [ 12313563 ]
        Component/s distcp [ 12312902 ]
        Component/s tools/distcp [ 12312387 ]
        Fix Version/s 0.21.0 [ 12313563 ]
        Ravi Gummadi made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Ravi Gummadi made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Ravi Gummadi added a comment -

        Attaching patch that applies to current trunk.

        Please review and provide your comments.

        Show
        Ravi Gummadi added a comment - Attaching patch that applies to current trunk. Please review and provide your comments.
        Ravi Gummadi made changes -
        Attachment d_verify649.patch [ 12418915 ]
        Ravi Gummadi made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12418915/d_verify649.patch
        against trunk revision 812537.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418915/d_verify649.patch against trunk revision 812537. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/47/console This message is automatically generated.
        Hide
        Chris Douglas added a comment -

        Just to make sure I understand the change, this validates the source and destination checksums, retrying if they do not match. However, failure to match the checksum doesn't count as a failed copy, but as an attempt in a different retry mechanism. Are invalid/corrupt copies more common than other failures that throw exceptions (e.g. (re)moved source files, IOExceptions, etc.)? This overlapping retry logic- within a map and within a job- is a little odd. The other refactoring changes look fine. Writing a unit test for this new functionality would be both difficult and expensive, so as long as it's been tested then that's OK.

        Show
        Chris Douglas added a comment - Just to make sure I understand the change, this validates the source and destination checksums, retrying if they do not match. However, failure to match the checksum doesn't count as a failed copy, but as an attempt in a different retry mechanism. Are invalid/corrupt copies more common than other failures that throw exceptions (e.g. (re)moved source files, IOExceptions, etc.)? This overlapping retry logic- within a map and within a job- is a little odd. The other refactoring changes look fine. Writing a unit test for this new functionality would be both difficult and expensive, so as long as it's been tested then that's OK.
        Hide
        Ravi Gummadi added a comment -

        >>However, failure to match the checksum doesn't count as a failed copy, but as an attempt in a different retry mechanism.

        After specified number of retries(distcp.file.retries is the config property, with a default value of 3 => by default, 2 more times copy of file is tried in case of checksum mismatch), it is considered as a failure.

        >>Are invalid/corrupt copies more common than other failures that throw exceptions (e.g. (re)moved source files, IOExceptions, etc.)?

        Am not sure. But this feature would give more confidence to users about copy done by distcp.

        >> as long as it's been tested then that's OK.

        Yes. It is tested with different cases.

        Show
        Ravi Gummadi added a comment - >>However, failure to match the checksum doesn't count as a failed copy, but as an attempt in a different retry mechanism. After specified number of retries(distcp.file.retries is the config property, with a default value of 3 => by default, 2 more times copy of file is tried in case of checksum mismatch), it is considered as a failure. >>Are invalid/corrupt copies more common than other failures that throw exceptions (e.g. (re)moved source files, IOExceptions, etc.)? Am not sure. But this feature would give more confidence to users about copy done by distcp. >> as long as it's been tested then that's OK. Yes. It is tested with different cases.
        Hide
        Chris Douglas added a comment -

        I understand that it counts as a failure after distcp.file.retries attempts. What's odd is that this retry mechanism only works for mismatched checksums and not other errors. If there are no empirical data showing that mismatched checksums are either more common or more likely to succeed with more than one attempt, why include a separate retry mechanism covering that particular case?

        Show
        Chris Douglas added a comment - I understand that it counts as a failure after distcp.file.retries attempts. What's odd is that this retry mechanism only works for mismatched checksums and not other errors. If there are no empirical data showing that mismatched checksums are either more common or more likely to succeed with more than one attempt, why include a separate retry mechanism covering that particular case?
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Ravi Gummadi added a comment -

        Attaching patch that makes copy of file to be retried distcp.file.retries(with default value of 3) times in case of failures. Failure could be failure of validation of copy(through file checksums mismatch). These retries are at done within the same map irrespective of -i option.

        Please review and provide your comments.

        Show
        Ravi Gummadi added a comment - Attaching patch that makes copy of file to be retried distcp.file.retries(with default value of 3) times in case of failures. Failure could be failure of validation of copy(through file checksums mismatch). These retries are at done within the same map irrespective of -i option. Please review and provide your comments.
        Ravi Gummadi made changes -
        Attachment d_verify649.v1.patch [ 12419864 ]
        Chris Douglas made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12419864/d_verify649.v1.patch
        against trunk revision 816147.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12419864/d_verify649.v1.patch against trunk revision 816147. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/44/console This message is automatically generated.
        Hide
        Chris Douglas added a comment -

        +1

        I committed this. Thanks, Ravi!

        Show
        Chris Douglas added a comment - +1 I committed this. Thanks, Ravi!
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Fix Version/s 0.21.0 [ 12314045 ]
        Resolution Fixed [ 1 ]
        Chris Douglas made changes -
        Issue Type New Feature [ 2 ] Improvement [ 4 ]
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #49 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/49/)
        . Validate a copy by comparing the source and destination
        checksums in distcp. Also adds an intra-task retry mechanism for errors
        detected during the copy. Contributed by Ravi Gummadi

        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #49 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/49/ ) . Validate a copy by comparing the source and destination checksums in distcp. Also adds an intra-task retry mechanism for errors detected during the copy. Contributed by Ravi Gummadi
        Hide
        gary murry added a comment -

        Since there are no unit tests, how was this tested? What manual scenarios were ran?

        Show
        gary murry added a comment - Since there are no unit tests, how was this tested? What manual scenarios were ran?
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Ravi Gummadi
            Reporter:
            Ravi Gummadi
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development