Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.2, 2.0.0-alpha, 2.0.1-alpha, 2.0.2-alpha
    • Fix Version/s: 3.0.0, 2.0.2-alpha
    • Component/s: tools
    • Labels:
      None

      Description

      Using distcp with '-skipcrccheck' still causes CRC checksum verification to run.

      Ran into this while debugging an issue where the source and destination had different block sizes and the preserve-blocksize parameter (-pb) was not used. In both 0.23.1 and 0.23.2 builds, trying to bypass the checksum verification with the '-skipcrccheck' parameter had no effect; the distcp still failed on checksum errors.

      Test scenario to reproduce:
      Without '-pb', run a distcp from 0.20.205 (default block size 128 MB) to 0.23 (default block size 256 MB); the distcp fails on checksum errors, which is expected because the file checksum is a tiered aggregation over all blocks and therefore depends on block size. Running the same distcp with only '-skipcrccheck' added still fails with the same checksum error, whereas the expectation is that the checksum comparison would be bypassed and the distcp would proceed.
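
      For illustration only (the paths and hostnames below are placeholders, not taken from the original report), the two invocations in this scenario look roughly like:

          # Fails on a checksum mismatch: block sizes differ and -pb is not given,
          # so the aggregated file checksums of source and target cannot match.
          hadoop distcp hftp://old-cluster:50070/src/dir hdfs://new-cluster:8020/dst/dir

          # Expected to proceed: -skipcrccheck (used together with -update) should
          # bypass the post-copy checksum comparison entirely.
          hadoop distcp -update -skipcrccheck hftp://old-cluster:50070/src/dir hdfs://new-cluster:8020/dst/dir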

      1. HDFS-3054.002.patch
        3 kB
        Colin Patrick McCabe
      2. HDFS-3054.004.patch
        3 kB
        Colin Patrick McCabe
      3. hdfs-3054.patch
        3 kB
        Rahul Jain

        Issue Links

          Activity

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1188 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/)
          HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick McCabe. (Revision 1381296)

          Result = ABORTED
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
          Files :

          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1157 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/)
          HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick McCabe. (Revision 1381296)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
          Files :

          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2709 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2709/)
          HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick McCabe. (Revision 1381296)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
          Files :

          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2748 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2748/)
          HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick McCabe. (Revision 1381296)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
          Files :

          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2685 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2685/)
          HDFS-3054. distcp -skipcrccheck has no effect. Contributed by Colin Patrick McCabe. (Revision 1381296)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381296
          Files :

          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java
          • /hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java
          Todd Lipcon added a comment -

          Committed to branch-2 and trunk, thanks Colin.

          Todd Lipcon added a comment -

          +1, will commit momentarily.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543879/HDFS-3054.004.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3149//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3149//console

          This message is automatically generated.

          Colin Patrick McCabe added a comment -

          Fix JavaDoc, whitespace issues

          Todd Lipcon added a comment -

          style nits:

          • missing space after ',' in RetriableFileCopyCommand's constructor definition, and its usage
          • please reorder the '@param' in the javadoc to match the order of arguments

          otherwise +1

          Colin Patrick McCabe added a comment -

          testing done:

          I confirmed that copying files from a cluster running branch-1 derived code to a cluster running branch-2 derived code did not work unless -skipcrccheck was supplied.

          The exception was this:

          Error: java.io.IOException: File copy failed: hftp://172.22.1.204:6001/a/xxxxxx --> hdfs://localhost:6000/b/a/xxxxxx
                  at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:267)
                  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
                  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
                  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
                  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
                  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
                  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:153)
                  at java.security.AccessController.doPrivileged(Native Method)
                  at javax.security.auth.Subject.doAs(Subject.java:396)
                  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
                  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:148)
          Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://172.22.1.204:6001/a/xxxxxx to hdfs://localhost:6000/b/a/xxxxxx
                  at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
                  at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:263)
                  ... 10 more
          Caused by: java.io.IOException: Check-sum mismatch between hftp://172.22.1.204:6001/a/xxxxxx and hdfs://localhost:6000/b/.distcp.tmp.attempt_1346456743556_0010_m_000001_0
                  at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:145)
                  at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:107)
                  at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:83)
                  at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
                  ... 11 more
          

          With -skipcrccheck, this problem did not occur.

          Colin Patrick McCabe added a comment -

          I confirmed through manual testing that -skipcrccheck does indeed cause the crc checking paths to be bypassed.

          However, I found this in the code, in DistCpUtils#checksumsAreEquals:

              try {
                sourceChecksum = sourceFS.getFileChecksum(source);
                targetChecksum = targetFS.getFileChecksum(target);
              } catch (IOException e) {
                LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
              }
          

          I think this should be a fatal error for the distcp operation unless -skipcrccheck is set. Silently ignoring checksums if we can't find them doesn't seem like a good behavior. Perhaps we should open a different JIRA for that, though...
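
          (For illustration only; this is not part of the committed patch. The sketch below mirrors the fragment quoted above and assumes a boolean 'skipCrc' plumbed through from -skipcrccheck; the other variable names are taken from the quoted code.)

              try {
                sourceChecksum = sourceFS.getFileChecksum(source);
                targetChecksum = targetFS.getFileChecksum(target);
              } catch (IOException e) {
                if (!skipCrc) {
                  // Surface the failure instead of silently skipping the comparison.
                  throw new IOException("Unable to retrieve checksum for " + source
                      + " or " + target, e);
                }
                LOG.warn("Unable to retrieve checksum for " + source + " or " + target, e);
              }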

          Colin Patrick McCabe added a comment -

          Looks like the entirety of TestDistCp is @ignored right now. Let's fix that in a separate JIRA.

          Rahul Jain added a comment -

          Hi Colin,

          Thanks for taking this forward. I am OK with your comments on the unit-testing idea; this was a case of missing functionality rather than a bug. Skipping the CRC check is a low-value feature by itself, but it is needed when working with different variants of Hadoop or with other filesystems.

          Colin Patrick McCabe added a comment -

          How about just corrupting the block files manually, a la TestFSInputChecker?

          Probably best to add a @VisibleForTesting method in MiniDFSCluster that corrupts the block. MiniDFSCluster is part of HDFS; this code isn't.
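
          (Sketch only: the helper below is hypothetical and not an existing MiniDFSCluster API. It shows just the corruption step, in the spirit of TestFSInputChecker-style tests, once the on-disk block file has been located.)

              import java.io.File;
              import java.io.IOException;
              import java.io.RandomAccessFile;

              /** Hypothetical test utility; locating the block file for a given
               *  path would be MiniDFSCluster's job and is not shown here. */
              class BlockCorruptionTestUtil {
                /**
                 * Flip one byte in an on-disk block file so that a later CRC
                 * comparison is guaranteed to mismatch.
                 */
                static void corruptBlockFile(File blockFile) throws IOException {
                  if (blockFile.length() == 0) {
                    return; // nothing to corrupt
                  }
                  RandomAccessFile raf = new RandomAccessFile(blockFile, "rw");
                  try {
                    long offset = blockFile.length() / 2; // pick a byte in the middle
                    raf.seek(offset);
                    int b = raf.read();
                    raf.seek(offset);
                    raf.write(b ^ 0xff); // flip the bits of that byte
                  } finally {
                    raf.close();
                  }
                }
              }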

          Eli Collins added a comment -

          How about just corrupting the block files manually, a la TestFSInputChecker?

          Also, let's manually test that we can distcp from a v1 cluster to a v2 cluster with this patch (using skipcrc since HADOOP-8060 is not yet ready).

          Colin Patrick McCabe added a comment -

          Hi Rahul,
          This patch looks good to me overall. I created a rebased version of it that incorporates changes from trunk.

          I wish we could have a unit test for this. It's a little difficult to create a good one without getting kind of uncomfortably coupled to the HDFS code (after all this is in tools). We would probably have to create an API to corrupt file checksums in HDFS, and use that. Then we could be sure that distcp -skipcrccheck was not consulting the CRC. I'm not sure if this is worth the extra work, though... thoughts?

          Colin Patrick McCabe added a comment -
          • rebase on trunk
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12540262/hdfs-3054.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javac. The patch appears to cause the build to fail.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2985//console

          This message is automatically generated.

          Rahul Jain added a comment -

          I've applied the attached patch on Hadoop trunk and also ported it to Hadoop 2.0.0-alpha, in order to do distcp between current and older 0.20.x versions of Hadoop.


            People

    • Assignee: Colin Patrick McCabe
    • Reporter: patrick white
    • Votes: 0
    • Watchers: 9
