Hadoop Common
  1. Hadoop Common
  2. HADOOP-8648

libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The native CRC32 code, found in pipelined_crc32c, crashes when chunksize is set to 1.

      12:27:14,886  INFO NativeCodeLoader:50 - Loaded the native-hadoop library
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      #  SIGSEGV (0xb) at pc=0x00007fa00ee5a340, pid=24100, tid=140326058854144
      #
      # JRE version: 6.0_29-b11
      # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode linux-amd64 compressed oops)
      # Problematic frame:
      # C  [libhadoop.so.1.0.0+0x8340]  pipelined_crc32c+0xa0
      #
      # An error report file with more information is saved as:
      # /h/hs_err_pid24100.log
      #
      # If you would like to submit a bug report, please visit:
      #   http://java.sun.com/webapps/bugreport/crash.jsp
      #
      Aborted
      

      The Java CRC code works fine in this case.

      Choosing blocksize=1 is a _very_ odd choice. It means that we're storing a 4-byte checksum for every byte.

      -rw-r--r--  1 cmccabe users  49398 Aug  3 11:33 blk_4702510289566780538
      -rw-r--r--  1 cmccabe users 197599 Aug  3 11:33 blk_4702510289566780538_1199.meta
      

      However, obviously crashing is never the right thing to do.

      1. HADOOP-8648.005.patch
        13 kB
        Colin Patrick McCabe
      2. HADOOP-8648.004.patch
        13 kB
        Colin Patrick McCabe
      3. HADOOP-8648.003.patch
        13 kB
        Colin Patrick McCabe
      4. HADOOP-8648.002.patch
        7 kB
        Colin Patrick McCabe
      5. HADOOP-8648.001.patch
        7 kB
        Colin Patrick McCabe

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1188 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/)
          HADOOP-8648. libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419)

          Result = ABORTED
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1188 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1188/ ) HADOOP-8648 . libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419) Result = ABORTED eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1157 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/)
          HADOOP-8648. libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419)

          Result = FAILURE
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1157 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1157/ ) HADOOP-8648 . libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2712 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2712/)
          HADOOP-8648. libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419)

          Result = FAILURE
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2712 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2712/ ) HADOOP-8648 . libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419) Result = FAILURE eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Hide
          Eli Collins added a comment -

          I've committed this and merged to branch-2. Thanks Colin!

          Show
          Eli Collins added a comment - I've committed this and merged to branch-2. Thanks Colin!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2751 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2751/)
          HADOOP-8648. libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2751 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2751/ ) HADOOP-8648 . libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2688 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2688/)
          HADOOP-8648. libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419)

          Result = SUCCESS
          eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2688 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2688/ ) HADOOP-8648 . libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1. Contributed by Colin Patrick McCabe (Revision 1381419) Result = SUCCESS eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1381419 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/pom.xml /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/CMakeLists.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.c /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util/bulk_crc32.h /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/test/org/apache/hadoop/util/test_bulk_crc32.c
          Hide
          Eli Collins added a comment -

          +1 lgtm

          Show
          Eli Collins added a comment - +1 lgtm
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543909/HADOOP-8648.005.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1408//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1408//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543909/HADOOP-8648.005.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1408//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1408//console This message is automatically generated.
          Hide
          Colin Patrick McCabe added a comment -
          • add test_bulk_crc32 execution to the native profile, not the kerberos profile. (Good catch, Eli.)
          Show
          Colin Patrick McCabe added a comment - add test_bulk_crc32 execution to the native profile, not the kerberos profile. (Good catch, Eli.)
          Hide
          Eli Collins added a comment -

          The new native_tests execution should be in the "native" profile not the "ApacheDS KDC server" profile. Otherwise looks great.

          Excellent work fixing and thanks Andy for the detailed explanation.

          Show
          Eli Collins added a comment - The new native_tests execution should be in the "native" profile not the "ApacheDS KDC server" profile. Otherwise looks great. Excellent work fixing and thanks Andy for the detailed explanation.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543386/HADOOP-8648.004.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1396//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1396//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543386/HADOOP-8648.004.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1396//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1396//console This message is automatically generated.
          Hide
          Colin Patrick McCabe added a comment -
          • fix comment
          Show
          Colin Patrick McCabe added a comment - fix comment
          Hide
          Andy Isaacson added a comment -

          I've reviewed the patch closely and agree that it's right. The only tiny improvement I'd make is to fix this misleading comment in the 64-bit version of bulk_crc32.c:

          374   int remainder = block_size % sizeof(uint64_t);
          ...
          401       /* Take care of the remainder. They are only up to three bytes,
          402        * so performing byte-level crc32 won't take much time.
          403        */
          404       bdata = (uint8_t*)data;
          405       while (likely(remainder)) {
          

          The comment says "up to three bytes" but since this is a uint64_t at a time, it should say "up to seven bytes". This came from a copy-and-paste from the 32-bit version.

          Ideally we could refactor the 32-bit and 64-bit versions to one using a #define WORD_T uint32_t or similar, but let's do that in a followup jira.

          I've also reviewed the original patch in HADOOP-7446 and confirmed that there weren't any other similar bugs added.

          Note that the erroneous asm is only hit if the HDFS blocksize is not a multiple of the wordsize, which AFAICS can only happen for blocksize<7. And, the bug lurked because the remainder codepath didn't have any tests, so thanks for adding those.

          Overall, LGTM. Fix the comment if you choose, else ship it.

          Show
          Andy Isaacson added a comment - I've reviewed the patch closely and agree that it's right. The only tiny improvement I'd make is to fix this misleading comment in the 64-bit version of bulk_crc32.c : 374 int remainder = block_size % sizeof(uint64_t); ... 401 /* Take care of the remainder. They are only up to three bytes, 402 * so performing byte -level crc32 won't take much time. 403 */ 404 bdata = (uint8_t*)data; 405 while (likely(remainder)) { The comment says "up to three bytes" but since this is a uint64_t at a time, it should say "up to seven bytes". This came from a copy-and-paste from the 32-bit version. Ideally we could refactor the 32-bit and 64-bit versions to one using a #define WORD_T uint32_t or similar, but let's do that in a followup jira. I've also reviewed the original patch in HADOOP-7446 and confirmed that there weren't any other similar bugs added. Note that the erroneous asm is only hit if the HDFS blocksize is not a multiple of the wordsize, which AFAICS can only happen for blocksize<7. And, the bug lurked because the remainder codepath didn't have any tests, so thanks for adding those. Overall, LGTM. Fix the comment if you choose, else ship it.
          Hide
          Andy Isaacson added a comment -

          I didn't understand why this code was wrong before, so I looked into it in more depth and I agree with Colin's analysis and patch. In the interest of making this easier for others to understand, here are a few references.

          http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html explains the GCC inline assembly syntax, and in particular how the asm("some assembly" : inputconstraints : outputconstraints : clobbers) syntax is parsed, and how the constraints map to the %n in the assembly string.

          http://asm.sourceforge.net/articles/rmiyagi-inline-asm.txt describes the x86 indexed addressing modes, in particular explaining how (%5,%4,1) is interpreted as "the word of memory at %5 + 1 * %4".

          http://softwarecommunity.intel.com/userfiles/en-us/d9156103.pdf describes the details of the SSE4 CRC32 instruction in mind-numbing detail, but that's not especially relevant to this bug. All we need to know is that crc32size operates on 8, 32, or 64 bits depending on size, and its first argument is read-only while its second argument is used as an accumulator (read, modify, write).

          Finally, the comments in bulk_crc32.c are very helpful. Critically, the pipelined_crc32c routine optimizes by computing the CRC of up to 3 blocks in parallel. The block size is passed in to pipelined_crc32c as block_size. As we can see by looking at one of the other asm blocks in pipelined_crc32c, the core idea is that we maintain bdata as a pointer to the word being CRCed in the first block, and then use indexed addressing to compute the appropriate address for the word being CRCed in the second (and possibly third) blocks.

          With all that under our belt, the bug in this code becomes clear:

                  "crc32b (%5), %0;\n\t"
                  "crc32b (%5,%4,1), %1;\n\t"
                   : "=r"(c1), "=r"(c2)
                   : "r"(c1), "r"(c2), "r"(c3), "r"(block_size), "r"(bdata)
          

          The first crc32b instruction dereferences %5 which is block_size, but comparing to any other example of the similar asm block such as:

                  "crc32q (%7), %0;\n\t"
                  "crc32q (%7,%6,1), %1;\n\t"
                  "crc32q (%7,%6,2), %2;\n\t"
                   : "=r"(c1), "=r"(c2), "=r"(c3)
                   : "r"(c1), "r"(c2), "r"(c3), "r"(block_size), "r"(data)
          

          it should be dereferencing bdata. And this is caused because the output constraints list includes c3 even though the input constraints list does not, also different from all other examples of the asm block.

          Therefore, Colin's fix to remove c3 from the list causes the %4 and %5 references to refer to their intended operands block_size and bdata respectively.

          Show
          Andy Isaacson added a comment - I didn't understand why this code was wrong before, so I looked into it in more depth and I agree with Colin's analysis and patch. In the interest of making this easier for others to understand, here are a few references. http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html explains the GCC inline assembly syntax, and in particular how the asm("some assembly" : inputconstraints : outputconstraints : clobbers) syntax is parsed, and how the constraints map to the %n in the assembly string. http://asm.sourceforge.net/articles/rmiyagi-inline-asm.txt describes the x86 indexed addressing modes, in particular explaining how (%5,%4,1) is interpreted as "the word of memory at %5 + 1 * %4 ". http://softwarecommunity.intel.com/userfiles/en-us/d9156103.pdf describes the details of the SSE4 CRC32 instruction in mind-numbing detail, but that's not especially relevant to this bug. All we need to know is that crc32 size operates on 8, 32, or 64 bits depending on size , and its first argument is read-only while its second argument is used as an accumulator (read, modify, write). Finally, the comments in bulk_crc32.c are very helpful. Critically, the pipelined_crc32c routine optimizes by computing the CRC of up to 3 blocks in parallel. The block size is passed in to pipelined_crc32c as block_size . As we can see by looking at one of the other asm blocks in pipelined_crc32c, the core idea is that we maintain bdata as a pointer to the word being CRCed in the first block, and then use indexed addressing to compute the appropriate address for the word being CRCed in the second (and possibly third) blocks. With all that under our belt, the bug in this code becomes clear: "crc32b (%5), %0;\n\t" "crc32b (%5,%4,1), %1;\n\t" : "=r" (c1), "=r" (c2) : "r" (c1), "r" (c2), "r" (c3), "r" (block_size), "r" (bdata) The first crc32b instruction dereferences %5 which is block_size , but comparing to any other example of the similar asm block such as: "crc32q (%7), %0;\n\t" "crc32q (%7,%6,1), %1;\n\t" "crc32q (%7,%6,2), %2;\n\t" : "=r" (c1), "=r" (c2), "=r" (c3) : "r" (c1), "r" (c2), "r" (c3), "r" (block_size), "r" (data) it should be dereferencing bdata . And this is caused because the output constraints list includes c3 even though the input constraints list does not, also different from all other examples of the asm block. Therefore, Colin's fix to remove c3 from the list causes the %4 and %5 references to refer to their intended operands block_size and bdata respectively.
          Hide
          Colin Patrick McCabe added a comment -

          I should say "c3 is possibly uninitialized at this point".

          Show
          Colin Patrick McCabe added a comment - I should say "c3 is possibly uninitialized at this point".
          Hide
          Colin Patrick McCabe added a comment -

          The bug in the inline assembly is that there are extra parameters. You can see this clearly in the patch:

          @@ -433,7 +456,7 @@ static void pipelined_crc32c(uint32_t *crc1, uint32_t *crc2, uint32_t *crc3, con
                   "crc32b (%5), %0;\n\t"
                   "crc32b (%5,%4,1), %1;\n\t"
                    : "=r"(c1), "=r"(c2) 
          -         : "r"(c1), "r"(c2), "r"(c3), "r"(block_size), "r"(bdata)
          +         : "r"(c1), "r"(c2), "r"(block_size), "r"(bdata)
                   );
                   bdata++;
                   remainder--;
          

          You can see that it doesn't make sense for the assembly to have 7 parameters, because only 6 are actually used. And indeed, the fact that 'c3' is inserted in the parameter list is the bug. Another thing to keep in mind is that c3 is actually uninitialized at this point-- another clue that we should not be using it.

          Incidentally, this patch unconditionally initializes c3 to 0xffffffff just to avoid heisenbugs in the future.

          Show
          Colin Patrick McCabe added a comment - The bug in the inline assembly is that there are extra parameters. You can see this clearly in the patch: @@ -433,7 +456,7 @@ static void pipelined_crc32c(uint32_t *crc1, uint32_t *crc2, uint32_t *crc3, con "crc32b (%5), %0;\n\t" "crc32b (%5,%4,1), %1;\n\t" : "=r" (c1), "=r" (c2) - : "r" (c1), "r" (c2), "r" (c3), "r" (block_size), "r" (bdata) + : "r" (c1), "r" (c2), "r" (block_size), "r" (bdata) ); bdata++; remainder--; You can see that it doesn't make sense for the assembly to have 7 parameters, because only 6 are actually used. And indeed, the fact that 'c3' is inserted in the parameter list is the bug. Another thing to keep in mind is that c3 is actually uninitialized at this point-- another clue that we should not be using it. Incidentally, this patch unconditionally initializes c3 to 0xffffffff just to avoid heisenbugs in the future.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12540604/HADOOP-8648.003.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common:

          org.apache.hadoop.ha.TestZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1280//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1280//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540604/HADOOP-8648.003.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common: org.apache.hadoop.ha.TestZKFailoverController +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1280//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1280//console This message is automatically generated.
          Hide
          Colin Patrick McCabe added a comment -

          Is that true? I don't think we always fit this requirement. I thought the native code worked even with unaligned pointers.

          I checked out the Intel reference manual, and you're correct. Also, it seems that unaligned instructions have a smaller penalty on the new i7 CPUs, according to reports online. So I got rid of the alignment check.

          The new patch has a different approach which fixes the inline assembly rather than disallowing it when there were remainder bytes.

          Show
          Colin Patrick McCabe added a comment - Is that true? I don't think we always fit this requirement. I thought the native code worked even with unaligned pointers. I checked out the Intel reference manual, and you're correct. Also, it seems that unaligned instructions have a smaller penalty on the new i7 CPUs, according to reports online. So I got rid of the alignment check. The new patch has a different approach which fixes the inline assembly rather than disallowing it when there were remainder bytes.
          Hide
          Colin Patrick McCabe added a comment -
          • pipelined_crc32c: fix bug in inline assembly for "remainder" bytes
          • add test_bulk_crc32 unit test
          • bulk_crc32.c: use constant rather than function to initialize crc32
          • add bulk_calculate_crc function for unit testing purposes
          • always initialize crc1, crc2, crc3 (not a bug, but using uninitialized values
            is confusing)
          Show
          Colin Patrick McCabe added a comment - pipelined_crc32c: fix bug in inline assembly for "remainder" bytes add test_bulk_crc32 unit test bulk_crc32.c: use constant rather than function to initialize crc32 add bulk_calculate_crc function for unit testing purposes always initialize crc1, crc2, crc3 (not a bug, but using uninitialized values is confusing)
          Hide
          Todd Lipcon added a comment -
          +  if (unlikely((uintptr_t)(sums_addr + sums_offset) & 0x3)) {
          +    char buf[256];
          +    snprintf(buf, sizeof(buf), "sums_addr + sums_offset must be aligned "
          +             "to a 4-byte boundary!  sums_addr = %p, sums_offset = %d.",
          +             sums_addr, sums_offset);
          +    THROW(env, "java/lang/IllegalArgumentException", buf);
          +    return;
          +  }
          

          Is that true? I don't think we always fit this requirement. I thought the native code worked even with unaligned pointers.


          • Can you add a direct unit test to TestDataChecksum for these cases? I don't think testing this from HDFS is necessarily the best way when you could trigger the condition explicitly in the checksum code.

          I'm also a little confused by the changes in the C code. eg:

          - *   block_size : The size of each block in bytes.
          + *   block_size : The size of each block in bytes.  Must be >= 8
          

          Why is that the case? It seems like the code path should still work fine there, just setting counter = 0 and skipping the initial while loop.

          Show
          Todd Lipcon added a comment - + if (unlikely((uintptr_t)(sums_addr + sums_offset) & 0x3)) { + char buf[256]; + snprintf(buf, sizeof(buf), "sums_addr + sums_offset must be aligned " + "to a 4- byte boundary! sums_addr = %p, sums_offset = %d." , + sums_addr, sums_offset); + THROW(env, "java/lang/IllegalArgumentException" , buf); + return ; + } Is that true? I don't think we always fit this requirement. I thought the native code worked even with unaligned pointers. Can you add a direct unit test to TestDataChecksum for these cases? I don't think testing this from HDFS is necessarily the best way when you could trigger the condition explicitly in the checksum code. I'm also a little confused by the changes in the C code. eg: - * block_size : The size of each block in bytes. + * block_size : The size of each block in bytes. Must be >= 8 Why is that the case? It seems like the code path should still work fine there, just setting counter = 0 and skipping the initial while loop.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12539101/HADOOP-8648.002.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.util.TestDataChecksum
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1248//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1248//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12539101/HADOOP-8648.002.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.util.TestDataChecksum org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1248//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1248//console This message is automatically generated.
          Hide
          Colin Patrick McCabe added a comment -

          In the test, re-create the dfsclient each time, to make sure the new settings take effect. The value of DFS_BYTES_PER_CHECKSUM_KEY is cached in DfsClient.

          Show
          Colin Patrick McCabe added a comment - In the test, re-create the dfsclient each time, to make sure the new settings take effect. The value of DFS_BYTES_PER_CHECKSUM_KEY is cached in DfsClient.
          Hide
          Colin Patrick McCabe added a comment -

          note: I tested this patch with libhadoop.so copied to hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib/ to work around bug HDFS-3753

          Show
          Colin Patrick McCabe added a comment - note: I tested this patch with libhadoop.so copied to hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib/ to work around bug HDFS-3753
          Hide
          Colin Patrick McCabe added a comment -

          yeah, I meant to write chunk size (io.bytes.per.checksum).

          Show
          Colin Patrick McCabe added a comment - yeah, I meant to write chunk size (io.bytes.per.checksum).
          Hide
          Todd Lipcon added a comment -

          you mean checksum chunk size, right? not blocksize.

          Show
          Todd Lipcon added a comment - you mean checksum chunk size, right? not blocksize.

            People

            • Assignee:
              Colin Patrick McCabe
              Reporter:
              Colin Patrick McCabe
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development