Hadoop Common
HADOOP-7443

Add CRC32C as another DataChecksum implementation

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: io, util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      CRC32C is another checksum, very similar to our existing CRC32 but computed with a different generator polynomial (the Castagnoli polynomial). The chief advantage of this polynomial is that SSE4.2 includes hardware support for computing it. HDFS-2080 is the umbrella JIRA that proposes using this new polynomial to save substantial amounts of CPU.
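As a quick illustration of "same algorithm, different polynomial", the sketch below computes both checksums over the same input. It does not use the Hadoop classes from this patch. Instead it assumes the JDK's own java.util.zip.CRC32 and java.util.zip.CRC32C, and the latter only exists from Java 9 onward, well after this issue. The class name Crc32VsCrc32C is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class Crc32VsCrc32C {
    // Computes a checksum over the whole array with the given implementation.
    static long checksum(Checksum c, byte[] data) {
        c.reset();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // "123456789" is the conventional CRC check input:
        // CRC32 gives cbf43926, CRC32C gives e3069283.
        System.out.printf("CRC32  = %08x%n", checksum(new CRC32(), data));
        System.out.printf("CRC32C = %08x%n", checksum(new CRC32C(), data));
    }
}
```

Both classes implement the same Checksum interface. Only the polynomial, and therefore the result, differs.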

      Attachments

      1. hadoop-7443.txt (36 kB, Todd Lipcon)
      2. hadoop-7443.txt (36 kB, Todd Lipcon)

          Activity

          Todd Lipcon added a comment -

          Fairly simple patch which adds the new CRC algorithm. The tables are taken from some Intel sample code under the BSD license, as attributed in the LICENSE file in this diff.

          Todd Lipcon added a comment -

          This patch applies on top of HADOOP-7444.

          Tsz Wo Nicholas Sze added a comment -

          > ... The tables are taken from some Intel sample code under the BSD license, as attributed in the LICENSE file in this diff.

          Hi Todd, given a polynomial, it is very easy to generate the tables so that we don't have to include the BSD license. Let me generate the tables.
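To see why the tables are easy to regenerate, note that entry i of a byte-at-a-time lookup table is just the CRC remainder of the single byte value i. The whole table therefore follows mechanically from the polynomial. The sketch below is not the code from the patch or from TestPureJavaCrc32. It assumes the reflected (LSB-first) form of the CRC32C polynomial, 0x82F63B78, plus the conventional all-ones initial value and final XOR. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Crc32cTableSketch {
    // Reflected (LSB-first) form of the CRC32C (Castagnoli) polynomial.
    static final int POLY_REFLECTED = 0x82F63B78;
    static final int[] TABLE = makeTable(POLY_REFLECTED);

    // Entry i is the CRC remainder of byte value i, shifting right (reflected CRC).
    static int[] makeTable(int poly) {
        int[] table = new int[256];
        for (int i = 0; i < 256; i++) {
            int crc = i;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ poly : crc >>> 1;
            }
            table[i] = crc;
        }
        return table;
    }

    // Byte-at-a-time table-driven CRC: init 0xFFFFFFFF, final XOR 0xFFFFFFFF.
    static long crc32c(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            // Only the low 8 bits of (crc ^ b) select the table entry.
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%08x%n", crc32c(data)); // e3069283, the standard CRC32C check value
    }
}
```

Swapping in a different reflected polynomial is the only change needed to produce another CRC's table. This is why no third-party tables need to be bundled.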

          Todd Lipcon added a comment -

          Thanks Nicholas. I tried to use your table-generating code, but somehow my results ended up bit-reflected or something.

          Tsz Wo Nicholas Sze added a comment -

          Hi Todd, you have actually updated the table-generating code. If you run it with the (reversed) polynomial, you will get the CRC32C tables:

          $ java -cp build/test/classes/:build/classes/ org.apache.hadoop.util.TestPureJavaCrc32\$Table 82F63B78
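The "bit-reflected" confusion comes from the two ways of writing the same polynomial. CRC32C's polynomial is usually quoted as 0x1EDC6F41 in normal (MSB-first) form. Right-shifting table code, such as the zlib-style CRC32 convention, instead expects the bit-reversed constant, 0x82F63B78, which is the argument passed above. We assume here that the Table generator follows that reflected convention, consistent with the argument it was given. The sketch below shows the relationship with Integer.reverse. The class and method names are hypothetical.

```java
public class ReflectedPolynomials {
    // Converts a normal (MSB-first) polynomial constant to its reflected (LSB-first) form.
    static int reflect(int normal) {
        return Integer.reverse(normal);
    }

    public static void main(String[] args) {
        // Normal forms of the two generator polynomials, with the implicit x^32 term omitted.
        int crc32Normal = 0x04C11DB7;
        int crc32cNormal = 0x1EDC6F41;

        // The reflected constants are the ones used by right-shifting, table-driven code.
        System.out.printf("CRC32  reflected: %08X%n", reflect(crc32Normal));  // EDB88320
        System.out.printf("CRC32C reflected: %08X%n", reflect(crc32cNormal)); // 82F63B78
    }
}
```

Feeding the normal form to a generator that expects the reflected one (or vice versa) produces a valid-looking table for the wrong CRC. Results that look "bit-reflected" are the typical symptom.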
          
          Todd Lipcon added a comment -

          Re-generated the table using your code this time. Thanks for giving me the right input.

          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Please test it with the Intel sample code or other implementations.

          Todd Lipcon added a comment -

          For testing, I've been checking the native verification implementation (which uses SSE4.2 instructions) against the PureJava checksum calculation, and the tests pass. I'll commit this to trunk.
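Cross-checking two independent implementations on random inputs is a cheap way to do the kind of testing described here. The sketch below is not the test from the patch. It compares a deliberately simple bit-at-a-time CRC32C against the JDK's java.util.zip.CRC32C, which requires Java 9 or later. It uses random buffers, offsets and lengths, including zero-length slices. The class and method names are hypothetical.

```java
import java.util.Random;
import java.util.zip.CRC32C;

public class Crc32cCrossCheck {
    // Bit-at-a-time CRC32C over data[off, off+len): reflected polynomial 0x82F63B78,
    // init 0xFFFFFFFF, final XOR 0xFFFFFFFF. Slow but easy to audit.
    static long bitwiseCrc32c(byte[] data, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc ^= data[i] & 0xFF;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ 0x82F63B78 : crc >>> 1;
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // Returns how many random trials produced different results from the two implementations.
    static int countMismatches(int trials, long seed) {
        Random rnd = new Random(seed);
        CRC32C reference = new CRC32C();
        int mismatches = 0;
        for (int t = 0; t < trials; t++) {
            byte[] buf = new byte[rnd.nextInt(1024) + 1];
            rnd.nextBytes(buf);
            int off = rnd.nextInt(buf.length);
            int len = rnd.nextInt(buf.length - off + 1); // may be 0
            reference.reset();
            reference.update(buf, off, len);
            if (reference.getValue() != bitwiseCrc32c(buf, off, len)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(1000, 42L));
    }
}
```

Random offsets and lengths matter because fast implementations often have separate code paths for unaligned heads and tails. A fixed-size test can miss bugs in those paths.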

          Todd Lipcon added a comment -

          (will commit after Hudson +1s, that is)

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12486255/hadoop-7443.txt
          against trunk revision 1146111.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 system test framework. The patch passed system test framework compile.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/725//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/725//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/725//console

          This message is automatically generated.

          Todd Lipcon added a comment -

          Committed to trunk

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #689 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/689/)
          HADOOP-7443. Add CRC32C as another DataChecksum implementation. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1146300
          Files :

          • /hadoop/common/trunk/common/src/test/core/org/apache/hadoop/util/TestPureJavaCrc32.java
          • /hadoop/common/trunk/common/CHANGES.txt
          • /hadoop/common/trunk/common/src/test/core/org/apache/hadoop/util/TestDataChecksum.java
          • /hadoop/common/trunk/common/src/java/org/apache/hadoop/util/DataChecksum.java
          • /hadoop/common/trunk/common/src/java/org/apache/hadoop/util/PureJavaCrc32C.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #746 (See https://builds.apache.org/job/Hadoop-Common-trunk/746/)
          HADOOP-7443. Add CRC32C as another DataChecksum implementation. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1146300
          Files :

          • /hadoop/common/trunk/common/src/test/core/org/apache/hadoop/util/TestPureJavaCrc32.java
          • /hadoop/common/trunk/common/CHANGES.txt
          • /hadoop/common/trunk/common/src/test/core/org/apache/hadoop/util/TestDataChecksum.java
          • /hadoop/common/trunk/common/src/java/org/apache/hadoop/util/DataChecksum.java
          • /hadoop/common/trunk/common/src/java/org/apache/hadoop/util/PureJavaCrc32C.java

            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 8
