HBASE-11927

Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to CRC32C)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0, 2.0.0
    • Component/s: Performance
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Checksumming is CPU intensive. HBase computes additional checksums for HFiles (HDFS computes checksums too) and stores them inline with the file data. On read, these checksums are verified to ensure the data is not corrupted. This patch uses the Hadoop Native Library (NHL) for checksum computation if it is available, and otherwise falls back to the standard Java libraries. Instructions for loading NHL in HBase can be found here (http://hbase.apache.org/book.html#hadoop.native.lib).
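
      The native-first, Java-fallback behaviour described above can be sketched roughly as follows. This is only an illustration and not the code in the patch: ChecksumFallbackSketch and pick are hypothetical names, and the "native" factory is a stand-in supplied by the caller rather than a real Hadoop call.

```java
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch: prefer a native-backed checksum, fall back to the JDK one.
final class ChecksumFallbackSketch {

    // Returns the checksum produced by nativeFactory if it can be created,
    // otherwise the pure-Java java.util.zip.CRC32.
    static Checksum pick(Supplier<Checksum> nativeFactory) {
        try {
            Checksum preferred = nativeFactory.get();
            if (preferred != null) {
                return preferred;
            }
        } catch (UnsatisfiedLinkError | RuntimeException e) {
            // Assumption: a missing native library surfaces as one of these.
        }
        return new CRC32();
    }

    public static void main(String[] args) {
        // Simulate a host where the native library cannot be loaded.
        Checksum c = pick(() -> {
            throw new UnsatisfiedLinkError("native library not loaded");
        });
        System.out.println("Using " + c.getClass().getName());
    }
}
```

      The point of the pattern is that callers only see the Checksum interface, so the same read and write paths work whether or not the native library is present.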

      The default checksum algorithm has been changed from CRC32 to CRC32C for two main reasons: 1) CRC32C has better error detection properties, and 2) newer Intel processors have a dedicated instruction for CRC32C computation (part of the SSE4.2 instruction set)*. This change is fully backward compatible, and users should not see any difference except a decrease in CPU usage. To keep the old setting, set the configuration property 'hbase.hstore.checksum.algorithm' to 'CRC32'.

      * On Linux, run 'cat /proc/cpuinfo' and look for sse4_2 in the list of flags to see whether your processor supports SSE4.2.
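
      To make the two algorithm names concrete, here is a minimal sketch using only the JDK (java.util.zip.CRC32C requires Java 9 or later). It is not HBase code: ChecksumAlgorithmSketch and forName are hypothetical helpers that map the two values of 'hbase.hstore.checksum.algorithm' mentioned above onto JDK classes, and the input is the conventional "123456789" check string.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Hypothetical sketch: select a JDK checksum by the algorithm name used in the config.
final class ChecksumAlgorithmSketch {

    // Maps "CRC32" or "CRC32C" to the matching JDK implementation.
    static Checksum forName(String algorithm) {
        switch (algorithm) {
            case "CRC32":
                return new CRC32();
            case "CRC32C":
                return new CRC32C();
            default:
                throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
    }

    // Computes the checksum of the whole byte array with the named algorithm.
    static long checksum(String algorithm, byte[] data) {
        Checksum c = forName(algorithm);
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", checksum("CRC32", data));  // cbf43926
        System.out.printf("CRC32C = %08x%n", checksum("CRC32C", data)); // e3069283
    }
}
```

      The two algorithms use different polynomials and so produce different values for the same bytes, which is why the algorithm in use has to be known when a checksum is verified.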

      Description

      Upstream Hadoop already has this change. Let me publish some graphs to show that it makes a difference (CRC accounts for a massive amount of our CPU usage in my profiling of an upload, because of compacting, flushing, etc.). We should also make use of native CRC computation in HBase, especially the 2.6 HDFS-6865 and its ilk, but that is another issue for now.

        Attachments

        1. HBASE-11927-v8.patch
          23 kB
          Apekshit Sharma
        2. HBASE-11927-v8.patch
          23 kB
          Michael Stack
        3. HBASE-11927-v7.patch
          22 kB
          Apekshit Sharma
        4. HBASE-11927-v6.patch
          22 kB
          Apekshit Sharma
        5. HBASE-11927-v5.patch
          22 kB
          Apekshit Sharma
        6. HBASE-11927-v4.patch
          19 kB
          Apekshit Sharma
        7. HBASE-11927-v2.patch
          11 kB
          Apekshit Sharma
        8. HBASE-11927-v1.patch
          11 kB
          Apekshit Sharma
        9. HBASE-11927-branch-1.1.patch
          23 kB
          Nick Dimiduk
        10. HBASE-11927.patch
          11 kB
          Apekshit Sharma
        11. crc32ct.svg
          1.17 MB
          Michael Stack
        12. c2021.zip.svg
          1.03 MB
          Michael Stack
        13. c2021.write.2.svg
          641 kB
          Michael Stack
        14. c2021.crc2.svg
          1.41 MB
          Michael Stack
        15. before-randomWrite1M-5%.svg
          3.05 MB
          Apekshit Sharma
        16. before-compact-22%.svg
          1.61 MB
          Apekshit Sharma
        17. after-randomWrite1M-0.5%.svg
          2.94 MB
          Apekshit Sharma
        18. after-compact-2%.svg
          1.79 MB
          Apekshit Sharma

              People

              • Assignee:
                appy Apekshit Sharma
                Reporter:
                stack Michael Stack
              • Votes:
                0
                Watchers:
                13