  1. Apache Ozone
  2. HDDS-10239 Storage Container Reconciliation
  3. HDDS-11077

Optimize checksum calculations in container merkle tree

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Choosing an Implementation

      There are two main places we can get our checksum implementations from:

      • java.util.zip.CRC32[C] which use native code.
      • PureJavaCrc32[C] which has implementations in Ozone, Hadoop, and Apache Commons that are all more or less copied from each other.

      The considerations in choosing an implementation are:

      • CRC32C is a general improvement over CRC32.
      • java.util.zip.CRC32C does not exist until Java 9. Java 8 only has CRC32.
      • java.util.zip.Checksum#update(ByteBuffer) does not exist until Java 9. This is why Ozone has the ChecksumByteBuffer wrapper class.

      Previous work to determine which checksum to use on data in Ozone was done here and here. These links explain the decision to default to java.util.zip.CRC32 in Ozone. They also implement the ability to swap between PureJavaCrc32C and java.util.zip.CRC32C when CRC32C is specified based on the Java version.
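
      As a rough illustration of swapping implementations by Java version, the sketch below looks up java.util.zip.CRC32C reflectively so that it still compiles on Java 8. The class name Crc32cSelectionSketch and the method newCrc32cOrElse are hypothetical and are not Ozone's actual code. The demo passes CRC32 as the fallback only so the example stays self-contained; CRC32 uses a different polynomial, so a real fallback would have to be a pure-Java CRC32C implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class Crc32cSelectionSketch {

  /**
   * Returns java.util.zip.CRC32C when the running JVM provides it (Java 9+),
   * otherwise whatever the caller-supplied fallback creates.
   * Hypothetical helper for illustration only.
   */
  static Checksum newCrc32cOrElse(Supplier<Checksum> fallback) {
    try {
      // Looked up by name so this class has no compile-time dependency on Java 9.
      Class<?> clazz = Class.forName("java.util.zip.CRC32C");
      return (Checksum) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | LinkageError e) {
      return fallback.get();
    }
  }

  /** Checksums the ASCII bytes of the given text with the selected implementation. */
  static long checksumOf(String text) {
    // Assumption: CRC32 stands in for a pure-Java CRC32C fallback in this demo.
    Checksum crc = newCrc32cOrElse(CRC32::new);
    byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
    crc.update(bytes, 0, bytes.length);
    return crc.getValue();
  }

  public static void main(String[] args) {
    Checksum selected = newCrc32cOrElse(CRC32::new);
    System.out.println("Selected implementation: " + selected.getClass().getName());
    System.out.println("Checksum: 0x" + Long.toHexString(checksumOf("123456789")));
  }
}
```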

      Choosing an update method

      It looks like java.util.zip.Checksum#update(int) only consumes the low eight bits of the int rather than all four bytes. This is based on the Java 9 javadoc for CRC32C; other implementations do not specify whether the whole int is read. Since this amounts to a single-byte update, I'm not sure it is any better than using a byte buffer/array to either roll the longs into the checksum one by one, or batch the checksum computation over a buffer of all the longs under a tree node.
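
      The sketch below illustrates the point about update(int) and the two buffer-based alternatives. It uses java.util.zip.CRC32 only so that it also runs on Java 8; the class and method names are hypothetical, and it makes no claim about which option is faster, which would need a benchmark.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumUpdateSketch {

  /** Feeds a single int through update(int); only its low eight bits are used. */
  static long singleByteUpdate(int value) {
    Checksum crc = new CRC32();
    crc.update(value);
    return crc.getValue();
  }

  /** Option 1: roll each long into the checksum one at a time via a reused 8-byte buffer. */
  static long rollOneByOne(long[] values) {
    Checksum crc = new CRC32();
    ByteBuffer scratch = ByteBuffer.allocate(Long.BYTES);
    for (long v : values) {
      scratch.clear();
      scratch.putLong(v);
      crc.update(scratch.array(), 0, Long.BYTES);
    }
    return crc.getValue();
  }

  /** Option 2: pack all longs into one buffer, then checksum it in a single call. */
  static long batched(long[] values) {
    ByteBuffer buffer = ByteBuffer.allocate(values.length * Long.BYTES);
    for (long v : values) {
      buffer.putLong(v);
    }
    Checksum crc = new CRC32();
    crc.update(buffer.array(), 0, buffer.position());
    return crc.getValue();
  }

  public static void main(String[] args) {
    // Both ints share the low byte 0x78, so update(int) treats them identically.
    System.out.println(singleByteUpdate(0x12345678) == singleByteUpdate(0x78));

    // Hypothetical child values under one tree node.
    long[] childValues = {1L, 42L, Long.MAX_VALUE, -7L};
    // Same bytes in the same order, so both options produce the same checksum.
    System.out.println(rollOneByOne(childValues) == batched(childValues));
  }
}
```

      Both options feed the same big-endian bytes to the checksum, so they are interchangeable for correctness; the choice between them is purely a question of performance and allocation.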

            People

              Assignee: Ritesh Shukla
              Reporter: Ethan Rose