Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Choosing an Implementation
There are two main places we can get our checksum implementations from:
- java.util.zip.CRC32[C], which use native code.
- PureJavaCrc32[C], which has implementations in Ozone, Hadoop, and Apache Commons that are all more or less copied from each other.
The considerations in choosing an implementation are:
- CRC32C is a general improvement over CRC32.
- java.util.zip.CRC32C does not exist until Java 9. Java 8 only has CRC32.
- java.util.zip.Checksum#update(ByteBuffer) does not exist until Java 9. This is why Ozone has the ChecksumByteBuffer wrapper class.
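To illustrate the last point, here is a minimal sketch of feeding a ByteBuffer into a checksum using only methods that exist on the Java 8 Checksum interface. The class name ByteBufferChecksums, the helper update, and the 4096-byte chunk size are hypothetical and chosen for illustration; this is not Ozone's ChecksumByteBuffer.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ByteBufferChecksums {
    /** Feeds the remaining bytes of buf into checksum and advances buf's position to its limit. */
    static void update(Checksum checksum, ByteBuffer buf) {
        int remaining = buf.remaining();
        if (buf.hasArray()) {
            // Writable heap buffer: checksum the backing array directly, no copy needed.
            checksum.update(buf.array(), buf.arrayOffset() + buf.position(), remaining);
            buf.position(buf.limit());
        } else {
            // Direct or read-only buffer: copy out in chunks (4096 is an arbitrary size).
            byte[] chunk = new byte[Math.min(remaining, 4096)];
            while (buf.hasRemaining()) {
                int n = Math.min(buf.remaining(), chunk.length);
                buf.get(chunk, 0, n);
                checksum.update(chunk, 0, n);
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        Checksum fromHeap = new CRC32();
        update(fromHeap, ByteBuffer.wrap(data));

        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        Checksum fromDirect = new CRC32();
        update(fromDirect, direct);

        // Both paths see the same bytes, so the two values should match.
        System.out.println(fromHeap.getValue() == fromDirect.getValue());
    }
}
```

The chunked copy keeps the helper usable with direct buffers at the cost of an extra copy, which the native update(ByteBuffer) overloads in Java 9 and later avoid.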
Previous work to determine which checksum to use on data in Ozone was done here and here. These links explain the decision to default to java.util.zip.CRC32 in Ozone. They also add the ability, when CRC32C is specified, to choose between PureJavaCrc32C and java.util.zip.CRC32C based on the Java version.
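The version-based selection described above could be sketched roughly as follows. The class name ChecksumChooser and the method newChecksum are hypothetical, and the fallback is an assumption made to keep the example JDK-only: it returns java.util.zip.CRC32, whereas the scheme described above falls back to PureJavaCrc32C, which computes the same CRC32C polynomial.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumChooser {
    /** Returns java.util.zip.CRC32C when the runtime provides it (Java 9+), otherwise a fallback. */
    static Checksum newChecksum() {
        try {
            // Looked up reflectively so this class still compiles and loads on Java 8.
            return (Checksum) Class.forName("java.util.zip.CRC32C")
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            // Assumption for this sketch only: CRC32 stands in for a pure-Java CRC32C.
            // CRC32 and CRC32C produce different values, so they are not interchangeable.
            return new CRC32();
        }
    }

    public static void main(String[] args) {
        Checksum checksum = newChecksum();
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        checksum.update(data, 0, data.length);
        System.out.printf("%s -> %08x%n",
                checksum.getClass().getSimpleName(), checksum.getValue());
    }
}
```

Because the fallback here uses a different polynomial, a real implementation would need a fallback that yields the same values as CRC32C, otherwise checksums written on one Java version would not verify on another.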
Choosing an Update Method
It looks like java.util.zip.Checksum#update(int) only reads the low eight bits of the int, based on the Java 9 javadoc for CRC32C. Other implementations do not specify whether the whole int is read. Since this is a single-byte update, I'm not sure it is any better than using a byte buffer or array, either to roll the longs into the checksum one by one or to batch the checksum computation over a buffer of all the longs under a tree node.
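A minimal sketch comparing the update options discussed above. The class UpdateMethods and its method names are hypothetical, the long[] stands in for the values under a tree node, and java.util.zip.CRC32 is used only so that the example also runs on Java 8.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class UpdateMethods {
    /** Rolls each long into the checksum one at a time through a reusable 8-byte buffer. */
    static long oneByOne(long[] values) {
        Checksum checksum = new CRC32();
        ByteBuffer scratch = ByteBuffer.allocate(Long.BYTES);
        for (long v : values) {
            scratch.putLong(0, v); // absolute put; ByteBuffer is big-endian by default
            checksum.update(scratch.array(), 0, Long.BYTES);
        }
        return checksum.getValue();
    }

    /** Serializes all longs into one buffer, then checksums the buffer in a single call. */
    static long batched(long[] values) {
        ByteBuffer all = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            all.putLong(v);
        }
        Checksum checksum = new CRC32();
        checksum.update(all.array(), 0, all.position());
        return checksum.getValue();
    }

    /** Uses update(int), which consumes only the low eight bits of its argument. */
    static long singleInt(int b) {
        Checksum checksum = new CRC32();
        checksum.update(b);
        return checksum.getValue();
    }

    public static void main(String[] args) {
        long[] values = {1L, 2L, 3L, Long.MAX_VALUE};
        // Same byte sequence either way, so the two values should be equal.
        System.out.println(oneByOne(values) == batched(values));
        // The upper 24 bits of the int are ignored by update(int).
        System.out.println(singleInt(0x12345641) == singleInt(0x41));
    }
}
```

Both approaches feed the same byte sequence to the checksum, so they differ only in the number of update calls and the size of the temporary buffer.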