While running microbenchmarks for the HDFS write codepath, a significant fraction of the CPU time was consumed by DataChecksum.update().
The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
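To illustrate the idea, here is a minimal sketch of a slicing-by-8 CRC32 whose eight lookup tables live back to back in one linear array and are addressed by a constant offset. It is not the attached patch: the class name LinearCrc32Sketch, the array name T and the method crc32() are hypothetical, and the tables are generated at class-load time rather than written out as literals.

```java
// Sketch only: LinearCrc32Sketch, T and crc32() are hypothetical names,
// not taken from the attached patch.
final class LinearCrc32Sketch {
    private static final int POLY = 0xEDB88320; // reflected CRC-32 polynomial

    // Eight 256-entry tables stored back to back in a single array;
    // table k occupies T[k * 256] .. T[k * 256 + 255].
    private static final int[] T = new int[8 * 256];

    static {
        // Table 0: the classic byte-at-a-time CRC-32 table.
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            T[i] = c;
        }
        // Table k: table k-1 advanced by one additional zero byte.
        for (int k = 1; k < 8; k++) {
            for (int i = 0; i < 256; i++) {
                int prev = T[(k - 1) * 256 + i];
                T[k * 256 + i] = (prev >>> 8) ^ T[prev & 0xff];
            }
        }
    }

    static long crc32(byte[] b, int off, int len) {
        int crc = 0xFFFFFFFF;
        // Inner loop: 8 bytes per iteration, all lookups go through one array.
        while (len >= 8) {
            int one = ((b[off] & 0xff)
                    | (b[off + 1] & 0xff) << 8
                    | (b[off + 2] & 0xff) << 16
                    | (b[off + 3] & 0xff) << 24) ^ crc;
            int two = (b[off + 4] & 0xff)
                    | (b[off + 5] & 0xff) << 8
                    | (b[off + 6] & 0xff) << 16
                    | (b[off + 7] & 0xff) << 24;
            crc = T[7 * 256 + (one & 0xff)]
                ^ T[6 * 256 + ((one >>> 8) & 0xff)]
                ^ T[5 * 256 + ((one >>> 16) & 0xff)]
                ^ T[4 * 256 + (one >>> 24)]
                ^ T[3 * 256 + (two & 0xff)]
                ^ T[2 * 256 + ((two >>> 8) & 0xff)]
                ^ T[1 * 256 + ((two >>> 16) & 0xff)]
                ^ T[two >>> 24];
            off += 8;
            len -= 8;
        }
        // Tail: remaining bytes, one at a time, using table 0 only.
        while (len-- > 0) {
            crc = (crc >>> 8) ^ T[(crc ^ b[off++]) & 0xff];
        }
        return (~crc) & 0xFFFFFFFFL;
    }
}
```

With one array, the inner loop needs a single array reference plus constant offsets, instead of eight separate static array references that all have to be kept live at once.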
Timings in milliseconds for 1 GB (16400 loops over a 64 KB chunk)
The performance improvement on x86 (32-bit) is considerably larger than in the 64-bit case, due to the extra register/stack pressure caused by the static arrays.
A closer analysis of the JIT-compiled code for PureJavaCrc32 shows the following assembly fragment:
Basically, the static variables T8_0 through T8_7 are being spilled to the stack because of register pressure. The x86_64 case is less likely to produce such pessimistic JIT code because more registers are available.