Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 2.0.3-alpha
- Ubuntu 10.10 i386
Speed up Crc32 by improving the cache hit-ratio of hadoop.util.PureJavaCrc32
Description
While running microbenchmarks for the HDFS write codepath, a significant fraction of CPU time was consumed by DataChecksum.update().
The attached patch converts the static arrays in CRC32 into a single linear array for a performance boost in the inner loop.
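To illustrate the idea, here is a minimal sketch of a table-driven CRC32 whose eight 256-entry lookup tables live in one linear array and are addressed by a constant offset. It is not the attached patch: the class name FlatTableCrc32, the offset constants, and the slicing-by-8 loop are assumptions chosen for illustration, using the standard reflected CRC-32 polynomial.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration, not the code from the attached patch.
class FlatTableCrc32 {
    // Start offsets of the eight logical tables inside the single array T.
    private static final int T8_0 = 0 * 256, T8_1 = 1 * 256, T8_2 = 2 * 256, T8_3 = 3 * 256;
    private static final int T8_4 = 4 * 256, T8_5 = 5 * 256, T8_6 = 6 * 256, T8_7 = 7 * 256;

    // One linear array instead of eight separate static arrays.
    private static final int[] T = buildTable();

    private static int[] buildTable() {
        int[] t = new int[8 * 256];
        // Table 0: classic byte-at-a-time table for the reflected polynomial.
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            t[i] = c;
        }
        // Tables 1..7: each entry advances the previous table's entry by one more byte.
        for (int k = 1; k < 8; k++) {
            for (int i = 0; i < 256; i++) {
                int prev = t[(k - 1) * 256 + i];
                t[k * 256 + i] = (prev >>> 8) ^ t[prev & 0xff];
            }
        }
        return t;
    }

    static long crc32(byte[] b) {
        int crc = 0xFFFFFFFF;
        int off = 0;
        int len = b.length;
        // Inner loop: consume 8 bytes per iteration, all lookups hit the same array.
        while (len >= 8) {
            int one = ((b[off] & 0xff) | (b[off + 1] & 0xff) << 8
                    | (b[off + 2] & 0xff) << 16 | (b[off + 3] & 0xff) << 24) ^ crc;
            int two = (b[off + 4] & 0xff) | (b[off + 5] & 0xff) << 8
                    | (b[off + 6] & 0xff) << 16 | (b[off + 7] & 0xff) << 24;
            crc = T[T8_7 + (one & 0xff)] ^ T[T8_6 + ((one >>> 8) & 0xff)]
                ^ T[T8_5 + ((one >>> 16) & 0xff)] ^ T[T8_4 + (one >>> 24)]
                ^ T[T8_3 + (two & 0xff)] ^ T[T8_2 + ((two >>> 8) & 0xff)]
                ^ T[T8_1 + ((two >>> 16) & 0xff)] ^ T[T8_0 + (two >>> 24)];
            off += 8;
            len -= 8;
        }
        // Tail: fewer than 8 bytes left, fall back to one byte at a time.
        while (len-- > 0) {
            crc = (crc >>> 8) ^ T[T8_0 + ((crc ^ b[off++]) & 0xff)];
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        System.out.println(Long.toHexString(crc32(data)) + " vs " + Long.toHexString(reference.getValue()));
    }
}
```

With a single array, the JIT only needs to keep one array reference live across the inner loop; the table index becomes a compile-time constant added to the byte value.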
Milliseconds to checksum 1 GB (16400 iterations over a 64 KB chunk):
platform | original (ms) | cache-aware (ms) | improvement (%) |
---|---|---|---|
x86 | 3894 | 2304 | 40.83 |
x86_64 | 2131 | 1826 | 14 |
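A measurement of this shape can be sketched as follows. This is not the benchmark used for the numbers above: the class ChecksumBench and its methods are hypothetical, java.util.zip.CRC32 stands in for the implementation under test, and the improvement column is assumed to be (original - cache-aware) / original, which reproduces the 40.83 reported for x86.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical harness; names and structure are assumptions for illustration.
class ChecksumBench {
    // Time `loops` passes of the checksum over the same chunk, in milliseconds.
    static long timeMillis(Checksum sum, byte[] chunk, int loops) {
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            sum.update(chunk, 0, chunk.length);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Relative improvement in percent, assuming (original - improved) / original.
    static double improvementPercent(long originalMs, long improvedMs) {
        return 100.0 * (originalMs - improvedMs) / originalMs;
    }

    public static void main(String[] args) {
        byte[] chunk = new byte[64 * 1024];   // 64 KB chunk
        int loops = 16400;                    // roughly 1 GB in total
        Checksum sum = new CRC32();           // stand-in for the class being measured
        long ms = timeMillis(sum, chunk, loops);
        System.out.println("elapsed ms: " + ms + ", checksum: " + sum.getValue());
        System.out.printf("x86 improvement: %.2f%%%n", improvementPercent(3894, 2304));
    }
}
```

A single timed pass like this includes JIT warm-up, so it is only a rough guide; repeated runs or a dedicated harness would give steadier numbers.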
The performance improvement on x86 is considerably larger than on x86_64, due to the extra register/stack pressure caused by the static arrays.
A closer look at the JIT-compiled code for PureJavaCrc32 shows the following assembly fragment:
0x40f1e345: mov $0x184,%ecx
0x40f1e34a: mov 0x4415b560(%ecx),%ecx ;*getstatic T8_5
; - PureJavaCrc32::update@95 (line 61)
; {oop('PureJavaCrc32')}
0x40f1e350: mov %ecx,0x2c(%esp)
In other words, the static variables T8_0 through T8_7 are being spilled to the stack because of register pressure. The x86_64 case is less likely to produce such pessimistic JIT code because more registers are available.
Attachments
Issue Links
- is related to HADOOP-8971 Backport: hadoop.util.PureJavaCrc32 cache hit-ratio is low for static data (HADOOP-8926) (Closed)