Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Goal
Calculate the ChunkBuffer (ByteBuffer) checksum incrementally rather than calculating it from scratch every time in writeChunkToContainer.
Background
Currently, the ChunkBuffer (ByteBuffer) checksum is always calculated from scratch. This is visible in the initialization of the checksum function, which always calls reset() before feeding any data with update():
private static Function<ByteBuffer, ByteString> newChecksumByteBufferFunction(
    Supplier<ChecksumByteBuffer> constructor) {
  final ChecksumByteBuffer algorithm = constructor.get();
  return data -> {
    algorithm.reset();
    algorithm.update(data);
    return int2ByteString((int)algorithm.getValue());
  };
}
Each ByteBuffer (4 MB by default) inside a block's ChunkBuffer gets its checksum calculated here:
// Checksum is computed for each bytesPerChecksum number of bytes of data
// starting at offset 0. The last checksum might be computed for the
// remaining data with length less than bytesPerChecksum.
final List<ByteString> checksumList = new ArrayList<>();
for (ByteBuffer b : data.iterate(bytesPerChecksum)) {
  checksumList.add(computeChecksum(b, function, bytesPerChecksum));
}
which is called from BlockOutputStream#writeChunkToContainer.
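For illustration, the per-window loop above can be approximated with plain JDK classes. This is only a sketch: WindowedChecksumSketch and its checksums method are hypothetical names, java.util.zip.CRC32 stands in for Ozone's ChecksumByteBuffer, and manual slicing stands in for ChunkBuffer#iterate.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical stand-in for the loop above; not the Ozone implementation.
public class WindowedChecksumSketch {

  // Returns one CRC32 value per window of bytesPerChecksum bytes, covering
  // data's remaining bytes. The last window may be shorter.
  public static List<Long> checksums(ByteBuffer data, int bytesPerChecksum) {
    final List<Long> result = new ArrayList<>();
    final CRC32 crc = new CRC32();
    // Work on a duplicate so the caller's position and limit are untouched.
    final ByteBuffer view = data.duplicate();
    while (view.hasRemaining()) {
      final int len = Math.min(bytesPerChecksum, view.remaining());
      final ByteBuffer window = view.duplicate();
      window.limit(window.position() + len);
      crc.reset();         // every window is computed from scratch
      crc.update(window);
      result.add(crc.getValue());
      view.position(view.position() + len);
    }
    return result;
  }

  public static void main(String[] args) {
    final ByteBuffer data = ByteBuffer.wrap(new byte[10]);
    // 10 bytes with 4 bytes per checksum gives windows of 4, 4 and 2 bytes.
    System.out.println(checksums(data, 4).size());
  }
}
```

Note that each call recomputes every window, including windows whose contents have not changed since the previous call.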
When the function is applied inside computeChecksum, it always calls reset() first, so the checksum of each ByteBuffer is recalculated from offset 0.
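One possible shape for the incremental approach is to keep the checksum state between calls and feed it only the bytes appended since the last call. The sketch below assumes java.util.zip.CRC32 in place of Ozone's ChecksumByteBuffer; IncrementalChecksumSketch and its methods are hypothetical names, and the sketch ignores bytesPerChecksum boundaries and buffer reuse, which a real change would need to handle.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of an incremental checksum; not the Ozone implementation.
public class IncrementalChecksumSketch {
  private final CRC32 crc = new CRC32();
  // Number of bytes at the start of the buffer already fed into crc.
  private int checksummedUpTo = 0;

  // Feeds only bytes [checksummedUpTo, buffer.position()) into the running
  // checksum and returns the checksum of bytes [0, buffer.position()).
  // Assumes the buffer is in write mode and is only ever appended to.
  public long update(ByteBuffer buffer) {
    final ByteBuffer view = buffer.duplicate();
    view.limit(buffer.position());
    view.position(checksummedUpTo);
    crc.update(view);      // no reset(): earlier bytes are not re-read
    checksummedUpTo = buffer.position();
    return crc.getValue();
  }

  // Reference behaviour: checksum of bytes [0, buffer.position()) from scratch.
  public static long fromScratch(ByteBuffer buffer) {
    final CRC32 c = new CRC32();
    final ByteBuffer view = buffer.duplicate();
    view.flip();
    c.update(view);
    return c.getValue();
  }

  public static void main(String[] args) {
    final ByteBuffer buffer = ByteBuffer.allocate(64);
    final IncrementalChecksumSketch sketch = new IncrementalChecksumSketch();

    buffer.put("hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(sketch.update(buffer) == fromScratch(buffer));

    // Only the 6 newly appended bytes are fed to the checksum here.
    buffer.put(" world".getBytes(StandardCharsets.UTF_8));
    System.out.println(sketch.update(buffer) == fromScratch(buffer));
  }
}
```

With this shape, the cost of each call is proportional to the number of newly appended bytes rather than to the full buffer size, which is the case that matters when a client appends a few bytes between hsyncs.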
Motivation
This may not have been a big issue before Ozone hsync() was implemented, or in HDFS, where each chunk is much smaller (64 KB by default). It can now account for ~10% of the client-to-DN hsync latency when the client appends only a few bytes between hsyncs, as can be seen from weichiu's flame graph.
The estimated latency improvement from this change is 0% to 20%, depending on the client write/hsync pattern.