Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.14.0
- Component/s: None
- Labels: None
- Hadoop Flags: Incompatible change, Reviewed
- Release Note: Reduced buffer copies as data is written to HDFS. The order of sending data bytes and control information has changed, but this will not be observed by client applications.
Description
HADOOP-1649 adds extra buffering to improve write performance. The following diagram shows the buffers, labeled (1) through (5). Each extra buffer adds an extra copy, since most of our read()/write() calls match io.bytes.per.checksum, which is much smaller than the buffer size.
      (1)               (2)            (3)                (5)
  +---||----[ CLIENT ]---||----<>-----||---[ DATANODE ]---||--<>-> to Mirror
  |              (buffer)    (socket)            |  (4)
  |                                           +--||--+
=====                                         |
=====                                       =====  (disk)
                                            =====
Currently, the loops that read and write block data handle one checksum chunk at a time. By reading multiple chunks at a time, we can remove buffers (1), (2), (3), and (5).
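The copy-count difference this proposes can be illustrated with a minimal sketch. This is not Hadoop code; the class, method names, and the 64 KB batch size are hypothetical, and only the 512-byte io.bytes.per.checksum default comes from the issue's context. It counts how many copy operations are needed to move a block's worth of data chunk-at-a-time versus many chunks per copy:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch (not Hadoop code): contrasts copying one checksum
// chunk at a time with batching many chunks per copy, as this issue proposes.
public class CopyCountDemo {
    static final int BYTES_PER_CHECKSUM = 512;   // io.bytes.per.checksum default
    static final int BATCH_SIZE = 64 * 1024;     // assumed larger stream buffer

    // Copies data one checksum chunk at a time; returns the number of copies.
    static int chunkAtATime(byte[] data, ByteArrayOutputStream out) {
        int copies = 0;
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            out.write(data, off, len);  // one copy per 512-byte chunk
            copies++;
        }
        return copies;
    }

    // Copies as many whole chunks as fit in the batch buffer per call.
    static int multiChunk(byte[] data, ByteArrayOutputStream out) {
        int copies = 0;
        for (int off = 0; off < data.length; off += BATCH_SIZE) {
            int len = Math.min(BATCH_SIZE, data.length - off);
            out.write(data, off, len);  // one copy per 64 KB batch
            copies++;
        }
        return copies;
    }

    public static void main(String[] args) {
        byte[] block = new byte[256 * 1024];
        int small = chunkAtATime(block, new ByteArrayOutputStream());
        int big = multiChunk(block, new ByteArrayOutputStream());
        // 256 KB / 512 B = 512 small copies; 256 KB / 64 KB = 4 batched copies
        System.out.println(small + " copies vs " + big + " copies");
    }
}
```

For 256 KB of data the chunk-at-a-time path performs 512 copy calls where the batched path performs 4, which is why removing the per-chunk buffers (1), (2), (3), and (5) pays off at every hop in the pipeline.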
Similarly, some copies can be reduced when clients read data from the DFS.
Attachments
Issue Links
- depends upon
  - HADOOP-2758 Reduce memory copies when data is read from DFS (Closed)
- relates to
  - HADOOP-2154 Non-interleaved checksums would optimize block transfers. (Resolved)