
Key: HADOOP-1702
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Raghu Angadi
Reporter: Raghu Angadi
Votes: 0
Watchers: 0
Hadoop Common

Reduce buffer copies when data is written to DFS

Created: 09/Aug/07 09:43 PM   Updated: 08/Jul/09 04:42 PM
Component/s: None
Affects Version/s: 0.14.0
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments (text files, licensed for inclusion in ASF works):
  Name                Date                  Uploader       Size
  HADOOP-1702.patch   2008-05-12 09:45 PM   Raghu Angadi   42 kB
  HADOOP-1702.patch   2008-05-08 07:31 PM   Raghu Angadi   42 kB
  HADOOP-1702.patch   2008-05-08 07:26 PM   Raghu Angadi   41 kB
  HADOOP-1702.patch   2008-05-07 12:16 AM   Raghu Angadi   41 kB
  HADOOP-1702.patch   2008-04-25 01:05 AM   Raghu Angadi   39 kB
  HADOOP-1702.patch   2008-04-17 03:11 PM   Raghu Angadi   39 kB
  HADOOP-1702.patch   2008-04-17 02:55 PM   Raghu Angadi   39 kB
  HADOOP-1702.patch   2008-04-16 10:20 PM   Raghu Angadi   39 kB
  HADOOP-1702.patch   2008-02-23 01:19 AM   Raghu Angadi   34 kB
Issue Links:
Reference
 
dependent
 

Hadoop Flags: Reviewed, Incompatible change
Release Note: Reduced buffer copies as data is written to HDFS. The order of sending data bytes and control information has changed, but this will not be observed by client applications.
Resolution Date: 14/May/08 06:35 AM


Description
HADOOP-1649 adds extra buffering to improve write performance. The following diagram shows the buffers, marked by (numbers). Each extra buffer adds an extra copy, since most of our read()/write() calls match io.bytes.per.checksum, which is much smaller than the buffer size.
       (1)                 (2)          (3)                 (5)
   +---||----[ CLIENT ]---||----<>-----||---[ DATANODE ]---||--<>-> to Mirror  
   | (buffer)                  (socket)           |  (4)
   |                                              +--||--+
 =====                                                    |
 =====                                                  =====
 (disk)                                                 =====

Currently, the loops that read and write block data handle one checksum chunk at a time. By reading multiple chunks at a time, we can remove buffers (1), (2), (3), and (5).
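
A minimal sketch of the idea, not taken from the attached patch: the loop fills one packet-sized array holding several checksum chunks, checksums each chunk in place, and hands the whole array to the sink in a single write, so no intermediate buffer is needed to batch small writes. The names MultiChunkCopy, CountingSink, and Stats are hypothetical, CRC32 is assumed as the checksum, and in-memory streams stand in for the disk and socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Illustrative only: these names are hypothetical and do not come from the patch.
final class MultiChunkCopy {

    /** Stand-in for a socket or file: counts the write calls that reach it. */
    static final class CountingSink extends OutputStream {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int writeCalls;

        @Override public void write(int b) {
            writeCalls++;
            bytes.write(b);
        }

        @Override public void write(byte[] b, int off, int len) {
            writeCalls++;
            bytes.write(b, off, len);
        }
    }

    /** Summary of one copy run. */
    static final class Stats {
        final int writeCalls;
        final List<Long> crcs;
        final byte[] written;

        Stats(int writeCalls, List<Long> crcs, byte[] written) {
            this.writeCalls = writeCalls;
            this.crcs = crcs;
            this.written = written;
        }
    }

    /** Reads until buf is full or the stream ends; returns the number of bytes read. */
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Copies data to a sink, moving chunksPerPacket checksum chunks per write. */
    static Stats copy(byte[] data, int bytesPerChecksum, int chunksPerPacket) {
        byte[] packet = new byte[bytesPerChecksum * chunksPerPacket];
        CRC32 crc = new CRC32();
        List<Long> crcs = new ArrayList<>();
        CountingSink sink = new CountingSink();
        try (InputStream in = new ByteArrayInputStream(data)) {
            int filled;
            while ((filled = fill(in, packet)) > 0) {
                // Checksum every chunk directly in the packet array (no per-chunk copy).
                for (int off = 0; off < filled; off += bytesPerChecksum) {
                    int len = Math.min(bytesPerChecksum, filled - off);
                    crc.reset();
                    crc.update(packet, off, len);
                    crcs.add(crc.getValue());
                }
                // One write per packet instead of one write per checksum chunk.
                sink.write(packet, 0, filled);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Stats(sink.writeCalls, crcs, sink.bytes.toByteArray());
    }
}
```

With 2000 bytes of input and 512-byte chunks, copy(data, 512, 1) issues four writes while copy(data, 512, 4) issues one, and both runs produce the same four checksums.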

Similarly, some copies can be eliminated when clients read data from the DFS.



Change History
Doug Cutting made changes - 10/Oct/07 09:56 PM
Field Original Value New Value
Fix Version/s 0.15.0 [ 12312565 ]
Konstantin Shvachko made changes - 06/Nov/07 01:55 AM
Link This issue relates to HADOOP-2154 [ HADOOP-2154 ]
Raghu Angadi made changes - 13/Feb/08 11:14 PM
Link This issue depends upon HADOOP-2758 [ HADOOP-2758 ]
Raghu Angadi made changes - 13/Feb/08 11:14 PM
Fix Version/s 0.17.0 [ 12312913 ]
Description
Raghu Angadi made changes - 23/Feb/08 01:19 AM
Attachment HADOOP-1702.patch [ 12376288 ]
Robert Chansler made changes - 25/Mar/08 03:03 AM
Fix Version/s 0.17.0 [ 12312913 ]
Raghu Angadi made changes - 10/Apr/08 11:24 PM
Fix Version/s 0.18.0 [ 12312972 ]
Raghu Angadi made changes - 16/Apr/08 10:20 PM
Attachment HADOOP-1702.patch [ 12380339 ]
Raghu Angadi made changes - 17/Apr/08 02:55 PM
Attachment HADOOP-1702.patch [ 12380397 ]
Raghu Angadi made changes - 17/Apr/08 03:11 PM
Attachment HADOOP-1702.patch [ 12380401 ]
Raghu Angadi made changes - 25/Apr/08 01:05 AM
Attachment HADOOP-1702.patch [ 12380885 ]
Raghu Angadi made changes - 07/May/08 12:16 AM
Attachment HADOOP-1702.patch [ 12381544 ]
Raghu Angadi made changes - 08/May/08 07:26 PM
Attachment HADOOP-1702.patch [ 12381708 ]
Raghu Angadi made changes - 08/May/08 07:26 PM
Hadoop Flags [Reviewed]
Status Open [ 1 ] Patch Available [ 10002 ]
Raghu Angadi made changes - 08/May/08 07:31 PM
Attachment HADOOP-1702.patch [ 12381710 ]
Raghu Angadi made changes - 08/May/08 07:33 PM
Release Note Reduce buffer copies when data is written to DFS. DataNode takes 30% less CPU. As a result, the format of data DFSClient sends changed and is incompatible with previous clients.
Hadoop Flags [Reviewed] [Incompatible change, Reviewed]
Raghu Angadi made changes - 09/May/08 09:32 PM
Status Patch Available [ 10002 ] Open [ 1 ]
Raghu Angadi made changes - 12/May/08 09:45 PM
Attachment HADOOP-1702.patch [ 12381909 ]
Raghu Angadi made changes - 12/May/08 09:45 PM
Hadoop Flags [Reviewed, Incompatible change] [Incompatible change, Reviewed]
Status Open [ 1 ] Patch Available [ 10002 ]
Raghu Angadi made changes - 14/May/08 06:35 AM
Hadoop Flags [Reviewed, Incompatible change] [Incompatible change, Reviewed]
Resolution Fixed [ 1 ]
Status Patch Available [ 10002 ] Resolved [ 5 ]
Robert Chansler made changes - 27/Jun/08 07:54 PM
Release Note
  Original Value: Reduce buffer copies when data is written to DFS. DataNode takes 30% less CPU. As a result, the format of data DFSClient sends changed and is incompatible with previous clients.
  New Value: Reduced buffer copies as data is written to HDFS. The order of sending data bytes and control information has changed, but this will not be observed by client applications.
Hadoop Flags [Reviewed, Incompatible change] [Incompatible change, Reviewed]
Nigel Daley made changes - 22/Aug/08 07:50 PM
Status Resolved [ 5 ] Closed [ 6 ]
Owen O'Malley made changes - 08/Jul/09 04:42 PM
Component/s dfs [ 12310710 ]