Issue Details

Key: HADOOP-1702
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Raghu Angadi
Reporter: Raghu Angadi
Votes: 0
Watchers: 0
Hadoop Common

Reduce buffer copies when data is written to DFS

Created: 09/Aug/07 09:43 PM   Updated: 08/Jul/09 04:42 PM
Component/s: None
Affects Version/s: 0.14.0
Fix Version/s: 0.18.0

Time Tracking: Not Specified

File Attachments (text files, all licensed for inclusion in ASF works):
  File               Date                 Uploaded by   Size
  HADOOP-1702.patch  2008-05-12 09:45 PM  Raghu Angadi  42 kB
  HADOOP-1702.patch  2008-05-08 07:31 PM  Raghu Angadi  42 kB
  HADOOP-1702.patch  2008-05-08 07:26 PM  Raghu Angadi  41 kB
  HADOOP-1702.patch  2008-05-07 12:16 AM  Raghu Angadi  41 kB
  HADOOP-1702.patch  2008-04-25 01:05 AM  Raghu Angadi  39 kB
  HADOOP-1702.patch  2008-04-17 03:11 PM  Raghu Angadi  39 kB
  HADOOP-1702.patch  2008-04-17 02:55 PM  Raghu Angadi  39 kB
  HADOOP-1702.patch  2008-04-16 10:20 PM  Raghu Angadi  39 kB
  HADOOP-1702.patch  2008-02-23 01:19 AM  Raghu Angadi  34 kB
Issue Links:
Reference
 
dependent
 

Hadoop Flags: Reviewed, Incompatible change
Release Note: Reduced buffer copies as data is written to HDFS. The order of sending data bytes and control information has changed, but this will not be observed by client applications.
Resolution Date: 14/May/08 06:35 AM


Description
HADOOP-1649 adds extra buffering to improve write performance. The following diagram shows the buffers, marked by numbers in parentheses. Each extra buffer adds an extra copy, since most of our read()/write() calls match io.bytes.per.checksum, which is much smaller than the buffer size.
       (1)                 (2)          (3)                 (5)
   +---||----[ CLIENT ]---||----<>-----||---[ DATANODE ]---||--<>-> to Mirror  
   | (buffer)                  (socket)           |  (4)
   |                                              +--||--+
 =====                                                    |
 =====                                                  =====
 (disk)                                                 =====

Currently, the loops that read and write block data handle one checksum chunk at a time. By reading multiple chunks at a time, we can remove buffers (1), (2), (3), and (5).
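
A minimal sketch of the multi-chunk idea, not the code from the attached patch. It assumes a hypothetical helper class MultiChunkChecksum, a hypothetical chunksPerRead parameter, and CRC32 as the per-chunk checksum. Each pass fills one large array directly from the stream and then checksums it in slices of bytesPerChecksum, so no intermediate buffered stream is needed to make the small per-chunk reads cheap.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration only; class and method names are not from Hadoop.
final class MultiChunkChecksum {

    /**
     * Reads up to chunksPerRead checksum chunks per pass into a single array
     * and returns one CRC32 value per chunk, in stream order.
     */
    static List<Long> checksumChunks(InputStream in, int bytesPerChecksum,
                                     int chunksPerRead) throws IOException {
        byte[] buf = new byte[bytesPerChecksum * chunksPerRead];
        List<Long> sums = new ArrayList<>();
        CRC32 crc = new CRC32();
        int filled;
        while ((filled = fill(in, buf)) > 0) {
            // Checksum each chunk in place; the last chunk may be short.
            for (int off = 0; off < filled; off += bytesPerChecksum) {
                int len = Math.min(bytesPerChecksum, filled - off);
                crc.reset();
                crc.update(buf, off, len);
                sums.add(crc.getValue());
            }
        }
        return sums;
    }

    /** Fills buf as far as the stream allows; returns the number of bytes read. */
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }
}
```

The same slices could then be handed to a single write() call together with their checksums, which is where the saving over one-chunk-at-a-time loops would come from.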

Similarly, some copies can be eliminated when clients read data from the DFS.



Subversion Commits
Repository: ASF
Revision: #656118
Date: Wed May 14 06:32:42 UTC 2008
User: rangadi
Message: HADOOP-1702. Reduce buffer copies when data is written to DFS. DataNodes take 30% less CPU while writing data. (rangadi)
Files Changed
MODIFY /hadoop/core/trunk/src/java/org/apache/hadoop/dfs/DataNode.java
MODIFY /hadoop/core/trunk/src/java/org/apache/hadoop/dfs/FSConstants.java
MODIFY /hadoop/core/trunk/CHANGES.txt
MODIFY /hadoop/core/trunk/src/java/org/apache/hadoop/dfs/DFSClient.java
MODIFY /hadoop/core/trunk/src/test/org/apache/hadoop/dfs/TestDataTransferProtocol.java