Hadoop HDFS / HDFS-4070

DFSClient ignores bufferSize argument & always performs small writes


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.3, 2.0.3-alpha
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Environment: RHEL 5.5 x86_64 (ec2)

Description

      The following code illustrates the issue at hand:

       protected void map(LongWritable offset, Text value, Context context)
               throws IOException, InterruptedException {
           // Request a 1 MB buffer; DFSClient ignores this argument
           OutputStream out = fs.create(new Path("/tmp/benchmark/", value.toString()), true, 1024 * 1024);
           int i;
           // Write 1 GB as 1024*1024 chunks of 1 KB (buffer is a pre-filled byte[] field)
           for (i = 0; i < 1024 * 1024; i++) {
               out.write(buffer, 0, 1024);
           }
           out.close();
           context.write(value, new IntWritable(i));
       }

      This code is run as a single map-only task, with the input file on local disk and the map output written to local disk:

      # su - hdfs -c 'hadoop jar /tmp/dfs-test-1.0-SNAPSHOT-job.jar file:///tmp/list file:///grid/0/hadoop/hdfs/tmp/benchmark'

      Tracing the DataNode's disk access, the following pattern was observed consistently, irrespective of the bufferSize provided.
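      The exact tracing invocation isn't recorded in this report; attaching strace to the DataNode process along these lines (the PID is a placeholder) produces output of this shape:

      # strace -f -e trace=read,write,lseek -p <datanode-pid>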

      21119 read(58,  <unfinished ...>
      21119 <... read resumed> "\0\1\0\0\0\0\0\0\0034\212\0\0\0\0\0\0\0+\220\0\0\0\376\0\262\252ux\262\252u"..., 65557) = 65557
      21119 lseek(107, 0, SEEK_CUR <unfinished ...>
      21119 <... lseek resumed> )             = 53774848
      21119 write(107, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65024 <unfinished ...>
      21119 <... write resumed> )             = 65024
      21119 write(108, "\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux"..., 508 <unfinished ...>
      21119 <... write resumed> )             = 508
      

      Here fd 58 is the incoming socket, fd 107 is the block (blk_*) file, and fd 108 is the .meta (checksum) file.
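      These numbers line up with the default packet layout, assuming the stock 512-byte bytes-per-checksum, a 4-byte CRC32 per chunk, and a 25-byte packet header (assumptions about this build, not values taken from the trace itself):

       public class PacketMath {
           public static void main(String[] args) {
               int chunks = 127;            // data chunks per default packet
               int data = chunks * 512;     // 65024 -> the write() to the blk file
               int checksums = chunks * 4;  // 508   -> the write() to the .meta file
               int header = 25;             // assumed packet header length
               // 65024 + 508 + 25 = 65557, the read() size on the socket (fd 58)
               System.out.println(data + checksums + header);
           }
       }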

      DFSClient ignores the bufferSize argument when sizing its write packets, so every stream suffers the suboptimal syscall & disk performance of the default 64 KB packet size, as is obvious from the interrupted read/write operations above.

      Changing the packet size to a more optimal 1056405 bytes results in a decent performance improvement, by cutting down on disk & network IOPS.
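      For reference, the global setting in question is dfs.client-write-packet-size on branch-2 (dfs.write.packet.size on branch-1). A minimal sketch of overriding it from a 2.x client, with a placeholder NameNode URI, looks like this:

       import java.net.URI;
       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.FileSystem;

       public class PacketSizeOverride {
           public static void main(String[] args) throws Exception {
               Configuration conf = new Configuration();
               // Global per-client override; applies to every stream this client opens
               conf.setInt("dfs.client-write-packet-size", 1056405);
               // hdfs://namenode:8020/ is a placeholder for the target cluster
               FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
               // ... all writes through fs now use the larger packet size ...
           }
       }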

      Average time (milliseconds) for a 10 GB write as 10 files in a single map task:

      run timestamp   packet=65536   packet=1056252
      1350469614             88530            78662
      1350469827             88610            81680
      1350470042             92632            78277
      1350470261             89726            79225
      1350470476             92272            78265
      1350470696             89646            81352
      1350470913             92311            77281
      1350471132             89632            77601
      1350471345             89302            81530
      1350471564             91844            80413

      That is, on average, an increase from ~115 MB/s to ~130 MB/s (10240 MB over a mean of ~90.5 s versus ~79.4 s), obtained by modifying only the global packet size setting.

      This suggests that there is value in adapting the user-provided buffer size to Hadoop's packet sizing on a per-stream basis; a rough sketch follows.
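      As an illustration only (nothing below exists in DFSClient; the helper and its constants are hypothetical), a per-stream mapping from the bufferSize argument to an on-the-wire packet size might round the request to whole checksum chunks:

       public final class PacketSizing {
           // Hypothetical: derive a per-stream packet size from the bufferSize
           // argument of FileSystem.create(), instead of using the global
           // dfs.client-write-packet-size for every stream
           static int packetSizeFor(int bufferSize, int bytesPerChecksum,
                                    int checksumSize, int headerLen) {
               // Round the requested buffer down to whole checksum chunks,
               // keeping at least one chunk per packet
               int chunks = Math.max(1, bufferSize / bytesPerChecksum);
               return headerLen + chunks * (bytesPerChecksum + checksumSize);
           }

           public static void main(String[] args) {
               // A 1 MB bufferSize with 512-byte chunks, CRC32, 25-byte header
               System.out.println(packetSizeFor(1024 * 1024, 512, 4, 25)); // 1056793
           }
       }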

Attachments

Issue Links

    • Allow write packet sizes to be configurable for DFS input streams

Activity

People

    Assignee: Unassigned
    Reporter: Gopal Vijayaraghavan (gopalv)
    Votes: 0
    Watchers: 22

Dates

    Created:
    Updated: