Hadoop HDFS / HDFS-4070

DFSClient ignores bufferSize argument & always performs small writes


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.3, 2.0.3-alpha
    • Fix Version/s: None
    • Component/s: hdfs-client
    • Environment: RHEL 5.5 x86_64 (ec2)

Description

      The following code illustrates the issue at hand:

       protected void map(LongWritable offset, Text value, Context context)
               throws IOException, InterruptedException {
           // Request a 1 MB buffer; DFSClient ignores this argument
           OutputStream out = fs.create(new Path("/tmp/benchmark/", value.toString()), true, 1024 * 1024);
           int i;
           // Write 1 GB as 1024*1024 chunks of 1 KB (buffer is a pre-filled byte[] field)
           for (i = 0; i < 1024 * 1024; i++) {
               out.write(buffer, 0, 1024);
           }
           out.close();
           context.write(value, new IntWritable(i));
       }

      This code is run as a single map-only task, with the input file on local disk and the map output written to local disk:

      # su - hdfs -c 'hadoop jar /tmp/dfs-test-1.0-SNAPSHOT-job.jar file:///tmp/list file:///grid/0/hadoop/hdfs/tmp/benchmark'

      Tracing the DataNode's disk access, the following pattern was observed consistently, irrespective of the bufferSize provided.
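      The exact tracing invocation isn't recorded in this report; attaching strace to the DataNode process along these lines (the PID is a placeholder) produces output of this shape:

      # strace -f -e trace=read,write,lseek -p <datanode-pid>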

      21119 read(58,  <unfinished ...>
      21119 <... read resumed> "\0\1\0\0\0\0\0\0\0034\212\0\0\0\0\0\0\0+\220\0\0\0\376\0\262\252ux\262\252u"..., 65557) = 65557
      21119 lseek(107, 0, SEEK_CUR <unfinished ...>
      21119 <... lseek resumed> )             = 53774848
      21119 write(107, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65024 <unfinished ...>
      21119 <... write resumed> )             = 65024
      21119 write(108, "\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux\262\252ux"..., 508 <unfinished ...>
      21119 <... write resumed> )             = 508
      

      Here fd 58 is the incoming socket, fd 107 is the block (blk_*) file, and fd 108 is the .meta (checksum) file.
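      These numbers line up with the default packet layout, assuming the stock 512-byte bytes-per-checksum, a 4-byte CRC32 per chunk, and a 25-byte packet header (assumptions about this build, not values taken from the trace itself):

       public class PacketMath {
           public static void main(String[] args) {
               int chunks = 127;            // data chunks per default packet
               int data = chunks * 512;     // 65024 -> the write() to the blk file
               int checksums = chunks * 4;  // 508   -> the write() to the .meta file
               int header = 25;             // assumed packet header length
               // 65024 + 508 + 25 = 65557, the read() size on the socket (fd 58)
               System.out.println(data + checksums + header);
           }
       }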

      DFSClient ignores the bufferSize argument when sizing its write packets, so every stream suffers the suboptimal syscall & disk performance of the default 64 KB packet size, as is obvious from the interrupted read/write operations above.

      Changing the packet size to a more optimal 1056405 bytes results in a decent performance improvement, by cutting down on disk & network IOPS.
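      For reference, the global setting in question is dfs.client-write-packet-size on branch-2 (dfs.write.packet.size on branch-1). A minimal sketch of overriding it from a 2.x client, with a placeholder NameNode URI, looks like this:

       import java.net.URI;
       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.FileSystem;

       public class PacketSizeOverride {
           public static void main(String[] args) throws Exception {
               Configuration conf = new Configuration();
               // Global per-client override; applies to every stream this client opens
               conf.setInt("dfs.client-write-packet-size", 1056405);
               // hdfs://namenode:8020/ is a placeholder for the target cluster
               FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
               // ... all writes through fs now use the larger packet size ...
           }
       }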

      Average time (milliseconds) for a 10 GB write as 10 files in a single map task:

      run timestamp   packet=65536   packet=1056252
      1350469614             88530            78662
      1350469827             88610            81680
      1350470042             92632            78277
      1350470261             89726            79225
      1350470476             92272            78265
      1350470696             89646            81352
      1350470913             92311            77281
      1350471132             89632            77601
      1350471345             89302            81530
      1350471564             91844            80413

      That is, on average, an increase from ~115 MB/s to ~130 MB/s (10240 MB over a mean of ~90.5 s versus ~79.4 s), obtained by modifying only the global packet size setting.

      This suggests that there is value in adapting the user-provided buffer size to Hadoop's packet sizing on a per-stream basis; a rough sketch follows.
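      As an illustration only (nothing below exists in DFSClient; the helper and its constants are hypothetical), a per-stream mapping from the bufferSize argument to an on-the-wire packet size might round the request to whole checksum chunks:

       public final class PacketSizing {
           // Hypothetical: derive a per-stream packet size from the bufferSize
           // argument of FileSystem.create(), instead of using the global
           // dfs.client-write-packet-size for every stream
           static int packetSizeFor(int bufferSize, int bytesPerChecksum,
                                    int checksumSize, int headerLen) {
               // Round the requested buffer down to whole checksum chunks,
               // keeping at least one chunk per packet
               int chunks = Math.max(1, bufferSize / bytesPerChecksum);
               return headerLen + chunks * (bytesPerChecksum + checksumSize);
           }

           public static void main(String[] args) {
               // A 1 MB bufferSize with 512-byte chunks, CRC32, 25-byte header
               System.out.println(packetSizeFor(1024 * 1024, 512, 4, 25)); // 1056793
           }
       }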

Attachments

Issue Links

    • Allow write packet sizes to be configurable for DFS input streams

Activity

People

    Assignee: Unassigned
    Reporter: Gopal Vijayaraghavan (gopalv)
    Votes: 0
    Watchers: 22

Dates

    Created:
    Updated: