
Key: HADOOP-3429
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Amareshwari Sriramadasu
Reporter: Devaraj Das
Votes: 0
Watchers: 1
Hadoop Common

Increase the buffersize for the streaming parent java process's streams

Created: 21/May/08 01:56 PM   Updated: 08/Jul/09 05:05 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
patch-3429.txt, 3 kB, attached 2008-05-23 09:37 AM by Amareshwari Sriramadasu (licensed for inclusion in ASF works)

Hadoop Flags: Reviewed
Release Note: Increased the size of the buffer used in the communication between the Java task and the Streaming process to 128KB.
Resolution Date: 03/Jun/08 02:06 PM
Labels:


Description
We saw improved performance when we increased the buffer size for Pipes (HADOOP-1788). In the streaming case, the buffer size is 8 KB (the default for BufferedOutputStream). We should set it to 128 KB.
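The change requested here amounts to passing an explicit size to the BufferedOutputStream constructor instead of relying on its 8 KB default. The sketch below is not the code from patch-3429.txt: the names BufferSizeDemo, CountingSink and pump are hypothetical, and it writes into an in-memory counting sink rather than a child process's pipe. It counts how many chunk writes reach the underlying stream when 100-byte records pass through an 8 KB buffer versus a 128 KB buffer.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not the code from patch-3429.txt.
class BufferSizeDemo {
    // 128 KB, the size proposed in this issue; 8 KB is BufferedOutputStream's default.
    static final int STREAM_BUFFER_SIZE = 128 * 1024;
    static final int DEFAULT_BUFFER_SIZE = 8 * 1024;

    /** In-memory stand-in for a pipe: records how many chunks and bytes arrive. */
    static final class CountingSink extends OutputStream {
        long chunks = 0;
        long bytes = 0;

        @Override
        public void write(int b) {
            chunks++;
            bytes++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            chunks++;
            bytes += len;
        }
    }

    /** Pushes recordCount records of recordSize bytes through a buffer of bufferSize bytes. */
    static CountingSink pump(int bufferSize, int recordSize, int recordCount) {
        CountingSink sink = new CountingSink();
        byte[] record = new byte[recordSize];
        // Closing the BufferedOutputStream flushes whatever is still buffered.
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < recordCount; i++) {
                out.write(record, 0, recordSize);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink;
    }

    public static void main(String[] args) {
        CountingSink small = pump(DEFAULT_BUFFER_SIZE, 100, 100_000);
        CountingSink large = pump(STREAM_BUFFER_SIZE, 100, 100_000);
        System.out.println("8 KB buffer:   " + small.chunks + " chunk writes, " + small.bytes + " bytes");
        System.out.println("128 KB buffer: " + large.chunks + " chunk writes, " + large.bytes + " bytes");
    }
}
```

Both runs deliver the same 10,000,000 bytes, but the 8 KB buffer hands its sink a chunk about once every 81 records while the 128 KB buffer does so about once every 1,310, roughly sixteen times fewer writes. Against a real pipe each chunk costs at least one write system call, which is presumably the overhead the larger buffer amortizes. In a streaming task the same wrapping would presumably apply to Process.getOutputStream() and, via BufferedInputStream, to Process.getInputStream().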

Amareshwari Sriramadasu added a comment - 23/May/08 09:37 AM - edited
Here is a patch that increases the buffer size of the streaming parent Java process's streams.

This shows a significant improvement in maps.
I ran a streaming app that consumes its input but does not output anything. The input size was 1.2 GB.
The running times of 10 runs of the streaming app with and without the patch are given below.

With patch      Without patch
----------      -------------
2 min 43 s      6 min 13 s
2 min 48 s      7 min 24 s
2 min 55 s      6 min 27 s
3 min 24 s      8 min 33 s
2 min 46 s      7 min 44 s
2 min 47 s      5 min 37 s
2 min 59 s      5 min 23 s
2 min 53 s      5 min 04 s
3 min 28 s      5 min 14 s
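As a quick check on these numbers, the sketch below converts the nine listed pairs of timings to seconds and compares their means. The names MapOnlyBenchmarkSummary, sec, mean and reduction are hypothetical helpers for this illustration; the timings are exactly the ones in the table.

```java
import java.util.Arrays;

// Hypothetical helper class summarizing the map-only benchmark listed above.
class MapOnlyBenchmarkSummary {
    /** Converts a minutes/seconds reading into seconds. */
    static int sec(int minutes, int seconds) {
        return minutes * 60 + seconds;
    }

    // The nine run pairs listed in the table, in seconds.
    static final int[] WITH_PATCH = {
        sec(2, 43), sec(2, 48), sec(2, 55), sec(3, 24), sec(2, 46),
        sec(2, 47), sec(2, 59), sec(2, 53), sec(3, 28)
    };
    static final int[] WITHOUT_PATCH = {
        sec(6, 13), sec(7, 24), sec(6, 27), sec(8, 33), sec(7, 44),
        sec(5, 37), sec(5, 23), sec(5, 4), sec(5, 14)
    };

    /** Arithmetic mean of the running times, in seconds. */
    static double mean(int[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    /** Fraction of the unpatched running time saved by the patch. */
    static double reduction(double withPatch, double withoutPatch) {
        return 1.0 - withPatch / withoutPatch;
    }

    public static void main(String[] args) {
        double with = mean(WITH_PATCH);
        double without = mean(WITHOUT_PATCH);
        System.out.printf("mean with patch:    %.1f s%n", with);
        System.out.printf("mean without patch: %.1f s%n", without);
        System.out.printf("speedup %.2fx, %.1f%% less time%n",
                without / with, 100 * reduction(with, without));
    }
}
```

On the listed runs the means come to roughly 178 s with the patch and 384 s without, so the map-only job took a little under half as long (about 54% less time).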

Amareshwari Sriramadasu made changes - 23/May/08 09:37 AM
Field Original Value New Value
Attachment patch-3429.txt [ 12382627 ]
Amareshwari Sriramadasu made changes - 23/May/08 09:38 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Amareshwari Sriramadasu added a comment - 23/May/08 09:50 AM
I ran another streaming app that runs 'cat' on the input, with an input size of 640 MB.
The running times of 10 runs of the streaming app with and without the patch are given below.
With patch      Without patch
----------      -------------
8 min 42 s      10 min 04 s
8 min 46 s       9 min 45 s
8 min 47 s      10 min 12 s
9 min 20 s      10 min 04 s
9 min 00 s      10 min 01 s
9 min 06 s      10 min 03 s
9 min 38 s       9 min 59 s
9 min 09 s      10 min 35 s
9 min 05 s      10 min 20 s
9 min 23 s       9 min 48 s

This also shows a significant improvement, of about 10% in running time.
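The "about 10%" figure for the 'cat' job can be checked from the ten pairs of timings listed for it: converting each run to seconds and averaging gives the following (arithmetic on the listed data only).

```latex
\begin{aligned}
\bar{t}_{\text{patch}} &= \tfrac{1}{10}\,(522+526+527+560+540+546+578+549+545+563)\,\text{s} = \tfrac{5456}{10}\,\text{s} = 545.6\,\text{s} \\
\bar{t}_{\text{no patch}} &= \tfrac{1}{10}\,(604+585+612+604+601+603+599+635+620+588)\,\text{s} = \tfrac{6051}{10}\,\text{s} = 605.1\,\text{s} \\
1 - \frac{\bar{t}_{\text{patch}}}{\bar{t}_{\text{no patch}}} &= 1 - \frac{545.6}{605.1} \approx 0.098
\end{aligned}
```

So the patched runs took about 9.8% less time on average, consistent with the estimate in the comment.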


Amareshwari Sriramadasu made changes - 23/May/08 10:01 AM
Component/s contrib/streaming [ 12310972 ]
Amareshwari Sriramadasu added a comment - 28/May/08 08:40 AM
Trying to run Hudson again.

Amareshwari Sriramadasu made changes - 28/May/08 08:40 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Amareshwari Sriramadasu made changes - 28/May/08 08:41 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Amareshwari Sriramadasu added a comment - 02/Jun/08 04:29 AM
Trying to run Hudson again.

Amareshwari Sriramadasu made changes - 02/Jun/08 04:29 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Amareshwari Sriramadasu made changes - 02/Jun/08 04:29 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Amareshwari Sriramadasu added a comment - 03/Jun/08 04:29 AM
Trying to queue up the patch for the Hudson tests again...

Amareshwari Sriramadasu made changes - 03/Jun/08 04:29 AM
Status Patch Available [ 10002 ] Open [ 1 ]
Amareshwari Sriramadasu made changes - 03/Jun/08 04:29 AM
Status Open [ 1 ] Patch Available [ 10002 ]
Hadoop QA added a comment - 03/Jun/08 11:23 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12382627/patch-3429.txt
against trunk revision 662667.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2545/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2545/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2545/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2545/console

This message is automatically generated.


Repository: ASF   Revision: #662805   Date: Tue Jun 03 14:04:47 UTC 2008   User: ddas
Message: HADOOP-3429. Increases the size of the buffers used for the communication for Streaming jobs. Contributed by Amareshwari Sriramadasu.
Files Changed
MODIFY /hadoop/core/trunk/CHANGES.txt
MODIFY /hadoop/core/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/PipeReducer.java
MODIFY /hadoop/core/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/PipeMapRed.java
MODIFY /hadoop/core/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/PipeMapper.java

Devaraj Das added a comment - 03/Jun/08 02:06 PM
I just committed this. Thanks, Amareshwari!

Devaraj Das made changes - 03/Jun/08 02:06 PM
Status Patch Available [ 10002 ] Resolved [ 5 ]
Resolution Fixed [ 1 ]
Hadoop Flags [Reviewed]
Release Note Increases the size of the buffersize used in the communication between the Java task and the Streaming process to 128K. Gives performance improvements.
Robert Chansler made changes - 30/Jun/08 10:02 PM
Release Note
Original value: Increases the size of the buffersize used in the communication between the Java task and the Streaming process to 128K. Gives performance improvements.
New value: Increased the size of the buffer used in the communication between the Java task and the Streaming process to 128KB.
Nigel Daley made changes - 22/Aug/08 07:50 PM
Status Resolved [ 5 ] Closed [ 6 ]
Owen O'Malley made changes - 08/Jul/09 05:05 PM
Component/s contrib/streaming [ 12310972 ]