The attached patch splits i/o into 8k chunks. Most RPC server reads and writes are much smaller than this.
The patch is for the 0.18 branch. Thanks Stack. I will check the javadoc again and would appreciate any specific fixes there. The updated patch has a small fix: 'ret <= 0' is replaced by 'ret < ioSize'. This avoids an extra system call in case of partial i/o (or extra blocking in case of blocking sockets). A patch for trunk is attached (essentially renamed).
Another big benefit of this approach is that it avoids the many copies done by the JDK while writing a large response. E.g., writing a 6MB response might require tens (if not hundreds) of calls to non-blocking write(), and each of these writes copies all the data to be written! Thanks Stack. I will fix it in the next iteration of the patch.
The patch looks good, although I did not go into the direct buffer implementation details.
My only concern is how do we test that
Performance-wise we can just do a bunch of ls-s for one large directory, measure average rpc time before and after the patch, and post numbers in here. Thanks Konstantin.
Both are already tested (big thanks to Koji). Not only does it not degrade performance, it improves it (as noted in the comment on Dec 8th) for large responses. E.g., if the response is 10MB:
Regarding the tests:
Updated patch with minor javadoc corrections.
In one of Koji's experiments on branch 0.18:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12395979/HADOOP-4797.patch against trunk revision 726129.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 Eclipse classpath. The patch retains Eclipse classpath integrity.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3737/testReport/
This message is automatically generated.
(Edit: minor correction in shell command for running the test)
About CPU: it looks like there is an orders-of-magnitude improvement in the CPU taken by the write operation for a 10MB RPC response with the patch. The attached micro benchmark TestRpcCpu.patch runs a simple server that has only one call, 'byte[] getBuffer()', which just returns a static byte array. Measured CPU reported by /proc/pid/stat for the server, for 10 iterations of a client fetching the 10MB buffer:
It is a bit shocking to see that it takes 20 seconds to fetch 10MB even with the patch. I don't think it can be explained by extra copies (as mentioned in HADOOP-4813). It is mostly a problem in ObjectWritable. I will try out a fix for that. This patch will show much better CPU improvement once we improve ObjectWritable for arrays.
How to run the test:
$ ant package
# for server
$ bin/hadoop jar build/hadoop-0.20.0-dev-test.jar testrpccpu server
# client: if you run the client on a different machine, set "rpc.test.cpu.server.hostname" in conf/hadoop-site.xml
$ bin/hadoop jar build/hadoop-0.20.0-dev-test.jar testrpccpu
Ok, benchmark with much saner results. The only difference is that this one returns (a Writable) ByteArray instead of a naked 'byte[]', to keep ObjectWritable from handling the array. CPU for 100 calls:
I hope 6-7 times less is pretty good for a side benefit. In the previous version of the benchmark, the client reads much more slowly, so 10MB mostly requires more write() calls. The extra CPU penalty in trunk is directly proportional to the number of write() calls required to write the full buffer. Updated patches with minor javadoc changes.
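For anyone wanting to reproduce the CPU numbers read from /proc/pid/stat, here is a minimal sketch of pulling CPU time out of one stat line. It assumes the figure of interest is utime + stime (fields 14 and 15 in proc(5), in clock ticks); the thread does not say which fields were actually used, and the class and method names are hypothetical.

```java
final class ProcStatCpu {
    /**
     * Returns utime + stime, in clock ticks, from the contents of
     * /proc/<pid>/stat. Assumption: user + system time is the CPU figure
     * of interest; the original measurement may have used something else.
     */
    static long cpuTicks(String statLine) {
        // The comm field (field 2) is wrapped in parentheses and may itself
        // contain spaces or ')', so split only what follows the last ')'.
        int close = statLine.lastIndexOf(')');
        String[] fields = statLine.substring(close + 1).trim().split("\\s+");
        // fields[0] is field 3 (state), so field 14 (utime) is fields[11]
        // and field 15 (stime) is fields[12].
        return Long.parseLong(fields[11]) + Long.parseLong(fields[12]);
    }
}
```

Sampling this before and after the client run and subtracting gives the ticks consumed by the server; dividing by the clock tick rate (commonly 100 per second on Linux, but check with getconf CLK_TCK) converts to seconds.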
I just committed this to 0.18, 0.19, 0.20, and trunk.
Raghu, how is this affecting 0.17.2?
Integrated in Hadoop-trunk #698 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/698/)
JVM :
RPC Server :
I think the fix is fairly straightforward: the RPC server should read or write in smaller chunks. E.g.:
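A minimal sketch of the chunked write, not the committed patch: it caps each channel write() at 8k by temporarily lowering the buffer limit, and stops as soon as a write comes back short ('ret < ioSize'), as described in the comments. The class name, method name, and constant are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

final class ChunkedChannelIO {
    // Assumed chunk size, taken from the "8k chunks" mentioned in the thread.
    static final int IO_CHUNK_SIZE = 8 * 1024;

    /**
     * Writes buf to the channel in chunks of at most IO_CHUNK_SIZE bytes.
     * Returns the number of bytes written, which may be less than
     * buf.remaining() on a non-blocking channel.
     */
    static int channelWrite(WritableByteChannel channel, ByteBuffer buf)
            throws IOException {
        final int originalLimit = buf.limit();
        final int initialRemaining = buf.remaining();
        while (buf.remaining() > 0) {
            int ioSize = Math.min(buf.remaining(), IO_CHUNK_SIZE);
            int ret;
            try {
                // Expose only the next chunk to the channel.
                buf.limit(buf.position() + ioSize);
                ret = channel.write(buf);
            } finally {
                // Always restore the caller's limit, even on exceptions.
                buf.limit(originalLimit);
            }
            if (ret < ioSize) {
                // Partial write: the socket cannot take more right now, so
                // stop instead of issuing another system call.
                break;
            }
        }
        return initialRemaining - buf.remaining();
    }
}
```

A read path would look the same with a ReadableByteChannel and read() in place of write(). Because each call hands the channel at most 8k, the JDK only has to copy that much into its temporary direct buffer per call, rather than everything still left to write.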