|
Attached patch removes extra buffer copies when data is read from the data node (by client or while replicating).
I will post more microbenchmarks similar to last comment. We can reduce one copy on the DFSClient. Current readChunk() interface in FSInputChecker does not allow it. We could add optional readChunks() so that an implementation can get access to user's complete buffer. There will be a default implementation of this. Should I file a jira? This patch changes the DATA_TRANSFER_PROTOCOL a bit. Currently there are no improvements in buffering whilre writing data to DFS. I will do that in a follow up jira. All the unit tests pass. I will run them on windows as well. No new tests are added since this does not actually change any functionality and purely a performance improvement. Also this is friendly to Java's FileChannel.transferTo() interface. That can reduce one more memory copy. This will be good experiment.
With 6 instances of 'dfs -cat 5GbFile > /dev/null' both tests were cpu bound.
with out the patch : ~ 6.5 min
Sorry, I did not get to the essentials of your patch. But it is really hard to understand without Thanks for the detailed review Konstatin.
Just a clarification to the readers: except 4, rest are not introduced by this patch. I will try to fix some of them. When this protocol was first introduced, it was described in the jira ( Note regd 'dfs -cat' numbers: These are end to end tests and numbers vary depending on how many instances we run. Just as in any end-to-end test there are multiple factors that are not affected by this patch. This patch reduces CPU consumed by DataNode while serving data. It cannot be directly comapred from 'dfs -cat' numbers.
I have semi-directly calculated DataNode with the patch takes 35-45% of CPU it used to take before. This calculation uses 9/10 ratio from HADOOP-2144. 'top' on my dev box truncates summed up cpu to 99.9 (unlike on the machine used in HADOOP-2144), other wise we could directly compare CPU taken by DataNode instead of calculating it indirectly. Sameer asked me to compare single instance of 'dfs -cat' and regular shell 'cat'. I will add those numbers in the next comment. Comparision of single instance of 'dfs -cat 5Gbfile > /dev/null" with 'cat 5Gbfile > /dev/null'. All the data resides locally on a 4 disk RAID0 partition :
What would you conclude? Both of the obvious conclusions are incorrect :
If we had a single disk partition, the numbers would be even closer. Regd couple of concerns in Konstantin's review :
The most recent version of the Data Transfer Protocol can be found as an attachment to
Dhruba,
As Konstantin suggested, it is an internal protocol and probably best to keep the protocol documentation in (large) code comments. Its harder to go back I expect you will see big difference if you run 4 concurrent cats, one for a file on different disk, vs. 4 concurrent dfs cats. I understand now DATA_TRANSFER_VERSION is a part of Data Transfer Protocol.
> sendChunks(int) ... sends multiple CRC chunks. I would rather think of it that it is sending portions of data along with corresponding CRCs, rather than CRCs with the data attached. Konstantin, updated patch add all suggested changes except #3 above (reducing the number of member variables). I think it is a large enough change to have its own Jira.
The essence of the patch is at the top of second comment (difference between "before" and "after"). Please feel free to ping me regd explanation of any changes. More changes under src/test caused by fixing DATA_TRANFER_VERSION to DATA_TRANSFER_VERSION.
Thank Konstantin. Update patch includes all the changes.
regd sendBuf and maxChunksPerPacket: these are made local to sendBlock(), I think thats what you meant (instead of sendChunks(), which invoked in sendBlock() in a loop). Will make this patch available once DFSIO benchmark shows the buffering is same as before. -1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376557/HADOOP-2758.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs -1. The patch appears to introduce 1 new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1849/testReport/ This message is automatically generated. fixed findbugs warning. Many thanks to Mukund for running benchmarks with the patch. The results look fine and we are going to double check with another run. We want make sure 'effective buffering' hasn't changed. +1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376788/HADOOP-2758.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 6 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1873/testReport/ This message is automatically generated. Edit : typos
More comparisions. I hope this shows the improvements. Test : Run 6 instances of 'cat 5GbFile > /dev/null' using a single node cluster. All the blocks are located on local disks (RAID0 I think). The hdfs tests include constant costs : Client cpu and kernel cpu not on behalf of user processes. Client cpu is at least as much as DataNodes. This implies, 25% improvement in wall clock time implies more that 50% improvement in DataNode cpu.
Even 21 instances of 'cat allBlocksForFile > /dev/null' was not CPU bound. 'cat' takes virtually zero cpu in user space. +1 test results look encouraging.
Integrated in Hadoop-trunk #419 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/419/
Integrated in Hadoop-trunk #426 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/426/
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Ran 4 instances of 'dfs -cat 5GbFile > /dev/null' similar to the tests in HADOOP-2144. All the blocks are local.
branch-0.16 : ~4 min. cpu bound. user cpu is 3 times the kernel cpu.
trunk + patch : ~3min. disk bound. user cpu is 2 times the kernel cpu. not that much of cpu was left (~10-20%).
Also from HADOOP-2144, datanode cpu is around 0.9 times DFSClient cpu. Even after ignoring idle cpu in the second test, datanode takes less than half of cpu with the patch. This includes both user and kernel cpu taken by datanode. Assuming kernel cpu is same in both cases, the user cpu taken by datanode in second test would much less than half (may be closer 1/3rd).