|
me too. It may deserve a separate JIRA. Attached patch adds an internal config variable "dfs.datanode.socket.write.timeout".
Also, when this is set to 0, DataNode uses standard sockets instead of NIO sockets. Runping, could you use this patch with value set to 0 while looking at I ran the gridmix against a build where the write timeout set to 2 minutes. there seems to be at least one place where the constant is still used as the timeout value: @@ -848,7 +859,7 @@ /* utility function for sending a respose */ private static void sendResponse(Socket s, short opStatus) throws IOException { DataOutputStream reply = - new DataOutputStream(new SocketOutputStream(s, WRITE_TIMEOUT)); + new DataOutputStream(NetUtils.getOutputStream(s, WRITE_TIMEOUT)); Is this intended? yes, otherwise I need to make socketWriteTimeout static. Actually I will just pass it to sendResponse() in the next patch. For now it won't matter since sendResponse() is not used after large writes.
What is the default timeout? I suggest 2 minutes at max. 2 minutes is fine for writes.. though it does not really improve much. Would it matter in the absence of
I am more concerned about clients reading from DFS since this timeout current applies to those connections as well. Currently DFSClient treats these connection failures as real errors and will try different datanode. I think we need to fix DFSClient before being more aggressive about this timeout. 0.17 would be the first release that has such a timeout. I am not sure if we should have an aggressive value in the first release. That said, I am not strongly opposed to reducing it. Updated patch attached. Changes are :
I think we should put in this patch for 0.17 and set the write timeout to a value lesser than 10 minutes. This means that the write will timeout before the entire task fails. The failed write will be retried and the task will probably succeed.
Yes. This patch lowers the value to 8 min. I think 2 min is too short, because 1 min leads to multiple false errors on the cluster I am using for
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12380122/HADOOP-3124.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2249/testReport/ This message is automatically generated. Regd unit tests : This patch essentially just makes a constant configurable.
Regd what the default should be (and why I think 2 min is certainly too low): My understanding of what this time is for :
What this is not for :
If you define this timeout to something else, then it is quite possible that much smaller timeout is ok. Please suggest different value (preferably redefining it). 8min may not be the right value either. Even on 4 disk nodes, just doing 'generateData' stage of gridmix, we have seen that 2 min is not enough. On a heavily loaded cluster running multiple jobs on 2 disk machines, it might be much larger. Thats why making this configurable helps. One change we could do is to use different write-timeout values when data is written to DFS and when data is read from DFS (DataNode to DFSClient write). +1 The grid operators can configure an appropriate write timeout value based on their own specific situations. Integrated in Hadoop-trunk #463 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/463/
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This is already the case : timeout is : 10 min + 5sec * (number of datanodes - position in the pipeline). Client's postion is 0, first datanode's position is 1 etc.
+1 for making this a config.
Note that there was no timeout for this before 0.17, it client would get stuck forever. 10 min was added as a very conservative value. What should be the default?
Though not relevant here, probably we need different write timeouts while receiving a block and while sending a block.
I am curious to know if you have any info on why one of the datanode's was not able to read for 10minutes.