|
Hairong Kuang made changes - 24/Apr/07 10:52 PM
[
Permlink
| « Hide
]
Hairong Kuang added a comment - 24/Apr/07 11:17 PM
The rpc time out problem is not specific to dfs. Shall we fix it in the ipc package? I am thinking to use an exponential backoff algorithm to reduce the load on the name server.
I like the idea of having a small timeout at first and then having exponential backoff so as to not load the server. This is definitely a good thing for "server scalability". If you decide to do it in the ipc package, then all the retry loops in the DFSClient might need to go away.
We should be able to use the org.apache.hadoop.io.retry package for this. See http://lucene.apache.org/hadoop/api/org/apache/hadoop/io/retry/package-summary.html
See also HADOOP-601. Tome, thank you so much. Owen also mentioned the retry package to me yesterday. I start to take a look at it and try to figure out how to use it. The info that you provided is definitely helpful.
On a second thought, I feel that simply adding a retry mechanism to ipc may not work because not all the server operations are idempotent. One option is to add an additional parameter to each call indicating if the call should be retried or not when time out. But it seems that our current rpc framework is hard to support this feature.
The annotation method proposed in HADOOP-601 to provide a general retry framework in rpc seems to be a simple solution, but since it is not implemented, for this jira, I plan to implement the retry mechanism for only ClientProtocol using the retry framework implemented in
1. Add an exponential backoff policy to RetryPolicies.
Any suggestion is welcome! The think the create()and cleanup() RPCs can be retried too. In fact, the DFSClient, in it current incantation, does retry the create() RPC three times and the cleanup() RPC indefinitely. I believe that it is safe to retry them because:
1. The first attempt did not even reach the namenode. The retried second attempt can reach the namenode and successfully complete. Either way, it is safe to retry the create() & cleanup() RPCs. > The think the create()and cleanup() RPCs can be retried too.
I believe that Dhruba meant create() and complete(). +1 on his suggestion. This patch mainly builds a retry mechanism for DFSClient. For all the proposed operations, it retries at most a configurable times using an exponential backoff algorithm when receiving SocketTimeoutException. It retries at most a configurable times using a fixed interval when receiving AlreadyBeingCreatedException.
It also modifies the io.retry package a litle bit.
Hairong Kuang made changes - 01/May/07 10:00 PM
Hairong Kuang made changes - 01/May/07 10:11 PM
Hairong Kuang made changes - 01/May/07 10:12 PM
regd namenode.complete() :
Many times this returns "false" to indicate that not all block have been reported and the client is expected to retry complete(). This patch seems to remove this part. Raghu is right. This patch incorporates his comment.
Hairong Kuang made changes - 02/May/07 12:23 AM
Hairong Kuang made changes - 02/May/07 12:26 AM
Hairong Kuang made changes - 02/May/07 12:43 AM
Hairong Kuang made changes - 02/May/07 12:43 AM
It is worth reviewing the logging, especially regarding the log level/retry count combination. In particular RetryInvocationHandler always logs exceptions at warn level, which is probably wrong. If the operation is to be retried then log at info, and only log at warn when the operation will not be retried again. (Also, it might be nice to change the last log message to say how many times the operation was retried before it finally failed.)
Also, could you add a unit test for ExponentialBackoffRetry please? I notice that ExponentialBackoffRetry uses a delay that incorporates a random number. This is probably a good idea to avoid the problem you are trying to fix, but since the other policies are deterministic I would make this difference clear in the name: e.g. FuzzyExponentialBackoffRetry. > FuzzyExponentialBackoffRetry
I think that's overkill. "Exponential backoff" is still the standard term, even when randomness is involved (e.g., http://en.wikipedia.org/wiki/Truncated_binary_exponential_backoff Should the number of attempts and sleep time be configurable?
Maybe we should start with reasonable defaults in code and make these configurable if we discover situations where they don't work. Let's not introduce config variables unless they are required. Code looks good. One question: what do we do for addBlock()? Is there a way to cleanup the dfslcient retry code for addBlock() and make it part of the IPC-retry mechanism. Especially in the case when this RPC encounters SocketTimeouts.
Also, it would be nice if we can remove these configuration variables from hadoop-default.sh so that the user does not know how to change the timeout/retry settings. AddBlock is not idempotent so we can not simply retry when getting a timeout.
One solution is to make addBlock idempotent by adding an additional parameter block# indicating which block of the file is requested. FSNamesystem keeps AddBlock history in FileUnderConstruction. Another solution is to provide a general framework to support retry in ipc. An ipc client assigns a unique id to each request. A retry is sent with the same request id. Server maintains a table keeping track of the recent operation result history per client. When the server receives a request, check the table to see if the request has been serveed or not. If yes, simply return the result; Otherwise, server the request. This solution works whether a request is idempotent or not. But it takes more effort to make it work. Either way, I do not plan to solve the addBlock problem in this jira. This patch incorporates the following review comments:
1. Add a junit test case for the exponential backoff retry plicy; 2. Change the logging level in RetryInvocationHandler as described by Tom; 3. Remove configuration variable in hadoop-default.xml. I set the intial retry interval to be 200 milliseconds and the max number of retries to be 5.
Hairong Kuang made changes - 03/May/07 07:14 PM
Thank Tom for reviewing it.
Hairong Kuang made changes - 03/May/07 10:37 PM
+1
http://issues.apache.org/jira/secure/attachment/12356731/retry1.patch Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/114/testReport/
I just committed this. Thanks, Hairong!
Doug Cutting made changes - 07/May/07 07:43 PM
Integrated in Hadoop-Nightly #82 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/82/
Doug Cutting made changes - 08/Jun/07 08:40 PM
I am planning to use this framework for some new RPCs I am adding. I just want to confirm if my understanding is correct: This patch adds a random exponential back off timeout starting with 400 milliseconds for 5 times. In all 5 retries, this add a max of 12 seconds. Since client RPC timeout is 60sec, time it takes for such RPC to fail takes between 300-312 seconds over 6 attempts. Is this expected?, because it is not exponential back off but essentially constant timeout of around 60sec for each retry.
> between 300-312 seconds over 6 attempts.
It should be 360-385 sec over 7 attempts. ( It retries maxRetries(5)+1 times starting with timeout of "initialTimeout(200) * 2". ).
Owen O'Malley made changes - 08/Jul/09 04:42 PM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||