Details
Description
If the NameNode is unavailable (for example, during a network partition that separates the client from the NameNode) and a client attempts to connect, the FileSystem API will eventually time out and throw an error. However, the connection timeout is currently hardcoded to 20 seconds, with 45 retries, for a total wait of 15 minutes before failure. While this is fine in many circumstances, there are also many (such as booting a service) where both the connection timeout and the number of retries should be significantly lower, so as not to harm the availability of other services.
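To make the 15-minute figure concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and it assumes 45 timed-out connection attempts of 20 seconds each with no extra sleep between attempts.

```java
import java.time.Duration;

public class WorstCaseWait {
    // Worst-case wait if every attempt runs into the full connect timeout.
    // Assumption: no additional back-off or sleep between attempts.
    static Duration worstCase(Duration connectTimeout, int attempts) {
        return connectTimeout.multipliedBy(attempts);
    }

    public static void main(String[] args) {
        Duration total = worstCase(Duration.ofSeconds(20), 45);
        // 20 s x 45 = 900 s = 15 min
        System.out.println("Worst-case wait: " + total.getSeconds()
                + " s (" + total.toMinutes() + " min)");
    }
}
```

With, say, a 5-second timeout and 3 attempts, the same calculation gives 15 seconds, which is closer to what a service needs at boot time.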
Investigating Client.java, I see that Connection has two fields: maxRetries and rpcTimeout. I propose either re-using those fields when initiating the connection as well, or using the existing dfs.socket.timeout parameter to set the connection timeout on initialization and potentially adding a new parameter, such as dfs.connection.retries, with a default of 45 to replicate the current behavior.
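The proposal could look roughly like the sketch below. This is not the code in Hadoop's Client.java: it is a self-contained illustration using plain java.net.Socket, and the class name, method name, and both parameters are hypothetical stand-ins for values that would be read from configuration (for example, dfs.socket.timeout and the proposed dfs.connection.retries).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ConfigurableConnect {
    // Tries to connect up to maxAttempts times, waiting at most
    // connectTimeoutMs per attempt. Both values are assumed to come from
    // configuration rather than being hardcoded.
    static Socket connect(InetSocketAddress addr, int connectTimeoutMs, int maxAttempts)
            throws IOException {
        if (connectTimeoutMs <= 0 || maxAttempts <= 0) {
            throw new IllegalArgumentException("timeout and attempts must be positive");
        }
        SocketTimeoutException lastTimeout = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket socket = new Socket();
            try {
                socket.connect(addr, connectTimeoutMs);
                return socket; // connected; the caller owns the socket
            } catch (SocketTimeoutException e) {
                socket.close();
                lastTimeout = e; // timed out; retry if attempts remain
            } catch (IOException e) {
                socket.close();
                throw e; // non-timeout failures are not retried in this sketch
            }
        }
        throw lastTimeout;
    }
}
```

A service that must fail fast at boot could call `connect(addr, 5_000, 3)`, while passing `20_000` and `45` would reproduce the wait described above.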
Attachments
Issue Links
- duplicates
- HADOOP-3456 IPC.Client connect timeout should be configurable (Resolved)
- HADOOP-9106 Allow configuration of IPC connect timeout (Closed)