Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
In RetryPolicies.java, RetryUpToMaximumTimeWithFixedSleep is implemented as a RetryUpToMaximumCountWithFixedSleep whose retry count is maxTime / sleepTime:
public RetryUpToMaximumTimeWithFixedSleep(long maxTime, long sleepTime, TimeUnit timeUnit) {
  super((int) (maxTime / sleepTime), sleepTime, timeUnit);
  this.maxTime = maxTime;
  this.timeUnit = timeUnit;
}
The count only accounts for the sleep between retries. If each retry attempt itself takes a long time, the total elapsed time exceeds the maxTime passed to RetryUpToMaximumTimeWithFixedSleep.
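One way to picture a policy that honors maxTime is to compare elapsed time against a deadline instead of counting retries. The class below is a minimal sketch under that assumption; `DeadlineRetrySketch`, `shouldRetry`, and `derivedRetryCount` are hypothetical names for illustration, not Hadoop's RetryPolicies API and not the fix from any patch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a deadline-based retry check; not Hadoop code. */
public class DeadlineRetrySketch {
    private final long maxTimeNanos;
    private final long sleepNanos;

    public DeadlineRetrySketch(long maxTime, long sleepTime, TimeUnit unit) {
        this.maxTimeNanos = unit.toNanos(maxTime);
        this.sleepNanos = unit.toNanos(sleepTime);
    }

    /**
     * Allow another retry only if one more sleep still fits within maxTime,
     * measured from the first attempt. Time spent inside failed attempts is
     * included because the caller passes real timestamps.
     */
    public boolean shouldRetry(long startNanos, long nowNanos) {
        long elapsed = nowNanos - startNanos;
        return elapsed + sleepNanos <= maxTimeNanos;
    }

    /** The retry count that the constructor quoted above derives, for comparison. */
    public int derivedRetryCount() {
        return (int) (maxTimeNanos / sleepNanos);
    }
}
```

A caller would record `System.nanoTime()` before the first attempt and pass the current `System.nanoTime()` on each failure, so slow attempts consume the time budget rather than extending it.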
For example, during NM restarts we saw the NMProxy create a retry policy with a maximum wait time of 15 minutes and a 10 sec interval, which is converted to a MaximumCount policy with 15 min / 10 sec = 90 tries. But each NMProxy retry in turn invokes o.a.h.ipc.Client's own retry policy:
if (connectionRetryPolicy == null) {
  final int max = conf.getInt(
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_DEFAULT);
  final int retryInterval = conf.getInt(
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY,
      CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_RETRY_INTERVAL_DEFAULT);
  connectionRetryPolicy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
      max, retryInterval, TimeUnit.MILLISECONDS);
}
So the time it takes the NMProxy to fail is actually (90 retries) * (10 sec NMProxy interval + o.a.h.ipc.Client retry time). In the default case, the IPC client retries 10 times with a 1 sec interval, so the NMProxy takes 90 * (10 sec + 10 sec) = 30 min to fail instead of the 15 min specified by the NMProxy configuration.
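The arithmetic above can be checked with a short calculation. This is only a sketch of the formula in this description: `NmProxyFailureTime` and `effectiveFailureSeconds` are hypothetical names, and the model assumes each IPC-level retry costs exactly its sleep interval, ignoring actual connection time.

```java
/** Hypothetical helper that reproduces the failure-time arithmetic; not Hadoop code. */
public class NmProxyFailureTime {

    /**
     * Time until the outer (NMProxy) policy gives up, in seconds, when every
     * outer retry also waits for the inner (IPC client) retries to run out.
     */
    static long effectiveFailureSeconds(long maxWaitSec, long proxyIntervalSec,
                                        int ipcRetries, long ipcIntervalSec) {
        long proxyRetries = maxWaitSec / proxyIntervalSec;   // 900 / 10 = 90
        long ipcTimePerAttempt = ipcRetries * ipcIntervalSec; // 10 * 1 = 10 sec
        return proxyRetries * (proxyIntervalSec + ipcTimePerAttempt);
    }

    public static void main(String[] args) {
        // 15 min maximum wait, 10 sec NMProxy interval, 10 IPC retries at 1 sec each.
        long seconds = effectiveFailureSeconds(15 * 60, 10, 10, 1);
        System.out.println(seconds / 60 + " min"); // prints "30 min"
    }
}
```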
Issue Links
- duplicates HADOOP-11398 RetryUpToMaximumTimeWithFixedSleep needs to behave more accurately (Patch Available)