Details
- Improvement
- Status: Closed
- Major
- Resolution: Fixed
- 2.3.0
- None
- None
- Reviewed
Description
Currently, we have the following retry policies for NN SafeMode:
- In a non-HA setup, for certain API calls ("create"), the client will retry if the NN is in SafeMode. Specifically, when retry is enabled, the client-side RPC applies the MultipleLinearRandomRetry policy to a wrapped SafeModeException.
- In an HA setup, the client will retry if the NN is Active and in SafeMode. Specifically, the SafeModeException is wrapped in a RetriableException on the server side. The client-side RPC uses the FailoverOnNetworkExceptionRetry policy, which recognizes RetriableException (see HDFS-5291).
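To make the current split concrete, the two decisions described above can be sketched roughly as follows. This is a simplified illustration only: the class CurrentSafeModeRetry, its methods, and the nested exception types are hypothetical stand-ins, not the real Hadoop classes, and the actual retry policies also handle sleep intervals and retry counts, which are omitted here.

```java
import java.io.IOException;

// Hypothetical, simplified stand-ins for illustration; not the real Hadoop classes.
class CurrentSafeModeRetry {
    /** Stand-in for the exception the NN throws while in SafeMode. */
    static class SafeModeException extends IOException {
        SafeModeException(String msg) { super(msg); }
    }

    /** Stand-in for the marker exception that tells a client to retry. */
    static class RetriableException extends IOException {
        RetriableException(Throwable cause) { super(cause); }
    }

    /** Non-HA client: retries only "create", and only when retry is enabled. */
    static boolean nonHaShouldRetry(String method, IOException e, boolean retryEnabled) {
        return retryEnabled && "create".equals(method) && e instanceof SafeModeException;
    }

    /** HA client: retries any call whose failure arrives as a RetriableException. */
    static boolean haShouldRetry(IOException e) {
        return e instanceof RetriableException;
    }
}
```

Note that in this sketch the same underlying condition (NN in SafeMode) is detected through two different exception types depending on the setup.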
There are several possible issues in the current implementation:
- The NN SafeMode can be a "Manual" SafeMode (i.e., started by an administrator through the CLI), and clients may not want to retry on this type of SafeMode.
- Clients may want to retry on other API calls in a non-HA setup.
- We should have a single generic strategy for mapping SafeMode to a retry policy in both HA and non-HA setups. A possible straightforward solution is to always wrap the SafeModeException in a RetriableException to indicate that clients should retry.
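A minimal sketch of the proposed direction, under stated assumptions: the server wraps SafeModeException in RetriableException, and the client uses one rule regardless of HA. All names here (UnifiedSafeModeRetry, toClientException, shouldRetry, the manual flag) are hypothetical and only illustrate the idea. In particular, skipping the wrap for manual SafeMode is just one way the first concern above could be handled; the proposal as stated would always wrap.

```java
import java.io.IOException;

// Hypothetical, simplified stand-ins for illustration; not the real Hadoop classes.
class UnifiedSafeModeRetry {
    /** Stand-in for the NN's SafeModeException; the manual flag is an assumption. */
    static class SafeModeException extends IOException {
        final boolean manual;
        SafeModeException(String msg, boolean manual) {
            super(msg);
            this.manual = manual;
        }
    }

    /** Stand-in for the marker exception that tells a client to retry. */
    static class RetriableException extends IOException {
        RetriableException(Throwable cause) { super(cause); }
    }

    /**
     * Server side: decide what the client sees.
     * Assumption: manual SafeMode is passed through unwrapped so clients fail fast.
     */
    static IOException toClientException(SafeModeException e) {
        return e.manual ? e : new RetriableException(e);
    }

    /** Client side: one rule for every API call, in both HA and non-HA setups. */
    static boolean shouldRetry(IOException e) {
        return e instanceof RetriableException;
    }
}
```

With this shape, the client no longer needs to know about SafeModeException or about which API call failed; the server alone decides whether a SafeMode failure is worth retrying.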