Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Duplicate
- 2.4.0
- None
- None
Description
Today, when a ZooKeeper (ZK) operation fails, we handle connection-loss and operation-timeout the same way. This could be improved in a few ways:
- Add special handling for other error codes
- Connection-loss: Nullify zkClient, so a new connection is established
- Operation-timeout: Retry a few times with exponential delay?
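The split proposed above could look roughly like the following. This is a minimal sketch, not the ZKRMStateStore code: `ZkRetrySketch`, `ZkError`, `ZkOpException`, `ClientResetter`, and all parameters are hypothetical stand-ins so the example runs without a ZooKeeper dependency. In real code the error code would come from the ZooKeeper client's exception, and the reset hook would nullify `zkClient`.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these names come from ZKRMStateStore. */
final class ZkRetrySketch {

    /** Stand-in for the ZK error codes discussed above. */
    enum ZkError { CONNECTION_LOSS, OPERATION_TIMEOUT, OTHER }

    /** Stand-in for a failed ZK operation carrying its error code. */
    static final class ZkOpException extends Exception {
        final ZkError code;
        ZkOpException(ZkError code) {
            super(code.name());
            this.code = code;
        }
    }

    /** Hypothetical hook that drops the current client so the next attempt reconnects. */
    interface ClientResetter {
        void reset();
    }

    /** Delay before retry number {@code attempt} (0-based): base * 2^attempt, capped at maxMillis. */
    static long backoffMillis(long baseMillis, long maxMillis, int attempt) {
        // Cap the shift so the multiplication cannot overflow for reasonable base values.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(delay, maxMillis);
    }

    /** Runs op, handling connection-loss and operation-timeout differently. */
    static <T> T runWithRetries(Callable<T> op, ClientResetter resetter,
                                int maxRetries, long baseMillis, long maxMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (ZkOpException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted
                }
                switch (e.code) {
                    case CONNECTION_LOSS:
                        // Drop the client; the next attempt establishes a new connection.
                        resetter.reset();
                        break;
                    case OPERATION_TIMEOUT:
                        // Keep the client and wait with exponentially growing delay.
                        Thread.sleep(backoffMillis(baseMillis, maxMillis, attempt));
                        break;
                    default:
                        // Other error codes would get their own handling; here they propagate.
                        throw e;
                }
            }
        }
    }
}
```

A caller would wrap each store operation in `runWithRetries`, passing a resetter that clears the client reference. Retrying immediately after a connection loss and backing off only on timeouts is one possible policy; whether connection loss should also back off is left open here, as it is in the description.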
Issue Links
- Is contained by: YARN-2716 Refactor ZKRMStateStore retry code with Apache Curator (Resolved)