Thanks for the thorough review, Jian. Sorry for missing some simple things in the patch.
Will safeDelete throw a NoNodeException if it is asked to delete a non-existent znode?
safeDelete checks whether the znode exists before attempting to delete it, so it shouldn't throw NoNodeException.
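To illustrate the check-then-delete behaviour, here is a minimal sketch. It does not use the real ZooKeeper client: `FakeZk` is a hypothetical in-memory stand-in whose `delete` fails on a missing node, roughly like ZooKeeper's NoNodeException, and `safeDelete` here is an illustrative helper, not the actual ZKRMStateStore code. The sketch also ignores a concurrent delete landing between the existence check and the delete.

```java
import java.util.HashSet;
import java.util.Set;

public class SafeDeleteSketch {
    // Hypothetical in-memory stand-in for a ZooKeeper client (NOT the real ZooKeeper API).
    static class FakeZk {
        private final Set<String> nodes = new HashSet<>();

        boolean exists(String path) { return nodes.contains(path); }

        void create(String path) { nodes.add(path); }

        void delete(String path) {
            if (!nodes.remove(path)) {
                // Roughly mimics ZooKeeper failing with NoNodeException for a missing znode.
                throw new IllegalStateException("NoNode: " + path);
            }
        }
    }

    // Deletes the znode only if it exists, so a missing znode is a no-op rather than an error.
    // Returns true if a delete was issued, false if the znode was already gone.
    static boolean safeDelete(FakeZk zk, String path) {
        if (zk.exists(path)) {
            zk.delete(path);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.create("/rmstore/app_1");
        System.out.println(safeDelete(zk, "/rmstore/app_1")); // true
        System.out.println(safeDelete(zk, "/rmstore/app_1")); // false, and no exception
    }
}
```

Calling it twice on the same path shows the point of the reply: the second call finds nothing to delete and returns quietly instead of throwing.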
Why, in the HA case, is zkRetryInterval calculated as below?
When HA is not enabled, we should give the store as much time as possible to connect to ZK. When HA is enabled, the other RM may have a better chance of connecting to ZK, so this RM should stop retrying once roughly one session timeout has elapsed.
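The policy can be sketched as follows, assuming the HA interval is derived by spreading the configured number of retries across one ZK session timeout (so the total retry time is about one session timeout). `computeRetryIntervalMs` and its parameters are hypothetical names for illustration, and the numbers in `main` are example values, not necessarily YARN's defaults.

```java
public class RetryIntervalSketch {
    // Hypothetical helper illustrating the retry-interval policy; not the actual YARN code.
    static long computeRetryIntervalMs(boolean haEnabled, long sessionTimeoutMs,
                                       int numRetries, long configuredRetryIntervalMs) {
        if (numRetries <= 0) {
            throw new IllegalArgumentException("numRetries must be positive");
        }
        if (haEnabled) {
            // Spread all retries over one session timeout, so this RM gives up about when
            // its ZK session would expire and the other RM gets a chance to take over.
            return sessionTimeoutMs / numRetries;
        }
        // No standby RM: keep the configured interval so the store retries for as long as possible.
        return configuredRetryIntervalMs;
    }

    public static void main(String[] args) {
        long ha = computeRetryIntervalMs(true, 10_000L, 1000, 1000L);
        long nonHa = computeRetryIntervalMs(false, 10_000L, 1000, 1000L);
        System.out.println("HA: " + ha + " ms, non-HA: " + nonHa + " ms"); // HA: 10 ms, non-HA: 1000 ms
    }
}
```

With these example values, the HA case retries for about 1000 × 10 ms = 10 s (one session timeout), while the non-HA case keeps trying for about 1000 × 1000 ms, roughly 17 minutes.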
YARN-2054 has all the details.
Posting a patch shortly to address all the review feedback.