Thanks Rohith Sharma K S, Jian He for the review and comments!
1. Should private boolean isTransitingToActive = false; is volatile?
Yes, it needs be volatile. I'll update it.
2. Since none of the refreshXXX methods are synchronized, patch introduces a concurrency issue. If there is an explicit admin call for refreshing at the time of transitionToActive, then checkRMStatus will be executed for other admin calls. Until RM transition-to-active completely, explicit admin commands should not allowed to refresh. I think, we should incorporate similar to refreshAdminAcl method.
How about adding synchronized to each refresh functions? It avoids adding more logic. When admin command comes, we could just call corresponding refresh functions. I think it does not matter to call refresh function many times.
3. I think flag checkRMHAState can be passed to method checkRMStatus.
I was thinking it. If adding checkRMHAState to checkRMStatus, we need add this parameter(checkRMHAState) to all refresh functions too(which is similar to refreshAdminAcl), there are a lot of places that call refresh functions. It might be better to just add a check before checkRMStatus?
I think if you can simulate test for generally instead of specific to fair scheduler, this test can be moved to class TestRMHA. There is already test TestRMHA#testTransitionedToActiveRefreshFail, probable the same test can be changed?
Thanks. I'll update the test case.
Instead of reusing the existing refreshAll method, I checked each refresh method, it should be cleaner to just create a new method which includes all necessary reconfig steps. This also avoids unnecessary audit logs, acl checks.
Yes, it will be more clear to add a new method to include all reconfig steps. My doubt is that there will be two places that do similar reconfig things(the one is in refresh functions, the other is in the new added method). Then we need to modify both places if there is some change for one of them. I will try to refactor those refresh functions.