Details
Bug
Status: Closed
Major
Resolution: Fixed
Incompatible change
YARN_FAIL_FAST is now false by default. If HA is enabled and a state-store operation still fails after the retries are exhausted, the RM always transitions to standby state.
Description
Several fixes:
1. Set YARN_FAIL_FAST to false by default, since this makes more sense in a production environment.
2. If HA is enabled and a state-store operation still fails after the retries are exhausted, always transition the RM to standby state. Otherwise, two active RMs may end up running at the same time. YARN-4107 is one example.
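
The decision described in the two fixes above can be sketched roughly as follows. This is an illustrative model only, not the actual patch: the class StateStoreErrorPolicy, the enum Action, and the method onStateStoreError are hypothetical names, and the behavior of the non-HA branches (exit when fail-fast is on, log and continue when it is off) is an assumption that the description does not spell out.

```java
// Hypothetical sketch; none of these names come from the Hadoop code base.
public class StateStoreErrorPolicy {

    // Possible reactions once a state-store operation has failed after all retries.
    enum Action { TRANSITION_TO_STANDBY, EXIT_PROCESS, LOG_AND_CONTINUE }

    // Assumed default after this change: fail-fast is off unless configured otherwise.
    static final boolean DEFAULT_FAIL_FAST = false;

    static Action onStateStoreError(boolean haEnabled, boolean failFast) {
        if (haEnabled) {
            // With HA, always step down so that two RMs cannot both stay active.
            return Action.TRANSITION_TO_STANDBY;
        }
        if (failFast) {
            // Assumption: without HA, fail-fast terminates the RM.
            return Action.EXIT_PROCESS;
        }
        // Assumption: without HA and with fail-fast off, the error is logged and the RM keeps running.
        return Action.LOG_AND_CONTINUE;
    }

    public static void main(String[] args) {
        System.out.println(onStateStoreError(true, DEFAULT_FAIL_FAST));
        System.out.println(onStateStoreError(false, true));
        System.out.println(onStateStoreError(false, DEFAULT_FAIL_FAST));
    }
}
```

In this sketch the HA check comes first, so the fail-fast setting has no effect when HA is enabled, which matches the "always transition" wording above.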