When RM HA is enabled and running containers are kept across attempts, apps are more likely to finish successfully with more retries(attempts), so it will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However it will make RMStateStore(FileSystem/HDFS/ZK) store more attempts, and make RM recover process much slower. It might be better to set max attempts to be stored in RMStateStore.
BTW: When 'attemptFailuresValidityInterval'(introduced in
YARN-611) is set to a small value, retried attempts might be very large. So we need to delete some attempts stored in RMStateStore and RMStateStore.