Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
2.8.0
None
Hadoop Flags: Incompatible change, Reviewed
Release Note: The default value of yarn.app.mapreduce.client.job.max-retries has been changed from 0 to 3. This helps protect clients from transient failures. Genuine failures may now take slightly longer to be reported because of the retries.
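A client that depends on the old fail-fast behavior could presumably pin the previous default explicitly. A minimal sketch, assuming the client picks up its settings from mapred-site.xml (the file location is an assumption; the property name and values come from the note above):

```xml
<!-- Sketch only: restores the pre-change default of 0 retries on the client side. -->
<property>
  <name>yarn.app.mapreduce.client.job.max-retries</name>
  <value>0</value>
</property>
```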
Description
The default value of yarn.app.mapreduce.client.job.max-retries is 0, so any communication failure results in a client failure. Oozie handles this badly: if the RM is failing over and Oozie gets a communication failure, it assumes the target job has failed. I propose raising the default to something modest like 3 or 5. The default retry interval is 2s.
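To make the trade-off concrete, here is a rough Java sketch of a bounded retry with a fixed interval. It is not the Hadoop client code; the class and method names (RetrySketch, callWithRetries, worstCaseDelayMs) are hypothetical and only illustrate the semantics described above: one initial attempt, then up to max-retries further attempts with a pause between them.

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    /**
     * Runs op once, then retries up to maxRetries more times, sleeping
     * retryIntervalMs between attempts. With maxRetries = 0 the first
     * failure is rethrown immediately.
     */
    public static <T> T callWithRetries(Callable<T> op, int maxRetries, long retryIntervalMs)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                // Sleep only if another attempt is still allowed.
                if (attempt < maxRetries) {
                    Thread.sleep(retryIntervalMs);
                }
            }
        }
        throw last;
    }

    /** Upper bound on the sleep time added before a persistent failure is reported. */
    public static long worstCaseDelayMs(int maxRetries, long retryIntervalMs) {
        return maxRetries * retryIntervalMs;
    }
}
```

Under these assumptions, 3 retries at a 2s interval add at most about 6s of waiting before a persistent failure is reported, and 5 retries add about 10s, not counting the time each attempt itself takes.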