Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Container retries are currently set to a default of -1 in AbstractProviderService.buildContainerRetry. If this is not overridden via service spec with a finite value for yarn.service.container-failure.retry.max , this causes infinite NM reties for the container for ALWAYS/ON_FAILURE restart policy . Ideally it should try a finite number of time on the same NM and subsequently Service AM can retry on another node.
We can set this to default value of 3.