Details
-
Task
-
Status: Resolved
-
Minor
-
Resolution: Fixed
Description
There was a recent change to the Aurora client to provide "at most once" instead of "at least once" retries for non-idempotent operations. See:
https://github.com/apache/aurora/commit/f1e25375def5a047da97d8bdfb47a3a9101568f6
`aurora job restart` is a non-idempotent operation, so after that change it is no longer retried. However, if a transport exception occurs and the request is not retried, the operator has to babysit a simple operation like `aurora job restart`. Unlike the requests that were causing problems (admin tasks, job creation, updates, etc.), restarts should generally be retried rather than erring on the side of caution.
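
To illustrate the proposed behaviour, here is a minimal sketch of a retry policy that keeps "at most once" semantics for most non-idempotent calls but opts restarts back into retries. It is not the Aurora client's actual code: `TransportError`, `call_with_retry`, the call names, and the two sets are hypothetical names chosen for this example.

```python
class TransportError(Exception):
    """Hypothetical stand-in for a transport-level failure."""


# Assumption: calls that are safe to repeat because they do not change state.
IDEMPOTENT_CALLS = {"get_tasks_status"}

# Assumption: non-idempotent calls that we choose to retry anyway,
# because repeating them is cheap compared to operator babysitting.
RETRY_ANYWAY_CALLS = {"restart"}


def call_with_retry(name, fn, max_attempts=3):
    """Run fn(), retrying on TransportError only if the call is allowed to retry."""
    retriable = name in IDEMPOTENT_CALLS or name in RETRY_ANYWAY_CALLS
    # Non-retriable calls get exactly one attempt ("at most once").
    attempts = max_attempts if retriable else 1
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransportError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the operator
```

With this policy, a call named `"restart"` that hits a transient transport failure is attempted up to `max_attempts` times, while a call such as `"create_job"` (not in either set) fails on the first transport error and is left to the operator.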