Details
Type: Task
Status: In Progress
Priority: Major
Resolution: Unresolved
Description
With the introduction of different OfferManager orderings (see https://reviews.apache.org/r/59480/), we run the risk of repeatedly assigning the same task to a bad agent.
We should develop a 'failure accrual' mechanism that tracks how many times tasks fail on an agent. Once that count reaches a threshold, we should blacklist the agent for some time so that it can be investigated and the task can be assigned to a different agent.
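
A rough sketch of what such a mechanism might look like, to make the proposal concrete. Everything here is an assumption for illustration: the class name `AgentFailureTracker`, its methods, the threshold and blacklist-duration parameters, and the choice to reset the count on a successful task are all hypothetical and are not part of the existing OfferManager code. The clock is injected so the expiry logic can be exercised without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical failure-accrual tracker; not an existing scheduler API. */
final class AgentFailureTracker {
  private final int failureThreshold;   // assumed: failures before blacklisting
  private final long blacklistMillis;   // assumed: how long an agent stays blacklisted
  private final LongSupplier clock;     // returns the current time in milliseconds
  private final Map<String, Integer> failureCounts = new HashMap<>();
  private final Map<String, Long> blacklistedUntil = new HashMap<>();

  AgentFailureTracker(int failureThreshold, long blacklistMillis, LongSupplier clock) {
    this.failureThreshold = failureThreshold;
    this.blacklistMillis = blacklistMillis;
    this.clock = clock;
  }

  /** Records a task failure on the agent and blacklists it once the threshold is reached. */
  synchronized void recordFailure(String agentId) {
    int count = failureCounts.merge(agentId, 1, Integer::sum);
    if (count >= failureThreshold) {
      blacklistedUntil.put(agentId, clock.getAsLong() + blacklistMillis);
      failureCounts.remove(agentId); // start counting afresh after the blacklist expires
    }
  }

  /** Assumption: a successful task clears the accrued failures for that agent. */
  synchronized void recordSuccess(String agentId) {
    failureCounts.remove(agentId);
  }

  /** Returns true while the agent's blacklist period has not yet elapsed. */
  synchronized boolean isBlacklisted(String agentId) {
    Long until = blacklistedUntil.get(agentId);
    if (until == null) {
      return false;
    }
    if (clock.getAsLong() >= until) {
      blacklistedUntil.remove(agentId); // blacklist expired; agent is eligible again
      return false;
    }
    return true;
  }
}
```

Under these assumptions, offer matching would skip any agent for which `isBlacklisted` returns true, and the task-state handling would call `recordFailure` or `recordSuccess` as tasks finish. Whether failures should be counted per agent or per (task, agent) pair, and whether the count should decay over time, are open design questions this sketch does not settle.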