  1. Aurora
  2. AURORA-1932

Failure accrual detection mechanism for bad agents

Details

    • Type: Task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Scheduler
    • Labels: None

    Description

      With the introduction of different OfferManager orderings (see https://reviews.apache.org/r/59480/), we run the risk of repeatedly assigning the same task to a bad agent.

      We should develop a 'failure accrual' mechanism that tracks how many times tasks fail on an agent. If the failure count reaches a threshold, we should blacklist that agent for some time so that it can be investigated and the task can be assigned to a different agent.
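
One way to picture the proposal is the minimal sketch below. It is illustrative only and is not Aurora scheduler code: the class name `FailureAccrualTracker`, its methods, and the default threshold, window, and blacklist durations are all assumptions made up for this example. It counts task failures per agent inside a sliding time window and blacklists the agent for a fixed period once the count reaches the threshold.

```python
from collections import defaultdict, deque


class FailureAccrualTracker:
    """Hypothetical per-agent failure tracker (not part of Aurora's API)."""

    def __init__(self, threshold=3, window_secs=600.0, blacklist_secs=1800.0):
        # All three defaults are illustrative assumptions, not values from the issue.
        self.threshold = threshold
        self.window_secs = window_secs
        self.blacklist_secs = blacklist_secs
        self._failures = defaultdict(deque)  # agent_id -> recent failure timestamps
        self._blacklisted_until = {}         # agent_id -> blacklist expiry time

    def record_failure(self, agent_id, now):
        """Record a task failure on agent_id at time `now` (seconds)."""
        failures = self._failures[agent_id]
        failures.append(now)
        # Forget failures that fell out of the sliding window.
        while failures and now - failures[0] > self.window_secs:
            failures.popleft()
        if len(failures) >= self.threshold:
            # Threshold reached: blacklist the agent and reset its count.
            self._blacklisted_until[agent_id] = now + self.blacklist_secs
            failures.clear()

    def is_blacklisted(self, agent_id, now):
        """Return True while the agent's blacklist period has not expired."""
        expiry = self._blacklisted_until.get(agent_id)
        if expiry is None:
            return False
        if now >= expiry:
            # Blacklist expired: the agent may receive tasks again.
            del self._blacklisted_until[agent_id]
            return False
        return True

    def eligible_agents(self, agent_ids, now):
        """Filter out agents that are currently blacklisted."""
        return [a for a in agent_ids if not self.is_blacklisted(a, now)]
```

In this sketch the caller passes the current time explicitly, which keeps the logic easy to test. A scheduler would call `record_failure` whenever a task fails on an agent and consult `eligible_agents` before matching tasks to offers, so a task that keeps failing on one agent is steered to a different one until the blacklist expires.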


          People

            Assignee: jordanly Jordan Ly
            Reporter: jordanly Jordan Ly
            Votes: 0
            Watchers: 2
