Aurora / AURORA-1514

Allow users to give guidance on SLA for their job

Details

    • Type: Story
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Maintenance, SRE
    • Labels: None

    Description

      There needs to be a standard process for customizing the SLA used to validate whether a task on a host can be killed in order to drain that host into maintenance. Right now, the default is 95% over 30 minutes, but certain services (such as memcache) would fare much better under a different SLA, for example 99% over 5 minutes.
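
To make the check concrete, here is a minimal sketch of one plausible reading of the SLA: an instance counts as "up" only if it has been running for at least the SLA duration, and a kill is allowed only if the job still meets the required percentage afterwards. The names `SlaPolicy` and `safe_to_kill` are hypothetical and are not part of Aurora's API; only the 95% over 30 minutes default comes from this ticket.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlaPolicy:
    """Hypothetical per-job SLA; not an Aurora type."""
    percentage: float   # required share of instances counted as up (0-100)
    duration_secs: int  # minimum running time for an instance to count as up


# Default mentioned in the ticket: 95% over 30 minutes.
DEFAULT_POLICY = SlaPolicy(percentage=95.0, duration_secs=30 * 60)


def safe_to_kill(
    uptimes_secs: Sequence[Optional[float]],
    victim: int,
    policy: SlaPolicy = DEFAULT_POLICY,
) -> bool:
    """Return True if killing instance `victim` keeps the job within its SLA.

    `uptimes_secs[i]` is how long instance i has been running, or None if it
    is not running. This is an assumed model of the check, not Aurora's code.
    """
    total = len(uptimes_secs)
    up_after_kill = sum(
        1
        for i, uptime in enumerate(uptimes_secs)
        if i != victim and uptime is not None and uptime >= policy.duration_secs
    )
    # Compare without dividing to avoid floating-point rounding at the boundary.
    return up_after_kill * 100 >= policy.percentage * total
```

Under this model, a 100-instance job with every instance up for an hour can lose one instance under either 95% over 30 minutes or 99% over 5 minutes, while a 20-instance job can lose one only under the 95% policy.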

      We could build this tooling around the existing aurora_admin drain_hosts, but a custom SLA supplied there would apply to all tasks on that host, which would increase complexity.

      Lastly, in case we decide to make this user-settable rather than operator-whitelistable, it is important that we still put firm barriers in place around acceptable values, to prevent a service from setting 100% over 0 minutes and holding hosts hostage.
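
The barriers described above could be sketched as a simple bounds check applied to any user-supplied SLA before it is accepted. Everything here is an assumption for illustration: `SlaLimits`, `validate_sla`, and the specific limit values (99% maximum, 1 to 60 minutes) are hypothetical and would be chosen by the cluster operator.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SlaLimits:
    """Hypothetical operator-defined bounds; the values are placeholders."""
    max_percentage: float = 99.0
    min_duration_secs: int = 60
    max_duration_secs: int = 60 * 60


def validate_sla(
    percentage: float,
    duration_secs: int,
    limits: SlaLimits = SlaLimits(),
) -> Tuple[float, int]:
    """Return the SLA unchanged if it is within bounds, else raise ValueError."""
    if not 0 < percentage <= limits.max_percentage:
        raise ValueError(
            f"percentage {percentage} must be in (0, {limits.max_percentage}]"
        )
    if not limits.min_duration_secs <= duration_secs <= limits.max_duration_secs:
        raise ValueError(
            f"duration {duration_secs}s must be between "
            f"{limits.min_duration_secs}s and {limits.max_duration_secs}s"
        )
    return percentage, duration_secs
```

With these placeholder limits, the memcache example of 99% over 5 minutes is accepted, while 100% over 0 minutes is rejected. Rejecting out-of-range values, rather than silently clamping them, keeps it obvious to the job owner which SLA is actually in force.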

          People

            Assignee: Unassigned
            Reporter: Joe Smith (yasumoto)