[AURORA-1514] Allow users to give guidance on SLA for their job - ASF JIRA

Details

Type: Story
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Maintenance, SRE
Labels: None

Description

There needs to be a standard process for customizing the SLA used to validate that a task on a host can be killed in order to drain that host into maintenance. Right now, the default is 95% over 30 minutes, but certain services (such as memcache) would fare much better under a different threshold, for example 99% over 5 minutes.
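
As a rough illustration of what an "X% over Y minutes" check means for a drain, the sketch below treats a task as healthy once it has been running for at least the SLA duration, and allows a kill only if the share of healthy tasks left in the job stays at or above the SLA percentage. The names SlaPolicy and safe_to_kill are hypothetical and not part of Aurora; the scheduler's actual SLA computation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaPolicy:
    """Hypothetical per-job SLA: `percentage` of tasks up for `duration_secs`."""
    percentage: float   # minimum share of tasks that must remain healthy
    duration_secs: int  # uptime a task needs before it counts as healthy


# The default mentioned in this ticket: 95% over 30 minutes.
DEFAULT_POLICY = SlaPolicy(percentage=95.0, duration_secs=30 * 60)


def safe_to_kill(uptimes_secs, kill_indices, policy=DEFAULT_POLICY):
    """Return True if killing the given tasks keeps the job within `policy`.

    uptimes_secs: current uptime of every task in the job, in seconds.
    kill_indices: positions in `uptimes_secs` of the tasks on the draining host.
    """
    total = len(uptimes_secs)
    if total == 0:
        return True  # nothing to protect
    doomed = set(kill_indices)
    healthy_after = sum(
        1
        for i, uptime in enumerate(uptimes_secs)
        if i not in doomed and uptime >= policy.duration_secs
    )
    return 100.0 * healthy_after / total >= policy.percentage
```

With 100 long-running tasks, the default policy tolerates losing 5 of them at once, while a 99% over 5 minutes policy tolerates losing only 1 but lets replacement tasks count as healthy again much sooner.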

We could build this tooling around the existing aurora_admin drain_hosts, but it would apply to all tasks on that host, which would increase complexity.

Lastly, in case we decide to make this user-settable rather than operator-whitelistable, it is important that we still put firm limits on acceptable values, to prevent a service from setting 100% over 0 minutes and holding hosts hostage.
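
One possible shape for such limits, sketched under assumptions: the ceiling and floor below (MAX_PERCENTAGE, MIN_DURATION_SECS) are placeholder values chosen for illustration, not agreed limits, and validate_sla is a hypothetical helper rather than an existing Aurora function.

```python
# Placeholder bounds for illustration only; real values would be set by operators.
MAX_PERCENTAGE = 99.9    # a job may never demand that 100% of its tasks stay up
MIN_DURATION_SECS = 60   # a job may never use a zero-length measurement window


def validate_sla(percentage, duration_secs):
    """Return a list of problems with a user-supplied SLA; empty means acceptable."""
    errors = []
    if not 0 < percentage <= MAX_PERCENTAGE:
        errors.append(
            "percentage must be greater than 0 and at most %s" % MAX_PERCENTAGE
        )
    if duration_secs < MIN_DURATION_SECS:
        errors.append(
            "duration must be at least %d seconds" % MIN_DURATION_SECS
        )
    return errors
```

Under these placeholder bounds, validate_sla(100, 0) reports two problems, while both the current default (95, 1800) and the memcache example (99, 300) pass.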

People

Assignee: Unassigned

Reporter: Joe Smith

Votes: 0

Watchers: 1

Dates

Created: 08/Oct/15 13:48

Updated: 08/Oct/15 13:48