Description
We need to expose per-job uptime metrics as a first step toward defining a contractual SLA relationship between the Aurora/Mesos platform and the services it hosts.
Specifically, support the following client-side queries:
- Given a job key, return a job uptime vector containing enough data to calculate job uptime SLA metrics (aurora client).
- Given a slave attribute (HOST | RACK), return the list of attribute values on which tasks can be safely killed without violating their jobs' SLAs (aurora_admin client).
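The two queries above can be sketched roughly as follows. This is an illustration only, not the Aurora client API: the `Task` and `Sla` types, their fields, and the functions `job_uptime_vector` and `safe_values` are hypothetical names, and the SLA model (a minimum fraction of a job's tasks that have been running for at least a given duration) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; field names are assumptions for this sketch.
    job_key: str
    host: str
    rack: str
    running_since: Optional[float]  # epoch seconds; None if the task is not running


@dataclass(frozen=True)
class Sla:
    # Assumed SLA model: at least `min_up_fraction` of a job's tasks must
    # have been running for `min_uptime_secs` or longer.
    min_uptime_secs: float
    min_up_fraction: float


def job_uptime_vector(tasks: List[Task], job_key: str, now: float) -> List[float]:
    """Return the sorted uptimes (in seconds) of the job's running tasks."""
    return sorted(
        now - t.running_since
        for t in tasks
        if t.job_key == job_key and t.running_since is not None
    )


def up_fraction(tasks, job_key, sla, now, attribute=None, excluded=None):
    """Fraction of the job's tasks that count as up, optionally pretending
    that every task whose `attribute` equals `excluded` has been killed."""
    job_tasks = [t for t in tasks if t.job_key == job_key]
    if not job_tasks:
        return 1.0
    up = 0
    for t in job_tasks:
        killed = attribute is not None and getattr(t, attribute) == excluded
        if killed or t.running_since is None:
            continue
        if now - t.running_since >= sla.min_uptime_secs:
            up += 1
    return up / len(job_tasks)


def safe_values(tasks: List[Task], slas: Dict[str, Sla],
                attribute: str, now: float) -> List[str]:
    """Return the values of `attribute` ("host" or "rack") whose tasks could
    all be killed while every affected job stays within its SLA."""
    safe = []
    for value in sorted({getattr(t, attribute) for t in tasks}):
        affected = {t.job_key for t in tasks if getattr(t, attribute) == value}
        if all(
            up_fraction(tasks, job, slas[job], now, attribute, value)
            >= slas[job].min_up_fraction
            for job in affected
        ):
            safe.append(value)
    return safe
```

A small usage example under the same assumptions, with made-up job keys and hosts:

```python
tasks = [
    Task("role/prod/a", "h1", "r1", 0.0),
    Task("role/prod/a", "h2", "r1", 0.0),
    Task("role/prod/a", "h3", "r2", 900.0),
    Task("role/prod/b", "h3", "r2", 0.0),
]
slas = {"role/prod/a": Sla(300.0, 0.3), "role/prod/b": Sla(300.0, 1.0)}

print(job_uptime_vector(tasks, "role/prod/a", now=1000.0))
print(safe_values(tasks, slas, "host", now=1000.0))
```

Here killing everything on `h3` would take down the only task of `role/prod/b`, so `h3` is excluded from the safe list, while either `h1` or `h2` alone can be drained.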