Description
There is currently little to no visibility into task launch activity. Once a task reaches RUNNING, it is considered to have entered the application realm, where the only way to dissect the warmup period is to examine the thermos processes (if they exist). The warmup may take an arbitrarily long time to complete, which makes the visibility problem worse.
Another example is the Docker container pull (AURORA-1059), where a task stays in ASSIGNED until the docker pull completes. This skews our SLA metrics and risks the task being aborted for exceeding the transient task timeout.
We should consider adding more task states to track package/container fetch and launching/warmup activities explicitly, e.g.:
ASSIGNED -> FETCHING -> LAUNCHING|STARTING -> RUNNING
The above would require modifying the schema so that package fetching must be defined explicitly.
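To make the proposed lifecycle concrete, here is a minimal sketch of the transition chain as a table-driven state machine. It is illustrative only and is not Aurora's scheduler code: the names `TaskState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical, and it assumes `LAUNCHING|STARTING` denotes two candidate names for a single state (STARTING is used here). Failure and terminal states are left out.

```python
from enum import Enum


class TaskState(Enum):
    """Hypothetical subset of task states covering only the proposed chain."""
    ASSIGNED = "ASSIGNED"
    FETCHING = "FETCHING"   # proposed: package/container fetch in progress
    STARTING = "STARTING"   # proposed: launch/warmup (assumed same as LAUNCHING)
    RUNNING = "RUNNING"


# Assumption: each state has exactly one legal successor in the happy path.
ALLOWED_TRANSITIONS = {
    TaskState.ASSIGNED: {TaskState.FETCHING},
    TaskState.FETCHING: {TaskState.STARTING},
    TaskState.STARTING: {TaskState.RUNNING},
    TaskState.RUNNING: set(),
}


def transition(current, target):
    """Return target if current -> target is allowed, else raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(
            "illegal transition: %s -> %s" % (current.value, target.value)
        )
    return target


if __name__ == "__main__":
    state = TaskState.ASSIGNED
    for nxt in (TaskState.FETCHING, TaskState.STARTING, TaskState.RUNNING):
        state = transition(state, nxt)
        print(state.value)
```

With explicit states like these, time spent in FETCHING or STARTING could be measured and timed out separately from ASSIGNED, which is the visibility gap described above.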
Issue Links
- relates to AURORA-1225 Modify executor state transition logic to rely on health checks (if enabled) (Resolved)