Details
-
Task
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
When ~50 containers are launched simultaneously in a task group on an agent, all of which specify command health checks, they will fail to become healthy. The LAUNCH_NESTED_CONTAINER_SESSION calls for the health checks time out, leading to task group failure.
We should both investigate the cause of the timeouts (based on previous profiling efforts, it is likely due to the cost of forking from the agent process), as well as consider rate-limiting options to allow operators to simultaneously scale large numbers of containers.