Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-9918

Agent fails to scale many tasks/containers with command health checks

Details

    Description

      When ~50 containers are launched simultaneously in a task group on an agent, all of which specify command health checks, they will fail to become healthy. The LAUNCH_NESTED_CONTAINER_SESSION calls for the health checks time out, leading to task group failure.

      We should both investigate the cause of the timeouts (based on previous profiling efforts, it is likely due to the cost of forking from the agent process), as well as consider rate-limiting options to allow operators to simultaneously scale large numbers of containers.

      Attachments

        Activity

          There are no comments yet on this issue.

          People

            Unassigned Unassigned
            greggomann Greg Mann
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: