MESOS-9230

Docker executor may get stuck in an infinite loop when `docker run` hangs.

Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.3, 1.4.2, 1.5.1, 1.6.0
    • Fix Version/s: None
    • Component/s: docker, executor
    • Labels: None

    Description

      This issue occurs when the Docker daemon is very slow or unresponsive.

      Observed behaviour of the Docker executor:

      1. Agent launches the Docker executor, which calls `docker run` to launch a container.
      2. `docker inspect` hangs each time it's called, so the Docker executor retries in a loop without success.
      3. After 5 minutes, the framework (Marathon) sends the first `killTask` message, which interrupts the `docker inspect` loop.
      4. Then, `killTask()` launches the very first `docker stop`, which hangs.
      5. The framework sends the second `killTask()` after 20 seconds, which interrupts the first `docker stop` command.
      6. The framework continues to send `killTask()` every 20 seconds, but each `docker stop` now returns immediately with an error: "Error response from daemon: No such container: mesos-some-UID".
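
The retry behaviour in steps 2 to 6 can be modelled outside Mesos. The following is a minimal Python sketch, not the executor's actual C++ code: `run_with_deadline` and its parameters are hypothetical names, and a sleeping child process stands in for a hung `docker inspect`. It shows one possible way to bound both each attempt and the overall loop, so that a hung CLI call is reported as a failure instead of being retried forever.

```python
import subprocess
import sys
import time


def run_with_deadline(argv, per_call_timeout, overall_deadline):
    """Retry argv until it exits 0 or the overall deadline passes.

    Returns (succeeded, attempts). Hypothetical helper for illustration.
    """
    start = time.monotonic()
    attempts = 0
    while time.monotonic() - start < overall_deadline:
        attempts += 1
        try:
            # subprocess.run kills the child and raises TimeoutExpired
            # once per_call_timeout seconds have elapsed.
            result = subprocess.run(argv, timeout=per_call_timeout,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            continue  # the call hung; retry until the overall deadline
        if result.returncode == 0:
            return True, attempts
    return False, attempts


# Stand-in for a hung `docker inspect`: a child that sleeps for 60 seconds.
hung = [sys.executable, "-c", "import time; time.sleep(60)"]
ok, attempts = run_with_deadline(hung, per_call_timeout=0.2,
                                 overall_deadline=1.0)
print(ok, attempts)
```

With a command that never returns, the helper gives up after roughly one second and reports failure, which is the signal the caller would need in order to stop retrying.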

      Since `docker run` hangs, the `reaped()` callback is never called, so the Docker executor stays stuck in an infinite `docker stop` loop.
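
To make the dependency on `reaped()` concrete, here is a toy Python model of the stuck state. It is a sketch under assumptions, not Mesos code: the class `ToyExecutor`, its fields, and the `max_kill_attempts` escalation are hypothetical, and only the names `killTask` and `reaped` mirror the description above. In the model, a terminal state is reachable only through `reaped()`, so repeated kills change nothing unless an assumed retry cap forces a terminal status.

```python
class ToyExecutor:
    """Toy model: the task is terminal only after reaped() fires."""

    def __init__(self, max_kill_attempts=None):
        self.terminated = False
        self.kill_attempts = 0
        # Hypothetical safeguard; None models the behaviour in this report.
        self.max_kill_attempts = max_kill_attempts

    def reaped(self):
        # Would be called when `docker run` exits; never fires if it hangs.
        self.terminated = True

    def kill_task(self):
        if self.terminated:
            return "already terminated"
        self.kill_attempts += 1
        if (self.max_kill_attempts is not None
                and self.kill_attempts >= self.max_kill_attempts):
            # Assumed escalation: stop waiting for reaped().
            self.terminated = True
            return "forced terminal status"
        # Models `docker stop` failing with "No such container".
        return "docker stop failed; waiting for reaped()"


stuck = ToyExecutor()
for _ in range(100):       # the framework keeps sending killTask
    stuck.kill_task()
print(stuck.terminated)    # False: no number of kills ends the loop

bounded = ToyExecutor(max_kill_attempts=3)
results = [bounded.kill_task() for _ in range(4)]
print(bounded.terminated)  # True
```

The cap is only one possible design choice; whether the real executor should force a terminal status or escalate in some other way is left open by this report.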

          People

            Assignee: Unassigned
            Reporter: Andrei Budnik (abudnik)