  Mesos / MESOS-7777

Agent failed to recover due to mount namespace leakage in Docker 1.12/1.13


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.3, 1.2.2, 1.3.1, 1.4.0
    • Component/s: docker
    • Labels: None
    • Sprint: Mesosphere Sprint 59
    • Story Points: 3

    Description

      Starting with version 1.12, Docker changed its default mount propagation to "shared" to enable persistent volume plugins. However, Docker has a known issue (https://github.com/moby/moby/issues/25718) in which it sometimes leaks its mount namespace to other processes. This can make Mesos agents fail to remove Docker containers during recovery. The following log shows such a failure:

      I0615 09:39:11.083787  4573 docker.cpp:1002] Skipping recovery of executor 'kafka__7e49099d-7ab4-4435-a94a-1e849b8f2b70' of framework 44cbe3e9-984d-4073-b523-0023b427f54d-0011 because its executor is not marked as docker and the docker container doesn't exist
      Failed to perform recovery: Collect failed: Collect failed: Failed to run 'docker -H unix:///var/run/docker.sock rm -v 2de71c5383cb887f3ee49de5a517545b0522e1bbcb5df618c7ddb8583fd1d12d': exited with status 1; stderr='Error response from daemon: Driver overlay failed to remove root filesystem 2de71c5383cb887f3ee49de5a517545b0522e1bbcb5df618c7ddb8583fd1d12d: remove /var/lib/docker/overlay/221725ec545d60492b5431bb49380d868f7a949aaa3acff49f7ffb5bddeb3385/merged: device or resource busy
      '
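      To check whether a leaked mount namespace is what keeps the overlay `merged` directory busy, one can scan `/proc/<pid>/mountinfo` for processes that still list that path as a mount point. The Python function below is a minimal diagnostic sketch, not part of Mesos or of the original report. It assumes a Linux host, and the function name `find_pids_holding_mount` and its `proc_root` parameter are hypothetical; `proc_root` exists only so the function can be pointed at a fake directory tree for testing.

```python
from pathlib import Path


def find_pids_holding_mount(path_fragment, proc_root="/proc"):
    """Return the sorted PIDs whose mount table lists a mount point
    containing path_fragment (hypothetical helper, Linux only)."""
    holders = []
    for entry in Path(proc_root).iterdir():
        # Only numeric directories under /proc correspond to processes.
        if not entry.name.isdigit():
            continue
        try:
            text = (entry / "mountinfo").read_text()
        except OSError:
            # The process exited or its mountinfo is unreadable; skip it.
            continue
        for line in text.splitlines():
            fields = line.split()
            # In /proc/<pid>/mountinfo the 5th field (index 4) is the mount point.
            if len(fields) > 4 and path_fragment in fields[4]:
                holders.append(int(entry.name))
                break
    return sorted(holders)
```

      For the failure above, calling `find_pids_holding_mount("221725ec545d")` should list the PIDs of processes whose mount namespace still holds a mount under that overlay layer. The match is a plain substring test on the mount point, so a short fragment may also match unrelated mounts, and reading every process's `mountinfo` may require root on some systems.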
      To work around this, do the following:
      Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
      This ensures the agent does not try to recover old live executors.
      Step 2: Restart the agent.
      


          People

            Assignee: Chun-Hung Hsiao (chhsia0)
            Reporter: Chun-Hung Hsiao (chhsia0)
            Shepherd: Jie Yu
            Votes: 0
            Watchers: 4


              Agile

                Completed Sprint: Mesosphere Sprint 59, ended 21/Jul/17
