Mesos / MESOS-6327

Large Docker images cause container launch failures: Too many levels of symbolic links

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.0.1
    • Fix Version/s: 1.1.2, 1.2.1, 1.3.0
    • Component/s: containerization, docker
    • Labels: None
    • Environment: CentOS 7.2 (1511), Ubuntu 14.04 (Trusty). Replicated in the Apache Aurora Vagrant image

    Description

      When deploying Mesos containers with large Docker images (6 GB+, 60+ layers), the task crashes with the following error.

      Mesos agent logs:
      E1007 08:40:12.954227 8117 slave.cpp:3976] Container 'a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4' for executor 'thermos-www-data-devel-hello_docker_image-0-d42d2af6-6b44-4b2b-be95-e1ba93a6b365' of framework dfc91a86-84b9-4539-a7be-4ace7b7b44a1-0000 failed to start: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4/backends/copy/rootfses/5f328f72-25d4-4a26-ac83-8d30bbc44e97/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links
      ... (complete pastebin: http://pastebin.com/umZ4Q5d1 )
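
"Too many levels of symbolic links" is the message for the ELOOP error, which the kernel returns when resolving a path runs into a symlink cycle or too long a symlink chain. As a rough diagnostic sketch (not part of Mesos, and not a statement about the root cause of this bug), the following script walks a directory tree such as a provisioned rootfs and lists symlinks whose resolution fails with ELOOP. The function names and the demo tree are hypothetical and exist only for illustration.

```python
import errno
import os
import tempfile


def find_symlink_loops(root):
    """Return the symlinks under root whose resolution fails with ELOOP."""
    looping = []
    # followlinks=False: never descend through symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # os.stat follows symlinks
            except OSError as exc:
                # Dangling links raise ENOENT and are ignored here.
                if exc.errno == errno.ELOOP:
                    looping.append(path)
    return sorted(looping)


def make_demo_tree():
    """Build a small throwaway tree with one healthy and one looping symlink."""
    root = tempfile.mkdtemp(prefix="rootfs-demo-")
    zone = os.path.join(root, "usr", "share", "zoneinfo")
    os.makedirs(zone)
    with open(os.path.join(zone, "UTC"), "w") as f:
        f.write("ok")
    os.symlink("UTC", os.path.join(zone, "Zulu"))       # resolves normally
    os.symlink("Urumqi", os.path.join(zone, "Urumqi"))  # points at itself
    return root


if __name__ == "__main__":
    demo = make_demo_tree()
    for path in find_symlink_loops(demo):
        print(os.path.relpath(path, demo))
```

Running the same function against the image's unpacked layers and against the rootfs under /var/lib/mesos/provisioner could help tell whether a loop already exists in the image or only appears after the layers are copied on top of each other.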

      How to replicate:
      Start the Aurora Vagrant image. Set /etc/mesos-slave/executor_registration_timeout to 5mins. Edit /vagrant/examples/jobs/hello_docker_image.aurora so that it starts a large Docker image instead of the example image (for instance anldisr/jupyter:0.4, a test image I created that is based on the Jupyter notebook stacks). Create the job and watch it fail after a number of minutes.

      The mesos sandbox is empty.

      Aurora errors I see:
      28 minutes ago - FAILED : Failed to launch container: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/93420a36-0e0c-4f04-b401-74c426c25686/backends/copy/rootfses/6e185a51-7174-4b0d-a305-42b634eb91bb/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links cp: cannot stat ...
      Too many levels of symbolic links ; Container destroyed while provisioning images
      (complete pastebin: http://pastebin.com/uecHYD5J )

      To rule out the image, I started this image and several others as normal Docker containers. This works without issues.

      Related Mesos flags configured:
      --appc_store_dir=/tmp/mesos/images/appc
      --containerizers=docker,mesos
      --executor_registration_timeout=5mins
      --image_providers=appc,docker
      --image_provisioner_backend=copy
      --isolation=filesystem/linux,docker/runtime
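
The report sets executor_registration_timeout through a file under /etc/mesos-slave, which suggests a one-file-per-flag layout where the file name is the flag and the file content is the value. Assuming that layout, a minimal sketch for turning such a directory into agent command-line flags could look like the following. The helper names are hypothetical, and the real init wrapper may handle more cases than this sketch does.

```python
import os
import tempfile


def write_flag_files(flag_dir, flags):
    """Write one file per flag: file name is the flag, content is the value."""
    for name, value in flags.items():
        with open(os.path.join(flag_dir, name), "w") as f:
            f.write(value + "\n")


def flags_from_dir(flag_dir):
    """Return '--name=value' strings for every regular file in flag_dir."""
    flags = []
    for name in sorted(os.listdir(flag_dir)):
        path = os.path.join(flag_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            value = f.read().strip()
        flags.append("--{}={}".format(name, value))
    return flags


if __name__ == "__main__":
    # Values taken from the flags listed in this report.
    demo_dir = tempfile.mkdtemp(prefix="mesos-slave-flags-")
    write_flag_files(demo_dir, {
        "containerizers": "docker,mesos",
        "executor_registration_timeout": "5mins",
        "image_providers": "appc,docker",
        "image_provisioner_backend": "copy",
        "isolation": "filesystem/linux,docker/runtime",
    })
    print(" ".join(flags_from_dir(demo_dir)))
```

Printing the assembled flags makes it easy to confirm that the agent in the Vagrant image is started with the copy provisioner backend and the filesystem/linux isolator, the combination under which this failure was seen.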

      Affected Mesos versions tested: 1.0.1 & 1.0.0

            People

              Chun-Hung Hsiao (chhsia0)
              Rogier Dikkes (a-nldisr)
              Gilbert Song
              Votes: 0
              Watchers: 5