Mesos / MESOS-2252

Docker containers fail to start with "future discarded" error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: None
    • Component/s: agent, docker
    • Environment:

      Mesos slaves run in containers, using the image mesosphere/mesos-slave:0.21.0-1.0.ubuntu1404 from Docker Hub. Docker 1.4.1, Marathon 0.8.0-SNAPSHOT.

      Description

      I tried to launch my dockerized app with 50 tasks on Marathon, and all tasks failed to run. The app usually works just fine.

      Backstory: https://github.com/mesosphere/marathon/issues/1083#issuecomment-71196704

      Marathon logs:

      [2015-01-23 13:22:30,163] INFO Starting app /topface/prod-test/app (mesosphere.marathon.SchedulerActions:363)
      [2015-01-23 13:22:30,165] INFO Already running 0 instances of /topface/prod-test/app. Not scaling. (mesosphere.marathon.SchedulerActions:512)
      [2015-01-23 13:22:35,339] INFO Received status update for task topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:22:35,367] INFO Task topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:22:35,368] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:22:35,369] INFO Task launch delay for [/topface/prod-test/app] is now [999483319 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:22:45,345] INFO Received status update for task topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:22:45,359] INFO Task topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:22:45,360] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:22:45,360] INFO Task launch delay for [/topface/prod-test/app] is now [999838313 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:23:31,942] INFO Received status update for task topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:23:31,946] INFO Task launch delay for [/topface/prod-test/app] is now [1149948119 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:23:31,946] INFO Task topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:23:31,946] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:23:31,955] INFO Received status update for task topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:23:31,957] INFO Task launch delay for [/topface/prod-test/app] is now [1321950877 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:23:31,958] INFO Task topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:23:31,958] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:23:31,958] INFO Received status update for task topface_prod-test_app.e2bb3906-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:23:31,960] INFO Task launch delay for [/topface/prod-test/app] is now [1519954162 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:23:31,960] INFO Task topface_prod-test_app.e2bb3906-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:23:31,961] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:23:31,961] INFO Received status update for task topface_prod-test_app.e2c30146-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:23:31,963] INFO Task launch delay for [/topface/prod-test/app] is now [1746973326 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:23:31,970] INFO Task topface_prod-test_app.e2c30146-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:23:31,970] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:23:31,970] INFO Received status update for task topface_prod-test_app.e2ba9cc2-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:23:31,973] INFO Task topface_prod-test_app.e2ba9cc2-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:23:31,973] INFO Task launch delay for [/topface/prod-test/app] is now [2008991202 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:23:31,973] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:23:31,973] INFO Received status update for task topface_prod-test_app.e2bc4a7c-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
      [2015-01-23 13:23:31,975] INFO Task launch delay for [/topface/prod-test/app] is now [2309993195 nanoseconds] (mesosphere.util.RateLimiter:35)
      [2015-01-23 13:23:31,976] INFO Task topface_prod-test_app.e2bc4a7c-a302-11e4-bea0-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
      [2015-01-23 13:23:31,976] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:262)
      [2015-01-23 13:23:31,976] INFO Received status update for task topface_prod-test_app.e2bb11f5-a302-11e4-bea0-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
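
The RateLimiter lines in the Marathon log above report the launch delay in nanoseconds, which is hard to read at a glance. The following is a minimal sketch (not part of Marathon; the function names are made up for illustration) that extracts those delays, converts them to seconds, and computes how much each delay grows over the previous one. On the three sample lines copied from the log, the growth is roughly 1.15x per step, which would be consistent with an exponential launch backoff. The actual backoff configuration is not shown in this report, so treat that as an observation rather than a confirmed setting.

```python
import re

# Matches the RateLimiter lines as they appear in the Marathon log above.
DELAY_RE = re.compile(
    r"Task launch delay for \[(?P<app>[^\]]+)\] is now \[(?P<ns>\d+) nanoseconds\]"
)


def launch_delays_seconds(log_text):
    """Return every launch delay found in log_text, in seconds, in log order."""
    return [int(m.group("ns")) / 1e9 for m in DELAY_RE.finditer(log_text)]


def growth_ratios(delays):
    """Return the ratio of each delay to the one logged before it."""
    return [later / earlier for earlier, later in zip(delays, delays[1:])]


# Three consecutive RateLimiter lines copied verbatim from the log above.
SAMPLE = """\
[2015-01-23 13:22:45,360] INFO Task launch delay for [/topface/prod-test/app] is now [999838313 nanoseconds] (mesosphere.util.RateLimiter:35)
[2015-01-23 13:23:31,946] INFO Task launch delay for [/topface/prod-test/app] is now [1149948119 nanoseconds] (mesosphere.util.RateLimiter:35)
[2015-01-23 13:23:31,957] INFO Task launch delay for [/topface/prod-test/app] is now [1321950877 nanoseconds] (mesosphere.util.RateLimiter:35)
"""

if __name__ == "__main__":
    delays = launch_delays_seconds(SAMPLE)
    for delay in delays:
        print(f"delay: {delay:.3f} s")
    for ratio in growth_ratios(delays):
        print(f"growth: {ratio:.3f}x")
```

The same functions can be pointed at a full Marathon log file by passing its contents as `log_text`.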

      The first task failed to start because of the network setup (the Docker registry was unavailable). The second task ended up on the same host and failed as well:

      E0123 13:22:35.287389 13 slave.cpp:2787] Container '0a1225ce-98bd-4f83-a417-b7cf72bb90e8' for executor 'topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed to start: Failed to 'docker pull docker.core.tf/topface-prod-app:20150123019': exit status = exited with status 1 stderr = time="2015-01-23T13:22:35Z" level="fatal" msg="Error: Invalid registry endpoint https://docker.core.tf/v1/: Get https://docker.core.tf/v1/_ping: dial tcp 10.5.1.194:443: connection timed out. If this private registry supports only HTTP or HTTPS with an unknown CA certificate, please add `--insecure-registry docker.core.tf` to the daemon's arguments. In the case of HTTPS, if you have access to the registry's CA certificate, no need for the flag; simply place the CA certificate at /etc/docker/certs.d/docker.core.tf/ca.crt"
      E0123 13:22:35.303208 13 slave.cpp:2882] Termination of executor 'topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed: Unknown container: 0a1225ce-98bd-4f83-a417-b7cf72bb90e8
      E0123 13:22:35.303503 6 slave.cpp:3134] Failed to unmonitor container for executor topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799 of framework 20141003-172543-3892422848-5050-1-0000: Not monitored
      W0123 13:22:35.304908 11 docker.cpp:1184] Ignoring updating unknown container: 0a1225ce-98bd-4f83-a417-b7cf72bb90e8
      E0123 13:22:45.330379 12 slave.cpp:2787] Container '60a2fe62-4d64-4594-b1be-7e5795d6323c' for executor 'topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed to start: Failed to 'docker pull docker.core.tf/topface-prod-app:20150123019': exit status = exited with status 1 stderr = time="2015-01-23T13:22:45Z" level="fatal" msg="Error: Invalid registry endpoint https://docker.core.tf/v1/: Get https://docker.core.tf/v1/_ping: dial tcp 10.5.1.194:443: connection timed out. If this private registry supports only HTTP or HTTPS with an unknown CA certificate, please add `--insecure-registry docker.core.tf` to the daemon's arguments. In the case of HTTPS, if you have access to the registry's CA certificate, no need for the flag; simply place the CA certificate at /etc/docker/certs.d/docker.core.tf/ca.crt"
      E0123 13:22:45.330746 12 slave.cpp:2882] Termination of executor 'topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed: Unknown container: 60a2fe62-4d64-4594-b1be-7e5795d6323c
      E0123 13:22:45.340802 9 slave.cpp:3134] Failed to unmonitor container for executor topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799 of framework 20141003-172543-3892422848-5050-1-0000: Not monitored
      W0123 13:22:45.342725 11 docker.cpp:1184] Ignoring updating unknown container: 60a2fe62-4d64-4594-b1be-7e5795d6323c

      The third task failed with a "future discarded" error:

      E0123 13:23:31.906733 12 slave.cpp:2787] Container 'bd0337a2-41f4-4308-85a9-68a3ff0475e6' for executor 'topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
      E0123 13:23:31.907039 12 slave.cpp:2882] Termination of executor 'topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed: Unknown container: bd0337a2-41f4-4308-85a9-68a3ff0475e6
      E0123 13:23:31.907260 7 slave.cpp:3134] Failed to unmonitor container for executor topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799 of framework 20141003-172543-3892422848-5050-1-0000: Not monitored

      The fourth task also failed with a "future discarded" error:

      E0123 13:23:31.932677 8 slave.cpp:2787] Container '782c163a-9238-4f3b-b9fd-dcc50579322a' for executor 'topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
      E0123 13:23:31.933078 8 slave.cpp:2882] Termination of executor 'topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed: Unknown container: 782c163a-9238-4f3b-b9fd-dcc50579322a
      E0123 13:23:31.967974 6 slave.cpp:3134] Failed to unmonitor container for executor topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799 of framework 20141003-172543-3892422848-5050-1-0000: Not monitored

      I think this "future discarded" failure should be fixed. Ideally, a more understandable error message should be introduced.
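
To see how many container start failures carry an actionable reason versus the opaque "future discarded" message, the slave log can be grouped by failure reason. This is a minimal sketch, assuming the `slave.cpp` "failed to start" line format shown in the excerpts above; the category names and function names are hypothetical, and the `docker pull` sample line has its stderr portion cut off for brevity.

```python
import re
from collections import Counter

# Matches the slave.cpp "failed to start" lines quoted in this report.
START_FAILURE_RE = re.compile(
    r"Container '(?P<container>[^']+)' for executor '(?P<executor>[^']+)' "
    r"of framework '(?P<framework>[^']+)' failed to start: (?P<reason>.*)"
)


def classify(reason):
    """Map a raw failure reason to a coarse, illustrative category."""
    if reason.strip() == "future discarded":
        return "future discarded"
    if reason.startswith("Failed to 'docker pull"):
        return "docker pull failed"
    return "other"


def summarize(log_text):
    """Count container start failures in log_text by category."""
    counts = Counter()
    for line in log_text.splitlines():
        match = START_FAILURE_RE.search(line)
        if match:
            counts[classify(match.group("reason"))] += 1
    return counts


# The first line is truncated after the exit status (stderr omitted);
# the other two lines are copied verbatim from the excerpts above.
SAMPLE = """\
E0123 13:22:35.287389 13 slave.cpp:2787] Container '0a1225ce-98bd-4f83-a417-b7cf72bb90e8' for executor 'topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed to start: Failed to 'docker pull docker.core.tf/topface-prod-app:20150123019': exit status = exited with status 1
E0123 13:23:31.906733 12 slave.cpp:2787] Container 'bd0337a2-41f4-4308-85a9-68a3ff0475e6' for executor 'topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
E0123 13:23:31.932677 8 slave.cpp:2787] Container '782c163a-9238-4f3b-b9fd-dcc50579322a' for executor 'topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799' of framework '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
"""

if __name__ == "__main__":
    for category, count in summarize(SAMPLE).most_common():
        print(f"{category}: {count}")
```

Lines that do not match the pattern, such as the "Failed to unmonitor container" errors, are skipped, so the counts reflect only container start failures.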

            People

            • Assignee: Unassigned
            • Reporter: Ivan Babrou (bobrik)