MESOS-10066: mesos-docker-executor process dies when agent stops. Recovery fails when agent returns


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.7.3
    • Fix Version/s: None
    • Component/s: None

    Description

      Hello all,

      The Agent Recovery documentation lists two conditions for recovery to be possible:

      • The agent must have recovery enabled (default true? see the flag check below);
      • The scheduler must register itself declaring that it has checkpointing enabled.
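
      On the agent side, a quick way to verify these flags is to query the agent's /flags endpoint. This is only a minimal sketch, assuming the agent listens on its default port 5051:

          $ curl -s 10.234.172.56:5051/flags | jq '.flags | {recover, recovery_timeout, strict}'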

      In my tests I'm using Marathon as the scheduler, and Mesos itself sees Marathon as a checkpoint-enabled scheduler:

          $ curl -sL 10.234.172.27:5050/state | jq '.frameworks[] | {"name": .name, "id": .id, "checkpoint": .checkpoint, "active": .active}'
          {
            "name": "asgard-chronos",
            "id": "4783cf15-4fb1-4c75-90fe-44eeec5258a7-0001",
            "checkpoint": true,
            "active": true
          }
          {
            "name": "marathon",
            "id": "4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000",
            "checkpoint": true,
            "active": true
          }
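
      The same can be cross-checked from the agent's point of view. Assuming the agent's default port 5051, a query along these lines should also report "checkpoint": true for Marathon:

          $ curl -sL 10.234.172.56:5051/state | jq '.frameworks[] | {"name": .name, "checkpoint": .checkpoint}'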
      

      Here is what I'm using (a sketch of how the agent container is launched follows this list):

      1. Mesos Master 1.4.1
      2. Mesos Agent 1.7.3
      3. Using the docker image mesos/mesos-centos:1.7.x
      4. Docker socket mounted from the host
      5. Docker binary also mounted from the host
      6. Marathon 1.4.12
      7. Docker
            Client: Docker Engine - Community
             Version:           19.03.5
             API version:       1.39 (downgraded from 1.40)
             Go version:        go1.12.12
             Git commit:        633a0ea838
             Built:             Wed Nov 13 07:22:05 2019
             OS/Arch:           linux/amd64
             Experimental:      false
            
            Server: Docker Engine - Community
             Engine:
              Version:          18.09.2
              API version:      1.39 (minimum version 1.12)
              Go version:       go1.10.6
              Git commit:       6247962
              Built:            Sun Feb 10 03:42:13 2019
              OS/Arch:          linux/amd64
              Experimental:     false
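
      For context, the agent container is started roughly along the lines of the sketch below. This is not the exact command from this environment: the container name, host networking, work directory mount and the image's entrypoint handling are my assumptions; the image, socket/binary mounts, master address and work directory path come from the items and logs in this report.

          $ docker run -d --name mesos-slave --net host \
              -v /var/run/docker.sock:/var/run/docker.sock \
              -v /usr/bin/docker:/usr/bin/docker:ro \
              -v /var/lib/mesos/agent:/var/lib/mesos/agent \
              -e MESOS_MASTER=10.234.172.34:5050 \
              -e MESOS_WORK_DIR=/var/lib/mesos/agent \
              -e MESOS_CONTAINERIZERS=docker,mesos \
              mesos/mesos-centos:1.7.x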
        

      The problem

      Here is the Marathon test app, a simple sleep 99d based on the debian docker image; a sketch of the submission call follows the JSON.

          {
            "id": "/sleep",
            "cmd": "sleep 99d",
            "cpus": 0.1,
            "mem": 128,
            "disk": 0,
            "instances": 1,
            "constraints": [],
            "acceptedResourceRoles": [
              "*"
            ],
            "container": {
              "type": "DOCKER",
              "volumes": [],
              "docker": {
                "image": "debian",
                "network": "HOST",
                "privileged": false,
                "parameters": [],
                "forcePullImage": true
              }
            },
            "labels": {},
            "portDefinitions": []
          }
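
      For completeness, the app is submitted through Marathon's /v2/apps endpoint. A minimal sketch, assuming Marathon's default port 8080 and the JSON saved as sleep.json (both placeholder details):

          $ curl -X POST -H 'Content-Type: application/json' \
              -d @sleep.json http://<marathon-host>:8080/v2/apps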
      

      This task runs fine and gets scheduled on the right agent, which is running Mesos Agent 1.7.3 (using the mesos/mesos-centos:1.7.x docker image).

      Here is a sample log:

          mesos-slave_1  | I1205 13:24:21.391464 19849 slave.cpp:2403] Authorizing task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' for framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:21.392707 19849 slave.cpp:2846] Launching task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' for framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:21.392895 19849 paths.cpp:748] Creating sandbox '/var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923'
          mesos-slave_1  | I1205 13:24:21.394399 19849 paths.cpp:748] Creating sandbox '/var/lib/mesos/agent/meta/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923'
          mesos-slave_1  | I1205 13:24:21.394918 19849 slave.cpp:9068] Launching executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 with resources [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}] in work directory '/var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923'
          mesos-slave_1  | I1205 13:24:21.396499 19849 slave.cpp:3078] Queued task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' for executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:21.397038 19849 slave.cpp:3526] Launching container 53ec0ef3-3290-476a-b2b6-385099e9b923 for executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:21.398028 19846 docker.cpp:1177] Starting container '53ec0ef3-3290-476a-b2b6-385099e9b923' for task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' (and executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f') of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | W1205 13:24:22.576869 19846 slave.cpp:8496] Failed to get resource statistics for executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000: Failed to run '/usr/bin/docker -H unix:///var/run/docker.sock inspect --type=container mesos-53ec0ef3-3290-476a-b2b6-385099e9b923': exited with status 1; stderr='Error: No such container: mesos-53ec0ef3-3290-476a-b2b6-385099e9b923'
          mesos-slave_1  | I1205 13:24:24.094985 19853 docker.cpp:792] Checkpointing pid 12435 to '/var/lib/mesos/agent/meta/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923/pids/forked.pid'
          mesos-slave_1  | I1205 13:24:24.343099 19848 slave.cpp:4839] Got registration for executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 from executor(1)@10.234.172.56:16653
          mesos-slave_1  | I1205 13:24:24.345593 19848 docker.cpp:1685] Ignoring updating container 53ec0ef3-3290-476a-b2b6-385099e9b923 because resources passed to update are identical to existing resources
          mesos-slave_1  | I1205 13:24:24.345945 19848 slave.cpp:3296] Sending queued task 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' to executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 at executor(1)@10.234.172.56:16653
          mesos-slave_1  | I1205 13:24:24.362699 19853 slave.cpp:5310] Handling status update TASK_STARTING (Status UUID: 8b06fe8c-d709-453b-817d-2948e50782c9) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 from executor(1)@10.234.172.56:16653
          mesos-slave_1  | I1205 13:24:24.363222 19853 task_status_update_manager.cpp:328] Received task status update TASK_STARTING (Status UUID: 8b06fe8c-d709-453b-817d-2948e50782c9) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:24.364115 19853 task_status_update_manager.cpp:842] Checkpointing UPDATE for task status update TASK_STARTING (Status UUID: 8b06fe8c-d709-453b-817d-2948e50782c9) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:24.364993 19847 slave.cpp:5815] Forwarding the update TASK_STARTING (Status UUID: 8b06fe8c-d709-453b-817d-2948e50782c9) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 to master@10.234.172.34:5050
          mesos-slave_1  | I1205 13:24:24.365594 19847 slave.cpp:5726] Sending acknowledgement for status update TASK_STARTING (Status UUID: 8b06fe8c-d709-453b-817d-2948e50782c9) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 to executor(1)@10.234.172.56:16653
          mesos-slave_1  | I1205 13:24:24.401759 19846 task_status_update_manager.cpp:401] Received task status update acknowledgement (UUID: 8b06fe8c-d709-453b-817d-2948e50782c9) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:24.401926 19846 task_status_update_manager.cpp:842] Checkpointing ACK for task status update TASK_STARTING (Status UUID: 8b06fe8c-d709-453b-817d-2948e50782c9) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:30.481829 19850 slave.cpp:5310] Handling status update TASK_RUNNING (Status UUID: 480abd48-6a85-4703-a2f2-0455f5c4c053) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 from executor(1)@10.234.172.56:16653
          mesos-slave_1  | I1205 13:24:30.482340 19848 task_status_update_manager.cpp:328] Received task status update TASK_RUNNING (Status UUID: 480abd48-6a85-4703-a2f2-0455f5c4c053) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:30.482550 19848 task_status_update_manager.cpp:842] Checkpointing UPDATE for task status update TASK_RUNNING (Status UUID: 480abd48-6a85-4703-a2f2-0455f5c4c053) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:30.483163 19848 slave.cpp:5815] Forwarding the update TASK_RUNNING (Status UUID: 480abd48-6a85-4703-a2f2-0455f5c4c053) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 to master@10.234.172.34:5050
          mesos-slave_1  | I1205 13:24:30.483664 19848 slave.cpp:5726] Sending acknowledgement for status update TASK_RUNNING (Status UUID: 480abd48-6a85-4703-a2f2-0455f5c4c053) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 to executor(1)@10.234.172.56:16653
          mesos-slave_1  | I1205 13:24:30.557307 19852 task_status_update_manager.cpp:401] Received task status update acknowledgement (UUID: 480abd48-6a85-4703-a2f2-0455f5c4c053) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:24:30.557467 19852 task_status_update_manager.cpp:842] Checkpointing ACK for task status update TASK_RUNNING (Status UUID: 480abd48-6a85-4703-a2f2-0455f5c4c053) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
      

      An important part of this log:

      Sending acknowledgement for status update TASK_RUNNING for task sleep.8c187c41-1762-11ea-a2e5-02429217540f to executor(1)@10.234.172.56:16653
      

      Here we have the address of the executor responsible for our newly created task: executor(1)@10.234.172.56:16653.

      Looking at the OS process list, we see one instance of mesos-docker-executor:

          ps aux | grep executor
          root     12435  0.5  1.3 851796 54100 ?        Ssl  10:24   0:03 mesos-docker-executor --cgroups_enable_cfs=true --container=mesos-53ec0ef3-3290-476a-b2b6-385099e9b923 --docker=/usr/bin/docker --docker_socket=/var/run/docker.sock --help=false --initialize_driver_logging=true --launcher_dir=/usr/libexec/mesos --logbufsecs=0 --logging_level=INFO --mapped_directory=/mnt/mesos/sandbox --quiet=false --sandbox_directory=/var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923 --stop_timeout=10secs
          root     12456  0.0  0.8 152636 34548 ?        Sl   10:24   0:00 /usr/bin/docker -H unix:///var/run/docker.sock run --cpu-shares 102 --cpu-quota 10000 --memory 134217728 -e HOST=10.234.172.56 -e MARATHON_APP_DOCKER_IMAGE=debian -e MARATHON_APP_ID=/sleep -e MARATHON_APP_LABELS=HOLLOWMAN_DEFAULT_SCALE -e MARATHON_APP_LABEL_HOLLOWMAN_DEFAULT_SCALE=1 -e MARATHON_APP_RESOURCE_CPUS=0.1 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_RESOURCE_MEM=128.0 -e MARATHON_APP_VERSION=2019-12-05T13:24:18.206Z -e MESOS_CONTAINER_NAME=mesos-53ec0ef3-3290-476a-b2b6-385099e9b923 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TASK_ID=sleep.8c187c41-1762-11ea-a2e5-02429217540f -v /var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-53ec0ef3-3290-476a-b2b6-385099e9b923 --label=hollowman.appname=/asgard/sleep --label=MESOS_TASK_ID=sleep.8c187c41-1762-11ea-a2e5-02429217540f debian -c sleep 99d
          root     16776  0.0  0.0   7960   812 pts/3    S+   10:34   0:00 tail -f /var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923/stderr
      

      Here we have three processes:

      • The Mesos executor (see the PID check after this list);
      • The container that this executor started, running sleep 99d;
      • A tail -f reading the stderr of this executor.
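
      As a quick sanity check, the executor PID (12435) matches the PID the agent checkpointed earlier ("Checkpointing pid 12435 to .../pids/forked.pid" in the agent log above); the following should print 12435:

          $ cat /var/lib/mesos/agent/meta/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923/pids/forked.pid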

      The stderr content is as follows:

          $ tail -f /var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923/stderr
          I1205 13:24:24.337672 12435 exec.cpp:162] Version: 1.7.3
          I1205 13:24:24.347107 12449 exec.cpp:236] Executor registered on agent 79ad3a13-b567-4273-ac8c-30378d35a439-S60499
          I1205 13:24:24.349907 12451 executor.cpp:130] Registered docker executor on 10.234.172.56
          I1205 13:24:24.350291 12449 executor.cpp:186] Starting task sleep.8c187c41-1762-11ea-a2e5-02429217540f
          WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
          W1205 13:24:29.363168 12455 executor.cpp:253] Docker inspect timed out after 5secs for container 'mesos-53ec0ef3-3290-476a-b2b6-385099e9b923'
      

      While the Mesos Agent is still running, it is possible to connect to this executor port.

          telnet 10.234.172.56 16653
          Trying 10.234.172.56...
          Connected to 10.234.172.56.
          Escape character is '^]'.
          ^]
          telnet> Connection closed.
      

      As soon as the Agent shuts down (a simple docker stop), this line appears in the executor's stderr log:

          I1205 13:40:31.160290 12452 exec.cpp:518] Agent exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with agent 79ad3a13-b567-4273-ac8c-30378d35a439-S60499
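
      The 15 minutes mentioned here presumably comes from the agent's --recovery_timeout flag (default 15mins), which can be inspected with the /flags query shown earlier.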
      

      And now, looking at the OS process list, the executor process is gone:

          $ ps aux | grep executor
          root     16776  0.0  0.0   7960   812 pts/3    S+   10:34   0:00 tail -f /var/lib/mesos/agent/slaves/79ad3a13-b567-4273-ac8c-30378d35a439-S60499/frameworks/4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000/executors/sleep.8c187c41-1762-11ea-a2e5-02429217540f/runs/53ec0ef3-3290-476a-b2b6-385099e9b923/stderr
      

      and we can't telnet to that port anymore:

          telnet 10.234.172.56 16653
          Trying 10.234.172.56...
          telnet: Unable to connect to remote host: Connection refused
      

      When we run the Mesos Agent again, we see this:

          mesos-slave_1  | I1205 13:42:39.925992 18421 docker.cpp:899] Recovering Docker containers
          mesos-slave_1  | I1205 13:42:44.006150 18421 docker.cpp:912] Got the list of Docker containers
          mesos-slave_1  | I1205 13:42:44.010700 18421 docker.cpp:1009] Recovering container '53ec0ef3-3290-476a-b2b6-385099e9b923' for executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | W1205 13:42:44.011874 18421 docker.cpp:1053] Failed to connect to executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000: Failed to connect to 10.234.172.56:16653: Connection refused
          mesos-slave_1  | I1205 13:42:44.016306 18421 docker.cpp:2564] Executor for container 53ec0ef3-3290-476a-b2b6-385099e9b923 has exited
          mesos-slave_1  | I1205 13:42:44.016412 18421 docker.cpp:2335] Destroying container 53ec0ef3-3290-476a-b2b6-385099e9b923 in RUNNING state
          mesos-slave_1  | I1205 13:42:44.016705 18421 docker.cpp:1133] Finished processing orphaned Docker containers
          mesos-slave_1  | I1205 13:42:44.017102 18421 docker.cpp:2385] Running docker stop on container 53ec0ef3-3290-476a-b2b6-385099e9b923
          mesos-slave_1  | I1205 13:42:44.017251 18426 slave.cpp:7205] Recovering executors
          mesos-slave_1  | I1205 13:42:44.017493 18426 slave.cpp:7229] Sending reconnect request to executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 at executor(1)@10.234.172.56:16653
          mesos-slave_1  | W1205 13:42:44.018522 18426 process.cpp:1890] Failed to send 'mesos.internal.ReconnectExecutorMessage' to '10.234.172.56:16653', connect: Failed to connect to 10.234.172.56:16653: Connection refused
          mesos-slave_1  | I1205 13:42:44.019569 18422 slave.cpp:6336] Executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 has terminated with unknown status
          mesos-slave_1  | I1205 13:42:44.022147 18422 slave.cpp:5310] Handling status update TASK_FAILED (Status UUID: c946cd2b-1dec-4fc3-9ea6-501b40dcf4a7) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 from @0.0.0.0:0
          mesos-slave_1  | W1205 13:42:44.023823 18423 docker.cpp:1672] Ignoring updating unknown container 53ec0ef3-3290-476a-b2b6-385099e9b923
          mesos-slave_1  | I1205 13:42:44.024247 18425 task_status_update_manager.cpp:328] Received task status update TASK_FAILED (Status UUID: c946cd2b-1dec-4fc3-9ea6-501b40dcf4a7) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:42:44.024363 18425 task_status_update_manager.cpp:842] Checkpointing UPDATE for task status update TASK_FAILED (Status UUID: c946cd2b-1dec-4fc3-9ea6-501b40dcf4a7) for task sleep.8c187c41-1762-11ea-a2e5-02429217540f of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000
          mesos-slave_1  | I1205 13:42:46.020193 18423 slave.cpp:5238] Cleaning up un-reregistered executors
          mesos-slave_1  | I1205 13:42:46.020882 18423 slave.cpp:7358] Finished recovery
      

      And then a new container is scheduled to run. Marathon shows a status of TASK_FAILED with the message "Container terminated" for the previous task.
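
      The failed state is also visible from the master. A possible query, using the master's /master/tasks endpoint and the address from the earlier example:

          $ curl -sL 10.234.172.27:5050/master/tasks | jq '.tasks[] | select(.id | startswith("sleep.")) | {id: .id, state: .state}'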

      Note especially the part where the Agent tries to reconnect to the executor port:

          mesos-slave_1  | I1205 13:42:44.017493 18426 slave.cpp:7229] Sending reconnect request to executor 'sleep.8c187c41-1762-11ea-a2e5-02429217540f' of framework 4783cf15-4fb1-4c75-90fe-44eeec5258a7-0000 at executor(1)@10.234.172.56:16653
          mesos-slave_1  | W1205 13:42:44.018522 18426 process.cpp:1890] Failed to send 'mesos.internal.ReconnectExecutorMessage' to '10.234.172.56:16653', connect: Failed to connect to 10.234.172.56:16653: Connection refused
      

      Looking at the Mesos Agent recovery documentation, the setup seems very straightforward. I'm using the default mesos-docker-executor to run all docker tasks. Is there any detail that I'm not seeing, or is something misconfigured in my setup?

      Can anyone confirm that the mesos-docker-executor is capable of doing task recovery?
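
      For reference, the rough sequence I use to reproduce this (the agent container name is a placeholder for whatever your setup uses):

          $ docker stop <mesos-agent-container>            # stop the agent
          $ pgrep -af mesos-docker-executor                # the executor process is already gone at this point
          $ docker start <mesos-agent-container>           # bring the agent back
          $ docker logs -f <mesos-agent-container> | grep -E 'Recovering|reconnect'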

      Thank you very much and if you need any additional information, let me know.

      Thanks,

      Attachments

        1. logs-after.txt (22 kB, Dalton Matos Coelho Barreto)
        2. logs-before.txt (15 kB, Dalton Matos Coelho Barreto)


          People

            Assignee: Unassigned
            Reporter: Dalton Matos Coelho Barreto (daltonmatos)
            Votes: 0
            Watchers: 3
