Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Sprint: Mesosphere Sprint 49
Description
On the current Mesos master branch (commit 42e515bc5c175a318e914d34473016feda4db6ff), the Docker executor segfaults during shutdown.
Steps to reproduce:
1) Start master:
$ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/jp/mesos
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0125 13:41:15.963775 14744 main.cpp:278] Build: 2017-01-25 13:37:42 by jp
I0125 13:41:15.963868 14744 main.cpp:279] Version: 1.2.0
I0125 13:41:15.963877 14744 main.cpp:286] Git SHA: 42e515bc5c175a318e914d34473016feda4db6ff
(note that building it at 13:37 is not part of the repro)
2) Start agent:
$ ./bin/mesos-slave.sh --containerizers=mesos,docker --master=127.0.0.1:5050 --work_dir=/tmp/jp/mesos
3) Run mesos-execute with the Docker containerizer:
$ ./src/mesos-execute --master=127.0.0.1:5050 --name=testcommand --containerizer=docker --docker_image=debian --command=env
I0125 13:43:59.704973 14951 scheduler.cpp:184] Version: 1.2.0
I0125 13:43:59.706425 14952 scheduler.cpp:470] New master detected at master@127.0.0.1:5050
Subscribed with ID 57596743-06f4-45f1-a975-348cf70589b1-0000
Submitted task 'testcommand' to agent '57596743-06f4-45f1-a975-348cf70589b1-S0'
Received status update TASK_RUNNING for task 'testcommand'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'testcommand'
  message: 'Container exited with status 0'
  source: SOURCE_EXECUTOR
Relevant agent output that shows the executor segfault:
[...]
I0125 13:44:16.249191 14823 slave.cpp:4328] Got exited event for executor(1)@192.99.40.208:33529
I0125 13:44:16.347095 14830 docker.cpp:2358] Executor for container 396282a9-7bf0-48ee-ba07-3ff2ca801d53 has exited
I0125 13:44:16.347127 14830 docker.cpp:2052] Destroying container 396282a9-7bf0-48ee-ba07-3ff2ca801d53
I0125 13:44:16.347439 14830 docker.cpp:2179] Running docker stop on container 396282a9-7bf0-48ee-ba07-3ff2ca801d53
I0125 13:44:16.349215 14826 slave.cpp:4691] Executor 'testcommand' of framework 57596743-06f4-45f1-a975-348cf70589b1-0000 terminated with signal Segmentation fault (core dumped)
[...]
The complete task stderr:
$ cat /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-0000/executors/testcommand/runs/latest/stderr
I0125 13:44:12.850073 15030 exec.cpp:162] Version: 1.2.0
I0125 13:44:12.864229 15050 exec.cpp:237] Executor registered on agent 57596743-06f4-45f1-a975-348cf70589b1-S0
I0125 13:44:12.865842 15054 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 --env-file /tmp/xFZ8G9 -v /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-0000/executors/testcommand/runs/396282a9-7bf0-48ee-ba07-3ff2ca801d53:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-57596743-06f4-45f1-a975-348cf70589b1-S0.396282a9-7bf0-48ee-ba07-3ff2ca801d53 debian -c env
I0125 13:44:15.248721 15064 exec.cpp:410] Executor asked to shutdown
*** Aborted at 1485369856 (unix time) try "date -d @1485369856" if you are using GNU date ***
PC: @ 0x7fb38f153dd0 (unknown)
*** SIGSEGV (@0x68) received by PID 15030 (TID 0x7fb3961a88c0) from PID 104; stack trace: ***
    @ 0x7fb38f15b5c0 (unknown)
    @ 0x7fb38f153dd0 (unknown)
    @ 0x7fb39332c607 __gthread_mutex_lock()
    @ 0x7fb39332c657 __gthread_recursive_mutex_lock()
    @ 0x7fb39332edca std::recursive_mutex::lock()
    @ 0x7fb393337bd8 _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENKUlPS0_E_clES5_
    @ 0x7fb393337bf8 _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENUlPS0_E_4_FUNES5_
    @ 0x7fb39333ba6b Synchronized<>::Synchronized()
    @ 0x7fb393337cac synchronize<>()
    @ 0x7fb39492f15c process::ProcessManager::wait()
    @ 0x7fb3949353f0 process::wait()
    @ 0x55fd63f31fe5 process::wait()
    @ 0x7fb39332ce3c mesos::MesosExecutorDriver::~MesosExecutorDriver()
    @ 0x55fd63f2bd86 main
    @ 0x7fb38e4fc401 __libc_start_main
    @ 0x55fd63f2ab5a _start
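To line the crash up with the agent log, the "Aborted at" epoch value in the executor stderr can be converted to a readable timestamp, as the glog message itself suggests. The helper below is only a sketch: the function name abort_time_utc is made up for illustration, and it assumes GNU date, since the -d @<epoch> form is a GNU extension.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): print a Unix epoch value,
# such as the one in glog's "*** Aborted at ... ***" line, as a UTC timestamp.
# Assumes GNU date; BSD/macOS date uses a different flag for epoch input.
abort_time_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Epoch value taken from the executor stderr above.
abort_time_utc 1485369856
```

This prints 2017-01-25 18:44:16. The agent log records the executor exit at 13:44:16, so the two agree if the host clock was set to UTC-5, which is an inference from the timestamps rather than something the report states.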