Details
Improvement
Status: Resolved
Major
Resolution: Won't Fix
0.19.0
None
None
Description
When mesos-slave is started with an external containerizer and a bad containerizer_path, the resulting error message is misleading about what the real problem is.
It would be nice if the containerizer code detected exit code 127 and reported an error to the effect of "Command not found: <containerizer_path>".
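A minimal sketch of the requested behaviour, written in Python for brevity rather than in the slave's own C++ code. It assumes the containerizer is launched through a POSIX shell, which by convention reports status 127 when the command cannot be found. The names describe_failure and run_containerizer are hypothetical and are not part of Mesos.

```python
import shlex
import subprocess

# Conventional exit status a POSIX shell reports when a command cannot be found.
COMMAND_NOT_FOUND = 127


def describe_failure(containerizer_path, status):
    """Map a non-zero exit status to a human-readable error message."""
    if status == COMMAND_NOT_FOUND:
        return "Command not found: {}".format(containerizer_path)
    # Any other status keeps the generic wording seen in the slave log.
    return "External containerizer failed (status: {})".format(status)


def run_containerizer(containerizer_path, command):
    """Run '<containerizer_path> <command>' via the shell (hypothetical helper).

    Returns None on success, otherwise an error message.
    """
    line = "{} {}".format(shlex.quote(containerizer_path), shlex.quote(command))
    # The shell's own "not found" complaint on stderr is discarded here.
    status = subprocess.call(line, shell=True, stderr=subprocess.DEVNULL)
    if status == 0:
        return None
    return describe_failure(containerizer_path, status)
```

With this in place, run_containerizer("/usr/local/bin/deimos", "recover") on a machine where that path does not exist would return the "Command not found" message instead of the bare status. A containerizer could also exit with 127 for reasons of its own, so a real implementation might word the message as a likely cause rather than a certainty.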
Below is a log file illustrating the scenario I ran into.
mesos-slave.sh --log_dir=/tmp/mesos/slave/log_dir --master=zk://localhost:2181/mesos --work_dir=/tmp/mesos/slave/work_dir --containerizer_path=/usr/local/bin/deimos --isolation=external
I0707 17:09:00.525806 29499 logging.cpp:167] INFO level logging started!
I0707 17:09:00.525997 29499 main.cpp:126] Build: 2014-06-12 18:09:59 by ben.whitehead
I0707 17:09:00.526013 29499 main.cpp:128] Version: 0.19.0
I0707 17:09:00.526023 29499 main.cpp:131] Git tag: 0.19.0
I0707 17:09:00.526033 29499 main.cpp:135] Git SHA: 51e047524cf744ee257870eb479345646c0428ff
I0707 17:09:00.526167 29499 main.cpp:149] Starting Mesos slave
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@716: Client environment:host.name=xxxxxx
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@724: Client environment:os.arch=3.11.10-17-desktop
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP PREEMPT Mon Jun 16 15:28:13 UTC 2014 (fba7c1f)
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@733: Client environment:user.name=ben.whitehead
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@741: Client environment:user.home=/home/ben.whitehead
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ben.whitehead/tmp/mesos/mesos/build/bin
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fc0ae7d59b0 sessionId=0 sessionPasswd=<null> context=0x7fc09c0008e0 flags=0
2014-07-07 17:09:00,526:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:2181]
I0707 17:09:00.526564 29520 slave.cpp:143] Slave started on 1)@127.0.0.2:5051
I0707 17:09:00.526713 29520 slave.cpp:255] Slave resources: cpus(*):8; mem(*):14750; disk(*):221168; ports(*):[31000-32000]
I0707 17:09:00.526747 29520 slave.cpp:283] Slave hostname: xxxxxx
I0707 17:09:00.526757 29520 slave.cpp:284] Slave checkpoint: true
I0707 17:09:00.527842 29518 state.cpp:33] Recovering state from '/tmp/mesos/slave/work_dir/meta'
I0707 17:09:00.528142 29516 status_update_manager.cpp:193] Recovering status update manager
I0707 17:09:00.528244 29517 external_containerizer.cpp:247] Recovering containerizer
2014-07-07 17:09:00,544:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:2181], sessionId=0x14712df6ba4000e, negotiated timeout=10000
I0707 17:09:00.544852 29516 group.cpp:310] Group process ((4)@127.0.0.2:5051) connected to ZooKeeper
I0707 17:09:00.544888 29516 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0707 17:09:00.544900 29516 group.cpp:382] Trying to create path '/mesos' in ZooKeeper
I0707 17:09:00.545446 29518 detector.cpp:135] Detected a new leader: (id='0')
I0707 17:09:00.545524 29515 group.cpp:655] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0707 17:09:00.545805 29517 detector.cpp:377] A new leading master (UPID=master@127.0.0.2:5050) is detected
Failed to perform recovery: Recover failed: External containerizer failed (status: 127)
To remedy this do as follows:
Step 1: rm -f /tmp/mesos/slave/work_dir/meta/slaves/latest
        This ensures slave doesn't recover old live executors.
Step 2: Restart the slave.