[MESOS-1565] Improve error message for external containerizer when containerizer_path results in command not found (status: 127) - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Won't Fix
Affects Version/s: 0.19.0
Fix Version/s: None
Component/s: containerization
Labels: None

Description

When mesos-slave is started with an external containerizer and --containerizer_path points to a command that does not exist, the resulting error message is misleading about what the real problem is.

It would be nice if the containerizer code could detect exit status 127 and report an error to the effect of "Command not found: <containerizer_path>".
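As a rough illustration of the requested behaviour (not the Mesos implementation, which is C++), the sketch below maps an exit status to a more specific message. It assumes the containerizer command is launched through a POSIX shell, where 127 conventionally means "command not found" and 126 means "found but not executable". The names `describe_exit_status` and `run_containerizer`, and the `recover` subcommand used as a default, are hypothetical and chosen only for this example.

```python
import shlex
import subprocess
from typing import Tuple

# POSIX shell conventions for exit statuses (not specific to Mesos).
COMMAND_NOT_EXECUTABLE = 126
COMMAND_NOT_FOUND = 127


def describe_exit_status(status: int, containerizer_path: str) -> str:
    """Turn an exit status into a message that names the likely cause."""
    if status == COMMAND_NOT_FOUND:
        return f"Command not found: {containerizer_path}"
    if status == COMMAND_NOT_EXECUTABLE:
        return f"Command found but not executable: {containerizer_path}"
    # Fall back to the generic wording seen in the slave log.
    return f"External containerizer failed (status: {status})"


def run_containerizer(containerizer_path: str,
                      subcommand: str = "recover") -> Tuple[int, str]:
    """Run the containerizer through /bin/sh and describe the result.

    Hypothetical helper: the shell reports 127 when it cannot find
    the command, which is the case this issue is about.
    """
    command = f"{shlex.quote(containerizer_path)} {shlex.quote(subcommand)}"
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode, describe_exit_status(
        result.returncode, containerizer_path
    )
```

Calling `run_containerizer` with a path that does not exist should yield status 127 and the "Command not found" message. One caveat: a containerizer program could itself exit with 127 for unrelated reasons, so the message would probably be best worded as a likely cause rather than a certain one.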

Below is a log file illustrating the scenario I ran into.

mesos-slave.sh --log_dir=/tmp/mesos/slave/log_dir --master=zk://localhost:2181/mesos --work_dir=/tmp/mesos/slave/work_dir --containerizer_path=/usr/local/bin/deimos --isolation=external
I0707 17:09:00.525806 29499 logging.cpp:167] INFO level logging started!
I0707 17:09:00.525997 29499 main.cpp:126] Build: 2014-06-12 18:09:59 by ben.whitehead
I0707 17:09:00.526013 29499 main.cpp:128] Version: 0.19.0
I0707 17:09:00.526023 29499 main.cpp:131] Git tag: 0.19.0
I0707 17:09:00.526033 29499 main.cpp:135] Git SHA: 51e047524cf744ee257870eb479345646c0428ff
I0707 17:09:00.526167 29499 main.cpp:149] Starting Mesos slave
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@716: Client environment:host.name=xxxxxx
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@724: Client environment:os.arch=3.11.10-17-desktop
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP PREEMPT Mon Jun 16 15:28:13 UTC 2014 (fba7c1f)
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@733: Client environment:user.name=ben.whitehead
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@741: Client environment:user.home=/home/ben.whitehead
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ben.whitehead/tmp/mesos/mesos/build/bin
2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fc0ae7d59b0 sessionId=0 sessionPasswd=<null> context=0x7fc09c0008e0 flags=0
2014-07-07 17:09:00,526:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:2181]
I0707 17:09:00.526564 29520 slave.cpp:143] Slave started on 1)@127.0.0.2:5051
I0707 17:09:00.526713 29520 slave.cpp:255] Slave resources: cpus(*):8; mem(*):14750; disk(*):221168; ports(*):[31000-32000]
I0707 17:09:00.526747 29520 slave.cpp:283] Slave hostname: xxxxxx
I0707 17:09:00.526757 29520 slave.cpp:284] Slave checkpoint: true
I0707 17:09:00.527842 29518 state.cpp:33] Recovering state from '/tmp/mesos/slave/work_dir/meta'
I0707 17:09:00.528142 29516 status_update_manager.cpp:193] Recovering status update manager
I0707 17:09:00.528244 29517 external_containerizer.cpp:247] Recovering containerizer
2014-07-07 17:09:00,544:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:2181], sessionId=0x14712df6ba4000e, negotiated timeout=10000
I0707 17:09:00.544852 29516 group.cpp:310] Group process ((4)@127.0.0.2:5051) connected to ZooKeeper
I0707 17:09:00.544888 29516 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0707 17:09:00.544900 29516 group.cpp:382] Trying to create path '/mesos' in ZooKeeper
I0707 17:09:00.545446 29518 detector.cpp:135] Detected a new leader: (id='0')
I0707 17:09:00.545524 29515 group.cpp:655] Trying to get '/mesos/info_0000000000' in ZooKeeper
I0707 17:09:00.545805 29517 detector.cpp:377] A new leading master (UPID=master@127.0.0.2:5050) is detected
Failed to perform recovery: Recover failed: External containerizer failed (status: 127)
To remedy this do as follows:
Step 1: rm -f /tmp/mesos/slave/work_dir/meta/slaves/latest
        This ensures slave doesn't recover old live executors.
Step 2: Restart the slave.

People

Assignee: Unassigned

Reporter: Ben Whitehead

Votes: 0

Watchers: 4

Dates

Created: 08/Jul/14 00:28

Updated: 16/Aug/16 19:20

Resolved: 16/Aug/16 19:20