Mesos / MESOS-1565

Improve error message for external containerizer when containerizer_path results in command not found (status: 127)

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.19.0
    • Fix Version/s: None
    • Component/s: containerization
    • Labels: None

    Description

      When mesos-slave is run with an external containerizer and a bad containerizer_path, the resulting error message is misleading about what the real problem is.

      It would be nice if the containerizer code could detect exit code 127 and report an error to the effect of "Command not found: <containerizer_path>".
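
The requested behavior can be illustrated with a minimal sketch. This is not the Mesos implementation (which is C++); it is a Python illustration that assumes the containerizer is launched through `sh -c`, so that a missing binary surfaces as the POSIX shell's exit status 127. The names `describe_failure` and `run_containerizer` are hypothetical and exist only for this example.

```python
import shlex
import subprocess
from typing import Optional

# POSIX shell convention: 127 means the command could not be found.
COMMAND_NOT_FOUND = 127


def describe_failure(containerizer_path: str, status: int) -> str:
    """Map a non-zero exit status to an error message (hypothetical helper)."""
    if status == COMMAND_NOT_FOUND:
        return f"Command not found: {containerizer_path}"
    # Any other status keeps the generic wording seen in the log.
    return f"External containerizer failed (status: {status})"


def run_containerizer(containerizer_path: str) -> Optional[str]:
    """Run the path via `sh -c`; return an error message, or None on success."""
    result = subprocess.run(
        ["sh", "-c", shlex.quote(containerizer_path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return None
    return describe_failure(containerizer_path, result.returncode)
```

One caveat with this approach: a containerizer that does exist can itself exit with 127 (for example, if a command inside its own script is missing), so the message is a strong hint rather than a certainty. Checking that the path exists and is executable before launching would remove that ambiguity.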

      Below is a log file illustrating the scenario I ran into.

      mesos-slave.sh --log_dir=/tmp/mesos/slave/log_dir --master=zk://localhost:2181/mesos --work_dir=/tmp/mesos/slave/work_dir --containerizer_path=/usr/local/bin/deimos --isolation=external
      I0707 17:09:00.525806 29499 logging.cpp:167] INFO level logging started!
      I0707 17:09:00.525997 29499 main.cpp:126] Build: 2014-06-12 18:09:59 by ben.whitehead
      I0707 17:09:00.526013 29499 main.cpp:128] Version: 0.19.0
      I0707 17:09:00.526023 29499 main.cpp:131] Git tag: 0.19.0
      I0707 17:09:00.526033 29499 main.cpp:135] Git SHA: 51e047524cf744ee257870eb479345646c0428ff
      I0707 17:09:00.526167 29499 main.cpp:149] Starting Mesos slave
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@716: Client environment:host.name=xxxxxx
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@724: Client environment:os.arch=3.11.10-17-desktop
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP PREEMPT Mon Jun 16 15:28:13 UTC 2014 (fba7c1f)
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@733: Client environment:user.name=ben.whitehead
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@741: Client environment:user.home=/home/ben.whitehead
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@log_env@753: Client environment:user.dir=/home/ben.whitehead/tmp/mesos/mesos/build/bin
      2014-07-07 17:09:00,526:29499(0x7fc0a747f700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fc0ae7d59b0 sessionId=0 sessionPasswd=<null> context=0x7fc09c0008e0 flags=0
      2014-07-07 17:09:00,526:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:2181]
      I0707 17:09:00.526564 29520 slave.cpp:143] Slave started on 1)@127.0.0.2:5051
      I0707 17:09:00.526713 29520 slave.cpp:255] Slave resources: cpus(*):8; mem(*):14750; disk(*):221168; ports(*):[31000-32000]
      I0707 17:09:00.526747 29520 slave.cpp:283] Slave hostname: xxxxxx
      I0707 17:09:00.526757 29520 slave.cpp:284] Slave checkpoint: true
      I0707 17:09:00.527842 29518 state.cpp:33] Recovering state from '/tmp/mesos/slave/work_dir/meta'
      I0707 17:09:00.528142 29516 status_update_manager.cpp:193] Recovering status update manager
      I0707 17:09:00.528244 29517 external_containerizer.cpp:247] Recovering containerizer
      2014-07-07 17:09:00,544:29499(0x7fc0a5c7c700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:2181], sessionId=0x14712df6ba4000e, negotiated timeout=10000
      I0707 17:09:00.544852 29516 group.cpp:310] Group process ((4)@127.0.0.2:5051) connected to ZooKeeper
      I0707 17:09:00.544888 29516 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
      I0707 17:09:00.544900 29516 group.cpp:382] Trying to create path '/mesos' in ZooKeeper
      I0707 17:09:00.545446 29518 detector.cpp:135] Detected a new leader: (id='0')
      I0707 17:09:00.545524 29515 group.cpp:655] Trying to get '/mesos/info_0000000000' in ZooKeeper
      I0707 17:09:00.545805 29517 detector.cpp:377] A new leading master (UPID=master@127.0.0.2:5050) is detected
      Failed to perform recovery: Recover failed: External containerizer failed (status: 127)
      To remedy this do as follows:
      Step 1: rm -f /tmp/mesos/slave/work_dir/meta/slaves/latest
              This ensures slave doesn't recover old live executors.
      Step 2: Restart the slave.
      
      

          People

            Assignee: Unassigned
            Reporter: Ben Whitehead