Aurora / AURORA-1303

Thermos runner broken with non-root account

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: Executor
    • Labels: None

    Description

      This happens with the latest code from GitHub.

      I'm trying to schedule the hello_world example using a non-root role. The thermos_runner crashes when it tries to write the checkpoint in the fetch_package process.

      The runner appears to execute as the non-root user, while the checkpoint is owned by root.

      Unfortunately, Aurora's error handling does not help here. The exception thrown by the runner is silently swallowed, and the fetch_package process appears to be running, with no failure shown in the log files. I only found the cause by running the command manually.
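For anyone reproducing this by hand, a small wrapper can make the failure visible instead of swallowed. This is only a sketch, not Aurora or Thermos code: `run_and_report` is a hypothetical helper name, and it simply runs a command, merges stderr into stdout, and returns the exit code together with the captured output.

```python
import subprocess


def run_and_report(argv):
    """Run argv and return (exit_code, combined_output).

    Hypothetical helper for manual debugging; not part of Aurora/Thermos.
    stderr is merged into stdout so error lines are not lost.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    # Decode leniently so unexpected bytes in the log do not raise.
    return proc.returncode, output.decode("utf-8", "replace")
```

Calling it with the full thermos_runner.pex command line as a list of arguments would return the exit code and the ERROR lines in one place, which is easier to attach to a report than a process that looks healthy.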

      As a workaround, I added the user 'ovidiu' to the group 'root', since the directory containing the checkpoint has 'rwx' permissions for the group.
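The ownership mismatch and the group workaround can be illustrated with a simplified model of the POSIX permission check. This is a sketch under stated assumptions: `writable_via` and `writable_via_path` are hypothetical names, the mode 0o770 and the uid/gid numbers in the usage note are made up for illustration (the report only says the group has 'rwx'), and ACLs and capabilities are ignored.

```python
import os
import stat


def writable_via(owner_uid, owner_gid, mode, uid, gids):
    """Return the permission class that lets `uid` create files in a
    directory with the given owner and mode, or None if access is denied.

    Simplified POSIX model: exactly one class (owner, group, other) is
    consulted, and write+execute bits are needed to create entries.
    """
    if uid == 0:
        return "root"  # root bypasses the mode bits
    if uid == owner_uid:
        mask, name = stat.S_IWUSR | stat.S_IXUSR, "owner"
    elif owner_gid in gids:
        mask, name = stat.S_IWGRP | stat.S_IXGRP, "group"
    else:
        mask, name = stat.S_IWOTH | stat.S_IXOTH, "other"
    return name if (mode & mask) == mask else None


def writable_via_path(path, uid, gids):
    """Same check, reading owner and mode from an existing directory."""
    st = os.stat(path)
    return writable_via(st.st_uid, st.st_gid, st.st_mode, uid, gids)
```

With a directory owned by root:root and an assumed mode of 0o770, `writable_via(0, 0, 0o770, 1000, [1000])` returns `None`, while adding gid 0 to the user's groups, `writable_via(0, 0, 0o770, 1000, [1000, 0])`, returns `"group"`. That matches the behaviour described above: the non-root runner is denied until the user joins group 'root'.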

      This is the command:

      /usr/bin/python2.7 /var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex --setuid=ovidiu --thermos_json=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/task.json --sandbox=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/sandbox --log_dir=. --task_id=1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1 --log_to_disk=DEBUG --checkpoint_root=/var/run/thermos --hostname=m1a.dc

      And here is the output:

      Writing log files to disk in .
      ERROR] Found existing runner, cannot take control.
      ERROR] Unknown exception: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner
      ERROR] Traceback (most recent call last):
      ERROR] File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/bin/thermos_runner.py", line 176, in proxy_main
      ERROR] File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 859, in run
      ERROR] with self.control(force):
      ERROR] File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
      ERROR] return self.gen.next()
      ERROR] File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 552, in control
      ERROR] raise self.PermissionError('Unable to open checkpoint %s' % ckpt_file)
      ERROR] PermissionError: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner

          People

            Assignee: Unassigned
            Reporter: Ovidiu Predescu (ovidiup)