MESOS-3738

Mesos health check is invoked incorrectly when Mesos slave is within the docker container

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.25.0
    • Fix Version/s: 0.24.2, 0.25.1, 0.26.0
    • Component/s: containerization, docker
    • Labels: None

      Description

      When the Mesos slave runs inside a Docker container, the COMMAND health check from Marathon is invoked incorrectly.

      In this scenario, the executor looks for mesos-health-check in the sandbox directory instead of the launcher directory. The binary is not there, so launching the health check fails with "No such file or directory".

      Command to invoke the Mesos slave container:

      sudo docker run -d -v /sys:/sys -v /usr/bin/docker:/usr/bin/docker:ro -v /usr/lib/x86_64-linux-gnu/libapparmor.so.1:/usr/lib/x86_64-linux-gnu/libapparmor.so.1:ro -v /var/run/docker.sock:/var/run/docker.sock -v /tmp/mesos:/tmp/mesos mesos mesos slave --master=zk://10.2.1.2:2181/mesos --containerizers=docker,mesos --executor_registration_timeout=5mins --docker_stop_timeout=10secs --launcher=posix
      

      Marathon JSON file:

      {
        "id": "ubuntu",
        "container":
        {
          "type": "DOCKER",
          "docker":
          {
            "image": "ubuntu",
            "network": "BRIDGE",
            "parameters": []
          }
        },
        "args": [ "bash", "-c", "while true; do echo 1; sleep 5; done" ],
        "uris": [],
        "healthChecks":
        [
          {
            "protocol": "COMMAND",
            "command": { "value": "echo Success" },
            "gracePeriodSeconds": 3000,
            "intervalSeconds": 5,
            "timeoutSeconds": 5,
            "maxConsecutiveFailures": 300
          }
        ],
        "instances": 1
      }
      
      STDOUT:
      
      root@cea2be47d64f:/mnt/mesos/sandbox# cat stdout 
      --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-0000/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" --stop_timeout="10secs"
      --container="mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f" --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" --mapped_directory="/mnt/mesos/sandbox" --quiet="false" --sandbox_directory="/tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-0000/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f" --stop_timeout="10secs"
      Registered docker executor on b01e2e75afcb
      Starting task ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
      1
      Launching health check process: /tmp/mesos/slaves/e20f8959-cd9f-40ae-987d-809401309361-S0/frameworks/e20f8959-cd9f-40ae-987d-809401309361-0000/executors/ubuntu.86bca10f-72c9-11e5-b36d-02420a020106/runs/815cc886-1cd1-4f13-8f9b-54af1f127c3f/mesos-health-check --executor=(1)@10.2.1.7:40695 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-e20f8959-cd9f-40ae-987d-809401309361-S0.815cc886-1cd1-4f13-8f9b-54af1f127c3f sh -c \" echo Success \""},"consecutive_failures":300,"delay_seconds":0.0,"grace_period_seconds":3000.0,"interval_seconds":5.0,"timeout_seconds":5.0} --task_id=ubuntu.86bca10f-72c9-11e5-b36d-02420a020106
      Health check process launched at pid: 94
      1
      1
      1
      1
      1
      
      STDERR:
      
      root@cea2be47d64f:/mnt/mesos/sandbox# cat stderr
      I1014 23:15:58.127950    56 exec.cpp:134] Version: 0.25.0
      I1014 23:15:58.130627    62 exec.cpp:208] Executor registered on slave e20f8959-cd9f-40ae-987d-809401309361-S0
      WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
      ABORT: (/tmp/mesos-build/mesos-repo/3rdparty/libprocess/src/subprocess.cpp:177): Failed to os::execvpe in childMain: No such file or directory*** Aborted at 1444864558 (unix time) try "date -d @1444864558" if you are using GNU date ***
      PC: @     0x7fc8c5975107 (unknown)
      *** SIGABRT (@0x5e) received by PID 94 (TID 0x7fc8bee5e700) from PID 94; stack trace: ***
          @     0x7fc8c5cf88d0 (unknown)
          @     0x7fc8c5975107 (unknown)
          @     0x7fc8c59764e8 (unknown)
          @           0x419142 _Abort()
          @           0x41917c _Abort()
          @     0x7fc8c7745780 process::childMain()
          @     0x7fc8c7747a49 std::_Function_handler<>::_M_invoke()
          @     0x7fc8c774561c process::defaultClone()
          @     0x7fc8c7745f81 process::subprocess()
          @           0x43c58d mesos::internal::docker::DockerExecutorProcess::launchHealthCheck()
          @     0x7fc8c771b424 process::ProcessManager::resume()
          @     0x7fc8c771b74f process::internal::schedule()
          @     0x7fc8c64d3970 (unknown)
          @     0x7fc8c5cf10a4 start_thread
          @     0x7fc8c5a2604d (unknown)
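To confirm this failure mode on a given host, it can help to pull the binary path out of the executor's stdout and check whether that file exists. The following is a minimal sketch, not part of the original report: the function name health_check_binary and the sample path /tmp/sandbox-demo are made up for illustration, and only the "Launching health check process:" prefix is taken from the stdout shown above.

```shell
# Hypothetical helper: print the path the executor tried to run as the
# health check binary, taken from the first matching log line.
health_check_binary() {
  sed -n 's/^ *Launching health check process: \([^ ]*\).*/\1/p' "$1" | head -n 1
}

# Sample input with a shortened, made-up sandbox path.
log=$(mktemp)
printf '%s\n' \
  'Starting task demo' \
  'Launching health check process: /tmp/sandbox-demo/mesos-health-check --task_id=demo' \
  > "$log"

binary=$(health_check_binary "$log")
echo "health check binary: $binary"

# If the path is not an executable file, exec of it would be expected to fail.
if [ -x "$binary" ]; then
  echo "binary exists"
else
  echo "binary missing"
fi
```

On an affected setup, pointing the helper at the real stdout file in the sandbox should print a path under the sandbox directory rather than under the launcher directory.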
      
      Attachments

      1. MESOS-3738-0_23_1.patch (2 kB, haosdent)
      2. MESOS-3738-0_24_1.patch (2 kB, haosdent)
      3. MESOS-3738-0_25_0.patch (2 kB, haosdent)

        Issue Links

          Activity

          haosdent@gmail.com haosdent added a comment -

          Hi Evan Krall, I think you could apply the patch I mentioned above.

          meatmanek Evan Krall added a comment -

          Any chance we could get that patch applied and versions 0.23.2, 0.24.2, and 0.25.2 released?

          haosdent@gmail.com haosdent added a comment -

          Hi, you need the patch https://issues.apache.org/jira/secure/attachment/12766990/MESOS-3738-0_23_1.patch, which I uploaded to the attachments.
          hindessm Mark Hindess added a comment -

          Has this fix been backported to a 0.23.x release? I'm using the latest 0.23.1 Debian package and it is still broken.

          In case it helps anyone else upgrade smoothly to a working release, I am using a workaround of creating a mesos-health-check wrapper that execs the real mesos-health-check. That is:

              bash$ cat <<EOF >mesos-health-check
              > #!/bin/sh
              > exec /usr/libexec/mesos/mesos-health-check "$@"
              > EOF
              bash$ chmod 0755 mesos-health-check
              bash$ fakeroot sh -c "chown root:root mesos-health-check; \
                             tar cf - mesos-health-check |gzip -9 >mesos-health-check.tar.gz"
              bash$ tar tvzf mesos-health-check.tar.gz
              -rwxr-xr-x root/root        56 2015-12-09 07:44 mesos-health-check
              bash$ # deploy mesos-health-check.tar.gz to your mesos-slaves (I used ansible)
              bash$ # if using docker, restart your slaves with mesos-health-check.tar.gz
              bash$ # mounted as volume into your mesos-slave container
              bash$ # add file:///path/to/mesos-health-check.tar.gz to uris in app json
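One detail worth checking in the transcript above: the heredoc delimiter EOF is unquoted, so the shell expands $@ while writing the file. In an interactive shell without positional parameters it expands to nothing, which would leave `exec ... ""` in the wrapper instead of forwarding the arguments. The 56-byte size in the tar listing is consistent with that. Below is a sketch that keeps "$@" literal by quoting the delimiter. The temporary directory is an assumption made here for illustration, and /usr/libexec/mesos is the path from the comment.

```shell
# Work in a scratch directory (an assumption for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

# Quoting the delimiter ('EOF') disables expansion inside the heredoc,
# so "$@" is written to the file verbatim.
cat <<'EOF' >mesos-health-check
#!/bin/sh
exec /usr/libexec/mesos/mesos-health-check "$@"
EOF
chmod 0755 mesos-health-check

# Show the wrapper so the literal "$@" can be confirmed by eye.
cat mesos-health-check
```

The remaining packaging and deployment steps from the comment apply unchanged.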
          
          tnachen Timothy Chen added a comment -

          commit ef338465c3f0cec23cd489917eb1f671d550cc02
          Author: haosdent huang <haosdent@gmail.com>
          Date: Thu Oct 29 14:17:35 2015 +0000

          Fixed uncorrect launcher dir in docker executor.

          Review: https://reviews.apache.org/r/39386

          haosdent@gmail.com haosdent added a comment -

          Rafael Capucho, I also updated your Dockerfile to apply the patch; see https://paste.ee/p/8u3eL.

          haosdent@gmail.com haosdent added a comment -

          Thank you very much for confirming!

          jaytaylor Jay Taylor added a comment -

          Hi Rafael,

          I just uploaded the debs I built on Friday (latest master on Friday at
          3:30pm Pacific time), which have the patches applied.

          You can grab it here: scala.sh/mesos-0.26.0-g38b2f72-0.1.20151016221956.deb

          Hope this helps!

          Best,
          Jay

          On Tue, Oct 20, 2015 at 4:54 PM, Rafael Capucho (JIRA) <jira@apache.org>

          rafaelcapucho Rafael Capucho added a comment -

          Will it be released as 0.25.1? How can I apply this patch, given that I'm using a Dockerfile [1]? Thank you.

          [1] - https://paste.ee/p/eryAc

          jaytaylor Jay Taylor added a comment -

          I've rebuilt with the 0.25.0 patch on this ticket and confirmed that all previously failing health-check configurations now work:

          [OK] Using launcher_dir flag
          [OK] Using MESOS_LAUNCHER_DIR environment variable
          [OK] Not setting the flag or variable, health-checks now launch fine!

          Thanks Haosdent et al.!

          Best,
          Jay

          marco-mesos Marco Massenzio added a comment -

          Timothy Chen is not going to be around for the next couple of weeks, unfortunately.

          cc: Michael Park - could you please have a look and see if you can shepherd this?

          yongtang Yong Tang added a comment -

          Thanks. I was going to spend more time investigating this issue, but I am glad a fix is already there.

          haosdent@gmail.com haosdent added a comment -

          Thank you, let me contact Timothy Chen to describe this issue.

          anandmazumdar Anand Mazumdar added a comment -

          haosdent, can you find a shepherd for this issue? If you already have one, can you update the JIRA? Thanks.

          haosdent@gmail.com haosdent added a comment -

          Thanks for the report. We launch the docker executor through subprocess() with argv[0] set to the bare name "mesos-docker-executor", so the directory derived from argv[0] resolves to the sandbox directory. In contrast, "mesos-executor" is launched with its complete path, so the directory derived from it is launcher_dir.
          When Mesos is built from source, tests of "mesos-docker-executor" can still pass, because the binary is wrapped by an automake script and argv[0] therefore becomes a correct path.

          Patch: https://reviews.apache.org/r/39386/
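The path derivation described in this comment can be illustrated with standard shell tools. This is only a sketch under stated assumptions: `dirname` and `pwd -P` stand in for the C++ Path(...).dirname() and os::realpath calls, and /usr/libexec/mesos is the launcher directory named in other comments on this ticket, used here purely as an example value.

```shell
# Directory part of a bare program name versus a complete path.
bare_dir=$(dirname "mesos-docker-executor")
full_dir=$(dirname "/usr/libexec/mesos/mesos-executor")
echo "bare name -> $bare_dir"
echo "full path -> $full_dir"

# A relative directory is resolved against the current working directory.
# For an executor whose working directory is the sandbox, that is the sandbox.
resolved=$(cd "$bare_dir" && pwd -P)
echo "resolved  -> $resolved"
```

The bare name yields ".", which resolves to wherever the process is running, while the complete path yields the directory that actually contains the binaries.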

          yongtang Yong Tang added a comment -

          A quick workaround for this issue is to pass

          --executor_environment_variables=executor.json

          where executor.json sets MESOS_LAUNCHER_DIR:

          {"MESOS_LAUNCHER_DIR": "/usr/libexec/mesos"}

          This is only a workaround, though. A fix for this issue is still needed in the Mesos source.
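The workaround can be prepared along these lines. This is a sketch, not a tested deployment recipe: the temporary directory and the variable names are assumptions made for illustration, and whether the flag accepts the path form used here should be checked against the Mesos version in use.

```shell
# Assumption: the Mesos helper binaries live in /usr/libexec/mesos,
# as in the JSON shown in this comment.
launcher_dir=/usr/libexec/mesos
conf_dir=$(mktemp -d)

# Write the JSON file that --executor_environment_variables points at.
printf '{"MESOS_LAUNCHER_DIR": "%s"}\n' "$launcher_dir" > "$conf_dir/executor.json"
cat "$conf_dir/executor.json"

# Flag to append to the existing slave command line.
slave_flag="--executor_environment_variables=$conf_dir/executor.json"
echo "$slave_flag"
```

When the slave itself runs in a Docker container, the JSON file presumably has to be visible inside that container, for example through a volume mount.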

          yongtang Yong Tang added a comment -

          I tried to pass the environment variable MESOS_LAUNCHER_DIR with
          sudo docker run -e MESOS_LAUNCHER_DIR=/usr/libexec/mesos ...
          but got the same result.

          It seems that in mesos/src/docker/executor.cpp, lines 573-576:

          const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR");
          string path =
            envPath.isSome() ? envPath.get()
                             : os::realpath(Path(argv[0]).dirname()).get();

          The environment variable MESOS_LAUNCHER_DIR is not passed to the executor, so the directory of argv[0] is used as the health check directory. That directory is the sandbox directory, not the launcher directory.


            People

            • Assignee: haosdent (haosdent@gmail.com)
            • Reporter: Yong Tang (yongtang)
            • Shepherd: Timothy Chen
            • Votes: 3
            • Watchers: 12
