Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-4812

Mesos fails to escape command health checks

    Details

    • Type: Bug
    • Status: Reviewable
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.25.0
    • Fix Version/s: None
    • Component/s: None

      Description

      As described in https://github.com/mesosphere/marathon/issues/3333
      I would like to run a command health check

      /bin/bash -c "</dev/tcp/$HOST/$PORT0"
      

      The health check fails because Mesos, while running the command inside double quotes of a sh -c "" doesn't escape the double quotes in the command.

      If I escape the double quotes myself the command health check succeeds. But this would mean that the user needs intimate knowledge of how Mesos executes his commands which can't be right.

      I was told this is not a Marathon but a Mesos issue so am opening this JIRA. I don't know if this only affects the command health check.

        Activity

        Hide
        haosdent@gmail.com haosdent added a comment - - edited

        Hi, do you use MesosContainerizer or DockerContainerizer? Lukas Loesche

        As I checked by HealthCheckTest.HealthyTaskShellEscape

        All below ways could pass:

          // First way.
          command.set_shell(true);
          command.set_value("bash -c \"</dev/tcp/$HOST/$PORT0\"");
        
          // Second way.
          command.set_shell(true);
          command.set_value("</dev/tcp/$HOST/$PORT0");
        
          // Third way.
          command.set_shell(false);
          command.set_value("bash");
          command.add_arguments("bash");
          command.add_arguments("-c");
          command.add_arguments("</dev/tcp/$HOST/$PORT0");
        

        And I am going to set up a marathon to test e2e.

        Show
        haosdent@gmail.com haosdent added a comment - - edited Hi, do you use MesosContainerizer or DockerContainerizer? Lukas Loesche As I checked by HealthCheckTest.HealthyTaskShellEscape All below ways could pass: // First way. command.set_shell( true ); command.set_value( "bash -c \" </dev/tcp/$HOST/$PORT0\""); // Second way. command.set_shell( true ); command.set_value( "</dev/tcp/$HOST/$PORT0" ); // Third way. command.set_shell( false ); command.set_value( "bash" ); command.add_arguments( "bash" ); command.add_arguments( "-c" ); command.add_arguments( "</dev/tcp/$HOST/$PORT0" ); And I am going to set up a marathon to test e2e.
        Hide
        haosdent@gmail.com haosdent added a comment -

        My marathon task json

        {
          "app": {
            "id": "/test-health",
            "cmd": "sleep 200",
            "args": null,
            "user": null,
            "env": {},
            "instances": 1,
            "cpus": 1,
            "mem": 128,
            "disk": 0,
            "executor": "",
            "constraints": [],
            "uris": [],
            "fetch": [],
            "storeUrls": [],
            "ports": [
              10000
            ],
            "portDefinitions": [
              {
                "port": 10000,
                "protocol": "tcp",
                "labels": {}
              }
            ],
            "requirePorts": false,
            "backoffSeconds": 1,
            "backoffFactor": 1.15,
            "maxLaunchDelaySeconds": 3600,
            "container": null,
            "healthChecks": [
              {
                "protocol": "COMMAND",
                "command": {
                  "value": "bash -c \"</dev/tcp/www.google.com/80\""
                },
                "gracePeriodSeconds": 300,
                "intervalSeconds": 60,
                "timeoutSeconds": 20,
                "maxConsecutiveFailures": 3,
                "ignoreHttp1xx": false
              }
            ],
            "readinessChecks": [],
            "dependencies": [],
            "upgradeStrategy": {
              "minimumHealthCapacity": 1,
              "maximumOverCapacity": 1
            },
            "labels": {},
            "acceptedResourceRoles": null,
            "ipAddress": null,
            "version": "2016-04-17T06:24:40.457Z",
            "residency": null,
            "versionInfo": {
              "lastScalingAt": "2016-04-17T06:24:40.457Z",
              "lastConfigChangeAt": "2016-04-17T06:24:40.457Z"
            },
            "tasksStaged": 0,
            "tasksRunning": 1,
            "tasksHealthy": 1,
            "tasksUnhealthy": 0,
            "deployments": [],
            "tasks": [
              {
                "id": "test-health.1108b4a6-0465-11e6-9af4-0242d20a0294",
                "slaveId": "7f8c8ce6-45b2-4e63-a5c4-a88e08af12ed-S0",
                "host": "127.0.0.1",
                "startedAt": "2016-04-17T06:24:40.770Z",
                "stagedAt": "2016-04-17T06:24:40.537Z",
                "ports": [
                  31619
                ],
                "version": "2016-04-17T06:24:40.457Z",
                "ipAddresses": [
                  {
                    "ipAddress": "127.0.0.1",
                    "protocol": "IPv4"
                  }
                ],
                "appId": "/test-health",
                "healthCheckResults": [
                  {
                    "alive": true,
                    "consecutiveFailures": 0,
                    "firstSuccess": "2016-04-17T06:24:40.924Z",
                    "lastFailure": null,
                    "lastSuccess": "2016-04-17T06:24:40.924Z",
                    "lastFailureCause": null,
                    "taskId": "test-health.1108b4a6-0465-11e6-9af4-0242d20a0294"
                  }
                ]
              }
            ]
          }
        }
        

        Which could pass health check in Mesos

        Registered executor on 127.0.0.1
        Starting task test-health.1108b4a6-0465-11e6-9af4-0242d20a0294
        sh -c 'sleep 200'
        Forked command at 27046
        Launching health check process: /home/haosdent/mesos/build/src/mesos-health-check --executor=(1)@127.0.0.1:39822 --health_check_json={"command":{"shell":true,"value":"bash -c \"<\/dev\/tcp\/www.google.com\/80\""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":300.0,"interval_seconds":60.0,"timeout_seconds":20.0} --task_id=test-health.1108b4a6-0465-11e6-9af4-0242d20a0294
        Health check process launched at pid: 27047
        Received task health update, healthy: true
        
        Show
        haosdent@gmail.com haosdent added a comment - My marathon task json { "app" : { "id" : "/test-health" , "cmd" : "sleep 200" , "args" : null , "user" : null , "env" : {}, "instances" : 1, "cpus" : 1, "mem" : 128, "disk" : 0, "executor" : "", "constraints" : [], "uris" : [], "fetch" : [], "storeUrls" : [], "ports" : [ 10000 ], "portDefinitions" : [ { "port" : 10000, "protocol" : "tcp" , "labels" : {} } ], "requirePorts" : false , "backoffSeconds" : 1, "backoffFactor" : 1.15, "maxLaunchDelaySeconds" : 3600, "container" : null , "healthChecks" : [ { "protocol" : "COMMAND" , "command" : { "value" : "bash -c \" </dev/tcp/www.google.com/80\"" }, "gracePeriodSeconds" : 300, "intervalSeconds" : 60, "timeoutSeconds" : 20, "maxConsecutiveFailures" : 3, "ignoreHttp1xx" : false } ], "readinessChecks" : [], "dependencies" : [], "upgradeStrategy" : { "minimumHealthCapacity" : 1, "maximumOverCapacity" : 1 }, "labels" : {}, "acceptedResourceRoles" : null , "ipAddress" : null , "version" : "2016-04-17T06:24:40.457Z" , "residency" : null , "versionInfo" : { "lastScalingAt" : "2016-04-17T06:24:40.457Z" , "lastConfigChangeAt" : "2016-04-17T06:24:40.457Z" }, "tasksStaged" : 0, "tasksRunning" : 1, "tasksHealthy" : 1, "tasksUnhealthy" : 0, "deployments" : [], "tasks" : [ { "id" : "test-health.1108b4a6-0465-11e6-9af4-0242d20a0294" , "slaveId" : "7f8c8ce6-45b2-4e63-a5c4-a88e08af12ed-S0" , "host" : "127.0.0.1" , "startedAt" : "2016-04-17T06:24:40.770Z" , "stagedAt" : "2016-04-17T06:24:40.537Z" , "ports" : [ 31619 ], "version" : "2016-04-17T06:24:40.457Z" , "ipAddresses" : [ { "ipAddress" : "127.0.0.1" , "protocol" : "IPv4" } ], "appId" : "/test-health" , "healthCheckResults" : [ { "alive" : true , "consecutiveFailures" : 0, "firstSuccess" : "2016-04-17T06:24:40.924Z" , "lastFailure" : null , "lastSuccess" : "2016-04-17T06:24:40.924Z" , "lastFailureCause" : null , "taskId" : "test-health.1108b4a6-0465-11e6-9af4-0242d20a0294" } ] } ] } } Which could pass health check in Mesos Registered executor on 127.0.0.1 Starting task test-health.1108b4a6-0465-11e6-9af4-0242d20a0294 sh -c 'sleep 200' Forked command at 27046 Launching health check process: /home/haosdent/mesos/build/src/mesos-health-check --executor=(1)@127.0.0.1:39822 --health_check_json={ "command" :{ "shell" : true , "value" : "bash -c \" <\/dev\/tcp\/www.google.com\/80\ ""}," consecutive_failures ":3," delay_seconds ":0.0," grace_period_seconds ":300.0," interval_seconds ":60.0," timeout_seconds":20.0} --task_id=test-health.1108b4a6-0465-11e6-9af4-0242d20a0294 Health check process launched at pid: 27047 Received task health update, healthy: true
        Hide
        lloesche Lukas Loesche added a comment -

        Hi, thanks for looking into this! I know how to make the health check pass. In the Github Link above I explained how I worked around the problem which is essentially your first solution.

        The second solution is broken for systems that don't use bash for /bin/sh as /dev/tcp is a bash only thing.

        Anyways finding some workaround is not the problem. This issue is about Mesos (or Marathon) doing the wrong thing, imho. Why would the user need to know about the details of the implementation to get a valid shell command executed?

        Like why would you expect the user to read the Mesos source code to find out his command is executed inside a

        /bin/sh -c ""
        

        and that's why he has to escape double quotes in his own command before sending it to Mesos. That seems unreasonable to me.

        Show
        lloesche Lukas Loesche added a comment - Hi, thanks for looking into this! I know how to make the health check pass. In the Github Link above I explained how I worked around the problem which is essentially your first solution. The second solution is broken for systems that don't use bash for /bin/sh as /dev/tcp is a bash only thing. Anyways finding some workaround is not the problem. This issue is about Mesos (or Marathon) doing the wrong thing, imho. Why would the user need to know about the details of the implementation to get a valid shell command executed? Like why would you expect the user to read the Mesos source code to find out his command is executed inside a /bin/sh -c "" and that's why he has to escape double quotes in his own command before sending it to Mesos. That seems unreasonable to me.
        Hide
        haosdent@gmail.com haosdent added a comment -

        Lukas Loesche Sorry for my unclear explain. I mean I could not reproduce the problem you mentioned in github link. As you see, I use

            "healthChecks": [
                ...
                "command": {
                  "value": "bash -c \"</dev/tcp/www.google.com/80\""
                },
            ...
        

        in marathon task definition. Which you mentioned didn't work in github link.

        And

          // First way.
          command.set_shell(true);
          command.set_value("bash -c \"</dev/tcp/$HOST/$PORT0\"");
        

        is equal to

        {
              "protocol": "COMMAND",
              "command": { "value": "/bin/bash -c \"</dev/tcp/$HOST/$PORT0\"" }
            }
        

        the one you mentioned in github link. Because in "...", you have to add slash if you want to put ".

          {
            "protocol": "COMMAND",
            "command": { "value": "/bin/bash -c \\\"</dev/tcp/$HOST/$PORT0\\\"" }
          }
        

        is equal to

          command.set_shell(true);
          command.set_value("bash -c \\\"</dev/tcp/$HOST/$PORT0\\\"");
        

        However, this is not work for me. I use MesosContainerizer and marathon master branch, would you double check this again?

        Show
        haosdent@gmail.com haosdent added a comment - Lukas Loesche Sorry for my unclear explain. I mean I could not reproduce the problem you mentioned in github link. As you see, I use "healthChecks" : [ ... "command" : { "value" : "bash -c \" </dev/tcp/www.google.com/80\"" }, ... in marathon task definition. Which you mentioned didn't work in github link. And // First way. command.set_shell( true ); command.set_value( "bash -c \" </dev/tcp/$HOST/$PORT0\""); is equal to { "protocol" : "COMMAND" , "command" : { "value" : "/bin/bash -c \" </dev/tcp/$HOST/$PORT0\"" } } the one you mentioned in github link. Because in "..." , you have to add slash if you want to put " . { "protocol" : "COMMAND" , "command" : { "value" : "/bin/bash -c \\\" </dev/tcp/$HOST/$PORT0\\\"" } } is equal to command.set_shell( true ); command.set_value( "bash -c \\\" </dev/tcp/$HOST/$PORT0\\\""); However, this is not work for me. I use MesosContainerizer and marathon master branch, would you double check this again?
        Hide
        lloesche Lukas Loesche added a comment - - edited

        I think you're mixing up syntax. What you wrote was

        command.set_value("bash -c \"</dev/tcp/$HOST/$PORT0\"");
        

        which is a direct call to set_value().

        Whereas what I wrote is

            {
              "protocol": "COMMAND",
              "command": { "value": "/bin/bash -c \"</dev/tcp/$HOST/$PORT0\"" }
            }
        

        as part of a JSON file! Remember that the JSON data must also be escaped.
        The escaped double quote \" will be replaced by the JSON parser.

        So the command that's being send to Mesos ends up being:

        /bin/bash -c "</dev/tcp/$HOST/$PORT0"
        

        which is a perfectly valid shell command but will not execute successfully.

        What you have to send to Mesos is

        /bin/bash -c \"</dev/tcp/$HOST/$PORT0\"
        

        because Mesos will take that string and put a /bin/sh -c "" around it.

        To do that the JSON file that you're sending to Marathon has to look like this:

          {
            "protocol": "COMMAND",
            "command": { "value": "/bin/bash -c \\\"</dev/tcp/$HOST/$PORT0\\\"" }
          }
        

        So here I'm escaping the backslash as well as the double quotes inside the JSON string so that I'm left with one backslash and double quotes after the JSON parser has processed the data. That can then be send to Mesos and successfully executed.

        But as you can see from this conversation this gets confusing really quickly even for people working with it daily. So how can we expect our users to work with it.

        Show
        lloesche Lukas Loesche added a comment - - edited I think you're mixing up syntax. What you wrote was command.set_value("bash -c \"</dev/tcp/$HOST/$PORT0\""); which is a direct call to set_value(). Whereas what I wrote is { "protocol": "COMMAND", "command": { "value": "/bin/bash -c \"</dev/tcp/$HOST/$PORT0\"" } } as part of a JSON file! Remember that the JSON data must also be escaped. The escaped double quote \" will be replaced by the JSON parser. So the command that's being send to Mesos ends up being: /bin/bash -c "</dev/tcp/$HOST/$PORT0" which is a perfectly valid shell command but will not execute successfully. What you have to send to Mesos is /bin/bash -c \"</dev/tcp/$HOST/$PORT0\" because Mesos will take that string and put a /bin/sh -c "" around it. To do that the JSON file that you're sending to Marathon has to look like this: { "protocol": "COMMAND", "command": { "value": "/bin/bash -c \\\"</dev/tcp/$HOST/$PORT0\\\"" } } So here I'm escaping the backslash as well as the double quotes inside the JSON string so that I'm left with one backslash and double quotes after the JSON parser has processed the data. That can then be send to Mesos and successfully executed. But as you can see from this conversation this gets confusing really quickly even for people working with it daily. So how can we expect our users to work with it.
        Hide
        haosdent@gmail.com haosdent added a comment -

        What you have to send to Mesos is
        /bin/bash -c \"</dev/tcp/$HOST/$PORT0\"

        Actually this is not true. You know I use C++ here, and I have to add \, just like why you need add \ in json

        command.set_value("bash -c \"</dev/tcp/$HOST/$PORT0\"");
        

        If I don't add \, compile would failed. Because

        command.set_value("bash -c "</dev/tcp/$HOST/$PORT0"");
        

        is illegal C++ sytnax.

        As you see,

        "bash -c \"</dev/tcp/$HOST/$PORT0\""
        

        would become the string

        bash -c </dev/tcp/$HOST/$PORT0
        

        in C++.

        Anyway, do you mind try my task definition in your side?

        {
        	"id": "/test-health",
        	"cmd": "sleep 200",
        	"healthChecks": [{
        		"protocol": "COMMAND",
        		"command": {
        			"value": "bash -c \"</dev/tcp/www.google.com/80\""
        		},
        		"gracePeriodSeconds": 300,
        		"intervalSeconds": 60,
        		"timeoutSeconds": 20,
        		"maxConsecutiveFailures": 3,
        		"ignoreHttp1xx": false
        	}]
        }
        

        I sure it would running correctly in my marathon. If you not convenience at this, I would like to send you a video to prove this.

        Show
        haosdent@gmail.com haosdent added a comment - What you have to send to Mesos is /bin/bash -c \"</dev/tcp/$HOST/$PORT0\" Actually this is not true. You know I use C++ here, and I have to add \, just like why you need add \ in json command.set_value( "bash -c \" </dev/tcp/$HOST/$PORT0\""); If I don't add \, compile would failed. Because command.set_value( "bash -c " </dev/tcp/$HOST/$PORT0""); is illegal C++ sytnax. As you see, "bash -c \" </dev/tcp/$HOST/$PORT0\"" would become the string bash -c </dev/tcp/$HOST/$PORT0 in C++. Anyway, do you mind try my task definition in your side? { "id" : "/test-health" , "cmd" : "sleep 200" , "healthChecks" : [{ "protocol" : "COMMAND" , "command" : { "value" : "bash -c \" </dev/tcp/www.google.com/80\"" }, "gracePeriodSeconds" : 300, "intervalSeconds" : 60, "timeoutSeconds" : 20, "maxConsecutiveFailures" : 3, "ignoreHttp1xx" : false }] } I sure it would running correctly in my marathon. If you not convenience at this, I would like to send you a video to prove this.
        Hide
        haosdent@gmail.com haosdent added a comment -

        The gif a bit large although I have already compress it.

        It show my steps in marathon I mentioned above.

        Show
        haosdent@gmail.com haosdent added a comment - The gif a bit large although I have already compress it. It show my steps in marathon I mentioned above.
        Hide
        haosdent@gmail.com haosdent added a comment -

        Lukas Loesche I closed this because can not reproduce, free feel to reopen it if you can reproduce it. Thanks a lot!

        Show
        haosdent@gmail.com haosdent added a comment - Lukas Loesche I closed this because can not reproduce, free feel to reopen it if you can reproduce it. Thanks a lot!
        Hide
        haosdent@gmail.com haosdent added a comment -

        Reopen this ticket and wait for the details from Lukas Loesche about how to reproduce it.

        Show
        haosdent@gmail.com haosdent added a comment - Reopen this ticket and wait for the details from Lukas Loesche about how to reproduce it.
        Hide
        haosdent@gmail.com haosdent added a comment -

        Thanks Lukas Loesche's help. I could reproduce by this application definition

        {
          "id": "/test",
          "cmd": null,
          "cpus": 1,
          "mem": 128,
          "disk": 0,
          "instances": 1,
          "executor": null,
          "fetch": null,
          "constraints": null,
          "acceptedResourceRoles": null,
          "user": null,
          "container": {
            "docker": {
              "image": "nginx",
              "forcePullImage": false,
              "privileged": false,
              "portMappings": [
                {
                  "containerPort": 80,
                  "protocol": "tcp"
                }
              ],
              "network": "BRIDGE"
            }
          },
          "labels": null,
          "healthChecks": [
            {
              "protocol": "COMMAND",
              "command": {
                "value": "bash -c \"</dev/tcp/$HOST/$PORT0\""
              }
            }
          ],
          "env": null
        }
        

        It is because we warp the health check command with docker exec blindly.

              // Wrap the original health check command in `docker exec`.
              const CommandInfo& command = healthCheck.command();
        
              vector<string> commandArguments;
              commandArguments.push_back(docker->getPath());
              commandArguments.push_back("exec");
              commandArguments.push_back(containerName);
        
              if (command.shell()) {
                commandArguments.push_back("sh");
                commandArguments.push_back("-c");
                commandArguments.push_back("\"");
                commandArguments.push_back(command.value());
                commandArguments.push_back("\"");
              } else {
                commandArguments.push_back(command.value());
        
                foreach (const string& argument, command.arguments()) {
                  commandArguments.push_back(argument);
                }
              }
        
              healthCheck.mutable_command()->set_shell(true); <-- Cause problem.
              healthCheck.mutable_command()->clear_arguments();
              healthCheck.mutable_command()->set_value(
                  strings::join(" ", commandArguments)); <-- Cause problem.
        

        Then it would generate the health check command

        sh -c 'docker exec mesos-ce13aa71-ebba-4361-b6dd-8d4ce57ea4ab-S9.566f6c77-a6c9-46e0-bc40-5fe95a1aa9ae sh -c " bash -c "</dev/tcp/$HOST/$PORT" "'
        

        This leads to the error

        sh: 1: cannot open /dev/tcp/127.0.0.1/80 : No such file
        
        Show
        haosdent@gmail.com haosdent added a comment - Thanks Lukas Loesche 's help. I could reproduce by this application definition { "id" : "/test" , "cmd" : null , "cpus" : 1, "mem" : 128, "disk" : 0, "instances" : 1, "executor" : null , "fetch" : null , "constraints" : null , "acceptedResourceRoles" : null , "user" : null , "container" : { "docker" : { "image" : "nginx" , "forcePullImage" : false , "privileged" : false , "portMappings" : [ { "containerPort" : 80, "protocol" : "tcp" } ], "network" : "BRIDGE" } }, "labels" : null , "healthChecks" : [ { "protocol" : "COMMAND" , "command" : { "value" : "bash -c \" </dev/tcp/$HOST/$PORT0\"" } } ], "env" : null } It is because we warp the health check command with docker exec blindly. // Wrap the original health check command in `docker exec`. const CommandInfo& command = healthCheck.command(); vector<string> commandArguments; commandArguments.push_back(docker->getPath()); commandArguments.push_back( "exec" ); commandArguments.push_back(containerName); if (command.shell()) { commandArguments.push_back( "sh" ); commandArguments.push_back( "-c" ); commandArguments.push_back( "\" "); commandArguments.push_back(command.value()); commandArguments.push_back( "\" "); } else { commandArguments.push_back(command.value()); foreach ( const string& argument, command.arguments()) { commandArguments.push_back(argument); } } healthCheck.mutable_command()->set_shell( true ); <-- Cause problem. healthCheck.mutable_command()->clear_arguments(); healthCheck.mutable_command()->set_value( strings::join( " " , commandArguments)); <-- Cause problem. Then it would generate the health check command sh -c 'docker exec mesos-ce13aa71-ebba-4361-b6dd-8d4ce57ea4ab-S9.566f6c77-a6c9-46e0-bc40-5fe95a1aa9ae sh -c " bash -c " </dev/tcp/$HOST/$PORT " " ' This leads to the error sh: 1: cannot open /dev/tcp/127.0.0.1/80 : No such file
        Hide
        haosdent@gmail.com haosdent added a comment -
        Show
        haosdent@gmail.com haosdent added a comment - Patch: https://reviews.apache.org/r/54846/
        Hide
        jdef James DeFelice added a comment -

        would like to see some traction on this. several issues have been reported against Marathon, the latest of which is https://github.com/mesosphere/marathon/issues/5136

        Show
        jdef James DeFelice added a comment - would like to see some traction on this. several issues have been reported against Marathon, the latest of which is https://github.com/mesosphere/marathon/issues/5136

          People

          • Assignee:
            haosdent@gmail.com haosdent
            Reporter:
            lloesche Lukas Loesche
          • Votes:
            4 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:

              Development