Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
None
Description
Consider this Marathon app definition:
{
  "id": "/testapp",
  "cmd": "env && tail -f /dev/null",
  "env": {
    "TESTVAR": "line1\nline2"
  },
  "cpus": 0.1,
  "mem": 10,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "alpine"
    }
  }
}
The JSON-encoded newline in the value of the TESTVAR environment variable leads to a corrupted task environment. What follows is a subset of the resulting task environment (as printed via env, i.e. in key=value notation):
line2=
TESTVAR=line1
That is, the trailing part of the intended value ended up being interpreted as a variable name, and only the leading part of the intended value was used as the actual value of TESTVAR.
Common application scenarios that would break badly because of this include passing pretty-printed JSON or YAML documents via the environment.
Tracing the code and data flow led to the conclusion that Docker's --env-file command-line interface is the weak point. Mesos' Docker containerizer currently uses it to pass the environment to the container:
argv.push_back("--env-file");
argv.push_back(environmentFile);
(Ref: code)
Docker documents the behavior of the --env-file argument as follows:
The --env-file flag takes a filename as an argument
and expects each line to be in the VAR=VAL format,
(Ref: https://docs.docker.com/engine/reference/commandline/run/)
That is, Docker splits that file into individual key/value definitions at newline bytes, which explains the observed fragmentation of the variable value. Notably, Docker does not provide a mechanism for escaping newline bytes in the values specified in this environment file.
I think it is important to understand that Docker's --env-file mechanism is ill-posed in the sense that it cannot transmit the whole range of environment variable values allowed by POSIX. This is what the Single UNIX Specification, Version 3, says about environment variable values:
the value shall be composed of characters from the
portable character set (except NUL and as indicated below).
(Ref: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html)
About "The portable character set": http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap06.html#tagtcjh_3
The portable character set includes, among others, the <newline> (LF) character. Understandably, Docker's --env-file behavior will not change, so this issue cannot be deferred to Docker: https://github.com/docker/docker/issues/12997
Notably, the --env-file method for communicating environment variables to Docker containers was only recently introduced to Mesos with https://issues.apache.org/jira/browse/MESOS-6566, to avoid leaking secrets through the process listing. Previously, env key/value pairs were specified on the command line, which leaked secrets to the process list and probably also did not support the full range of valid environment variable values.
We need a solution that
1) does not leak sensitive values (i.e. is compliant with MESOS-6566).
2) allows for passing arbitrary environment variable values.
It seems that Docker's --env method can be used for that. It can be given just the names of the environment variables to be passed along, in which case the docker binary reads the corresponding values from its own environment, which we can prepare as needed when we spawn the docker child process. This method would still expose environment variable names in the process listing, but (especially if documented) this should be acceptable.
Issue Links
- relates to MESOS-6566: The Docker executor should not leak task env variables in the Docker command cmd line. (Accepted)