Thank you for the comments.
Eric Badger, the main reason a Java root process is more secure than container-executor is that it protects against exploitable buffer overflows. That is why I raised this jira; I was not sure why this approach had not been followed before. It is also easier for most Hadoop developers to work with, as you mentioned.
Vinod Kumar Vavilapalli, this jira already builds on the experience of YARN-6623. I would rather consider it a subtask of YARN-5673, if at all. Now that you mention it, a possible solution for YARN-5673 that takes this suggestion (YARN-7506) into account would be a root Java-based container executor framework that loads Java or native C modules. However, Docker has its own distinct design, with the CLI and the socket and no native system call dependencies, so it could be handled separately.
Side note: One more important consideration in the container-executor design was to not have long running root processes as it may increase the attack scope. Assuming that is still intact.
Suggestion 1 above does not require any long-running root process. Suggestion 2 does; however, the only attack surface would be the proxied Docker socket and the config file that is protected with file system permissions, just like the container-executor executable.
Both docker and hadoop use "trusted" users...
I have to point to the rule of defense in depth. Under defense in depth, there is no trusted user. Every input is evil, and each component (container-executor in this case) has to do its own proper error checking.
YARN user tap directly into docker.sock goes against our original philosophy of having both "trusted" user and root to perform validation.
Indeed. I agree.
Root power may be used for validation logic when trusted user can not validate, such as symlink to local file system access that
Indeed, and I would also mention the volume whitelists and blacklists, which the yarn user cannot validate because of the defense in depth rule.
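To illustrate the kind of check I mean, here is a minimal sketch of a volume whitelist/blacklist validation as it might look inside a root Java process. The class name, the method names, and the example paths are hypothetical and not taken from any existing patch. It also assumes purely lexical path normalization; a real implementation would additionally have to resolve symlinks (for example with Path.toRealPath) before comparing.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: validates a requested host mount path against
// administrator-configured lists that only root can read and modify.
public class VolumeMountValidator {
    private final List<Path> whitelist = new ArrayList<>();
    private final List<Path> blacklist = new ArrayList<>();

    public VolumeMountValidator(List<String> allowed, List<String> denied) {
        for (String a : allowed) {
            whitelist.add(Paths.get(a).normalize());
        }
        for (String d : denied) {
            blacklist.add(Paths.get(d).normalize());
        }
    }

    // Returns true only if the path is absolute, is not under any
    // blacklisted directory, and is under at least one whitelisted one.
    public boolean isMountAllowed(String hostPath) {
        Path p = Paths.get(hostPath);
        if (!p.isAbsolute()) {
            return false; // never trust relative paths from the caller
        }
        p = p.normalize(); // collapses "." and ".." lexically (no symlink resolution)
        for (Path d : blacklist) {
            if (p.startsWith(d)) {
                return false; // blacklist wins over whitelist
            }
        }
        for (Path a : whitelist) {
            if (p.startsWith(a)) {
                return true;
            }
        }
        return false; // default deny
    }
}
```

Path.startsWith compares whole path components, so a whitelist entry of /data does not accidentally permit /data2. The default-deny result at the end follows the defense in depth rule: anything not explicitly allowed is rejected.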
We can consider to keep most of logic in Java as long as root privileges is not required.
I disagree here. Most of the functionality that YARN-6623 implemented requires root to do the validation, so if it is done in Java, it should be in a Java root process.
The performance gain from tapping into docker socket is saving the cost of one fork but we would lose a lot of validations done by docker CLI.
The validations are indeed important, but validating command line options is much more difficult than validating easily parseable JSON, as the recent issues showed.
If it can be helped, calling root cli is preferred than calling root owned network socket.
There is a solution for that. We could still use the CLI from the Java node manager running as yarn, pointed at a unix socket that is writable by yarn. That socket would be proxied and security-filtered by a root Java process that runs in the background and works on the original socket. (See attached diagram)
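As a rough sketch of what the security filtering in such a proxy could look like, the following checks the HTTP request line that the CLI sends over the unix socket against an allowlist before forwarding it to the original socket. The class name and the chosen allowlist are assumptions for illustration only; the endpoints themselves (containers/create, containers/{id}/start, and so on) are from the Docker Engine API. The socket handling itself is left out, and a real filter would also have to inspect the JSON body of the create request (for example HostConfig.Privileged and HostConfig.Binds).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: decides whether a request read from the
// yarn-writable socket may be forwarded to the real docker.sock.
public class DockerRequestFilter {
    // Optional API version prefix such as /v1.30
    private static final String VER = "(/v[0-9.]+)?";
    // Container names and ids must start with an alphanumeric character
    private static final String ID = "[A-Za-z0-9][A-Za-z0-9_.-]*";

    // Assumed allowlist; everything else is denied.
    private static final List<Pattern> ALLOWED = Arrays.asList(
        Pattern.compile("POST " + VER + "/containers/create"),
        Pattern.compile("POST " + VER + "/containers/" + ID + "/(start|stop|kill)"),
        Pattern.compile("GET " + VER + "/containers/" + ID + "/json"));

    public static boolean isAllowed(String requestLine) {
        String[] parts = requestLine.split(" ");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            return false; // malformed request line
        }
        String path = parts[1];
        int q = path.indexOf('?');
        if (q >= 0) {
            path = path.substring(0, q); // match on the path only, not the query
        }
        String key = parts[0] + " " + path;
        for (Pattern p : ALLOWED) {
            if (p.matcher(key).matches()) {
                return true;
            }
        }
        return false; // default deny
    }
}
```

With this allowlist, a request such as "POST /v1.30/containers/create HTTP/1.1" is forwarded, while exec or delete requests are rejected because they are not listed.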
I don't fully agree with YARN-5673 modules API design. The description is another plug-in architecture to enable more functionality with root power. I think this is a slippy slope to enable more risks in container-executor.
I agree, I also raised my concerns there.
It is best to avoid running java as root. Java runtime includes a lot of third party code, which can be unpredictable with root power.
That is a risk. I would minimize the number of non-JDK dependencies if a Java root process is chosen. I still think it may be more favorable in this case.
I summarized the options in the attached diagram. It shows which one is the simplest.