JDK7's Process output streams have an async fd-close race bug. This manifests as commands run via o.a.h.u.Shell causing threads to hang, OOM, or cause other bizarre behavior. The NM is likely to encounter the bug under heavy load.
Specifically, ProcessBuilder's UNIXProcess starts a thread to reap the process and drain stdout/stderr to avoid a lingering zombie process. A race occurs if the thread using the stream closes it, the underlying fd is recycled/reopened, while the reaper is draining it. ProcessPipeInputStream.drainInputStream's will OOM allocating an array if in.available() returns a huge number, or may wreak havoc by incorrectly draining the fd.
HADOOP-10622 Shell.runCommand can deadlock
- is related to
HADOOP-15512 clean up Shell from JDK7 workarounds