Description
The hadoop-daemon.sh script (and other liveness monitors) probes whether a daemon service is running by sending kill -0 to a process id read from a pid file.
This is flawed:
- Pid file locations may change between installations.
- Linux and Unix recycle pids, leading to false positives: the scripts think the daemon is running when the pid actually belongs to another process.
- It doesn't work on Windows.
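The pid-recycling problem can be reproduced with a small sketch. This is not the hadoop-daemon.sh logic itself, only a rough Java analogue of kill -0 against a pid file; the class name PidFileProbe, the method looksAlive, and the temp file name are hypothetical. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PidFileProbe {
    // Rough analogue of `kill -0 $(cat pidfile)`: reports true if ANY live
    // process currently has the pid stored in the file.
    static boolean looksAlive(Path pidFile) throws IOException {
        long pid = Long.parseLong(Files.readString(pidFile).trim());
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) throws IOException {
        // Simulate a stale pid file whose pid now belongs to an unrelated
        // process (here: this JVM, standing in for the recycled pid).
        Path stale = Files.createTempFile("hypothetical-daemon", ".pid");
        Files.writeString(stale, Long.toString(ProcessHandle.current().pid()));
        System.out.println("daemon looks alive: " + looksAlive(stale));
        Files.deleteIfExists(stale);
    }
}
```

The probe cannot tell which program owns the pid, so a stale pid file pointing at any live process yields a false positive.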
Having the processes acquire an exclusive write lock on a known file would delegate lock management, and implicitly liveness, to the OS itself. When the process dies, the lock is released (on Unixes).
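A minimal sketch of the lock-based approach, using java.nio's FileChannel.tryLock. It is not a proposed patch: the class LivenessLock and the methods acquire and isHeld are hypothetical names, and the lock file path is a temp file chosen for the demo.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LivenessLock {
    // Daemon side: take an exclusive lock and keep it for the process lifetime.
    // Returns null if the lock is already held.
    static FileLock acquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = ch.tryLock();
            if (lock == null) ch.close(); // held by another process
            return lock;
        } catch (OverlappingFileLockException e) {
            ch.close();                   // held elsewhere in this JVM
            return null;
        }
    }

    // Monitor side: the daemon is presumed alive if the lock cannot be taken.
    static boolean isHeld(Path lockFile) throws IOException {
        if (!Files.exists(lockFile)) return false;
        try (FileChannel ch = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock probe = ch.tryLock();
            if (probe == null) return true; // another process holds it
            probe.release();
            return false;
        } catch (OverlappingFileLockException e) {
            return true; // same-JVM holder; only reachable in demos like main()
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("daemon", ".lock");
        FileLock lock = acquire(p);
        System.out.println("held while running: " + isHeld(p));
        lock.channel().close(); // closing the channel releases the lock
        System.out.println("held after release: " + isHeld(p));
        Files.deleteIfExists(p);
    }
}
```

In practice the monitor would run as a separate process from the daemon. Probing from inside the daemon's own JVM, as main() does for brevity, is unsafe on some platforms, because closing any channel on the file can release locks the JVM holds through other channels.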
Issue Links
- is related to HADOOP-9085: start namenode failure,bacause pid of namenode pid file is other process pid or thread id before start namenode (Resolved)