Description
It is critical that the service restarts automatically after crashes and reboots. On Amazon Linux these tasks are handled by Upstart.
By default, Upstart tracks the life cycle of the first PID it executes in the exec or script stanza (defined in the Upstart config file). However, most Unix services "daemonize": they create a new process (using fork(2)) that is a child of the initial process. This is also what happens when gateway.sh or ldap.sh is invoked.
To track the right PID, Upstart must determine the final process ID for a job, and for daemonized processes it needs to know how many times the process will call fork(2).
Upstart supports the following expect stanzas:
- expect fork: Upstart will expect the process executed to call fork(2) exactly once.
- expect daemon: Upstart will expect the process executed to call fork(2) exactly twice.
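For illustration, a minimal sketch of a job file that uses one of these stanzas. It assumes a hypothetical daemon, /usr/local/bin/mydaemon, that forks exactly once; the daemon, its path, and the helper name write_example_job are placeholders and not part of the Knox setup. The helper only writes the file into the directory it is given.

```shell
#!/bin/sh
# Write a hypothetical Upstart job file into the directory given as $1.
# "mydaemon" and its path are illustrative assumptions only.
write_example_job() {
    cat > "$1/mydaemon.conf" <<'EOF'
description "hypothetical daemon that forks exactly once"
start on runlevel [2345]
stop on runlevel [016]
# Restart the job if the tracked process dies.
respawn
# Tell Upstart to follow one fork(2) and track the resulting PID.
expect fork
exec /usr/local/bin/mydaemon
EOF
}
```

On a real host the target directory would typically be /etc/init, for example write_example_job /etc/init (as root). A daemon that forks exactly twice would use expect daemon instead.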
Unfortunately, neither stanza fits gateway.sh or ldap.sh: both scripts call fork(2) many times, so Upstart always ends up tracking the wrong PID.
According to the Upstart Cookbook (http://upstart.ubuntu.com/cookbook/#how-to-establish-fork-count), if the application for which you are writing a Job Configuration File does not document how many times it forks, you can run it under a tool such as strace(1), which lets you count the forks:
[root@ip-10-0-4-107 ~]# strace -o /tmp/strace.log -fFv su -c "/usr/hdp/current/knox-server/bin/gateway.sh start" knox
Starting Gateway succeeded with PID 25528.
[root@ip-10-0-4-107 ~]# sudo egrep "\<(fork|clone)\>\(" /tmp/strace.log | wc | awk '{print $1}'
86
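The counting step can also be wrapped in a small helper. This is a sketch rather than the Cookbook's own method: the function name count_forks is hypothetical, it also counts vfork, and it uses grep -c in place of the wc and awk pipeline. It expects the path of a log written by strace -f -o.

```shell
#!/bin/sh
# Count the lines of an strace log ($1) that record a fork, vfork or clone call.
# Caveat: clone(2) is also used to create threads, so the number is an upper
# bound on the forks Upstart would have to follow.
count_forks() {
    # Require a non-identifier character (or line start) before the syscall
    # name so that names merely ending in "fork" or "clone" are not counted.
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that.
    grep -Ec '(^|[^[:alnum:]_])(fork|vfork|clone)\(' "$1" || true
}
```

For the session above, count_forks /tmp/strace.log would be the equivalent call.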
Ambari had similar issues in the past: https://issues.apache.org/jira/browse/AMBARI-14842