SA Bugzilla – Bug 6682
Using daemonize option writes wrong PID (pid of parent process instead of child / daemon).
Last modified: 2018-06-09 13:31:36 UTC
Starting spamd with --daemonize option writes wrong PID into pid-file. Child PID is overwritten by parent PID. Seems to be a race condition.

Small testscript: test.pl

    #!/usr/bin/perl
    system("/usr/sbin/spamd --username=popuser --daemonize --helper-home-dir=/var/qmail --max-children 5 --pidfile=/var/run/spamd/spamd_full.pid --socketpath=/tmp/spamd_full.sock");
    exit(0);

Sample result (in my environment):

1. Using ps I can see one process "/usr/sbin/spamd" with PID 3840
2. Second process /usr/sbin/spamd with PID 3878 is started by daemonize()
3. In PID file first I can see the PID 3878 (the correct one!)
4. Shortly after, the PID 3840 is written into PID file (the wrong one)
5. Parent Process with PID 3840 is killed.

=> daemonized process with PID 3878 is running but PID file contains pid 3840

Potential fix (works in my environment):
In spamd change the order "write PID file, then kill parent" to "kill parent, then write PID file":

Old:

    # Make the pidfile ...
    if (defined $opt{'pidfile'}) {
      if (open PIDF, ">$opt{'pidfile'}") {
        print PIDF "$$\n";
        close PIDF;
      }
      else {
        warn "spamd: cannot write to PID file: $!\n";
      }
    }

    # now allow waiting processes to connect, if they're watching the log.
    # The test suite does this!
    info("spamd: server pid: $$\n");
    kill("USR1",$originalparent) if ($opt{'daemonize'});

New:

    # now allow waiting processes to connect, if they're watching the log.
    # The test suite does this!
    info("spamd: server pid: $$\n");
    kill("USR1",$originalparent) if ($opt{'daemonize'});

    # Make the pidfile ...
    if (defined $opt{'pidfile'}) {
      if (open PIDF, ">$opt{'pidfile'}") {
        print PIDF "$$\n";
        close PIDF;
      }
      else {
        warn "spamd: cannot write to PID file: $!\n";
      }
    }

I am running on a virtual server which is not the fastest => maybe it is working correctly on "fast" machines.

Maybe one can verify if my proposed fix is ok. Thank you.
I can't replicate on my development box. Running straight from the command line doesn't appear to show the issue. I wonder if it has something to do with you using perl to launch a perl program.

What happens if you do the below on your box?

[root@devel root]# spamd --username=httpd --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
[root@devel root]# cat /tmp/spamd_full.pid
12392
[root@devel root]# ps auxww | grep spamd
root 12392 2.9 0.8 34744 32712 ? Ss 09:12 0:02 /usr/local/bin/spamd --username=httpd --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
httpd 12394 0.0 0.8 34736 32708 ? S 09:12 0:00 spamd child
httpd 12395 0.0 0.8 34744 32712 ? S 09:12 0:00 spamd child
root 12403 0.0 0.0 1744 588 pts/0 R+ 09:13 0:00 grep spamd
I have tried it using the spamd command directly, without embedding it in a Perl script. Same behaviour.

Found the reason: the problem is the kill command sending signal "USR1" to the parent process.

    kill("USR1",$originalparent) if ($opt{'daemonize'});

Replacing it with "SIGTERM" solves the problem on my server, too.

So it seems it is not a Perl-dependent problem.
(In reply to comment #2)
> I have tried it using spamd command directly without embedded in Perl script.
> Same behaviour.
>
> Found the reason, problem is the kill-command using signal "USR1" sent to the
> parent process.
>
> kill("USR1",$originalparent) if ($opt{'daemonize'});
>
> Replacing it with "SIGTERM" solves the problem on my server, too.
>
> So it seems it is not a Perl-dependent problem.

What OS are you using? SIGUSR1 is a reserved signal so it should let SA do what it wants with the signal. A SIGTERM just terminates the process. At least that's my understanding.
(In reply to comment #3)
>
> What OS are you using?

It is a hosted virtual server; the OS is Suse 9.1 (old, but nearly everything running on it is continuously updated by hand) with a 2.4.20 kernel.

> SIGUSR1 is a reserved signal so it should let SA do what
> it wants with the signal. A SIGTERM just terminates the process. At least
> that's my understanding.

I have the same understanding. But if I am right, terminating the parent process after daemonizing is exactly what should be done:

- Parent process gets started
- Parent process "daemonizes", which results in a child process
- Parent process will be terminated
- Child process continues running

In the spamd code I could not find any special handling for SIGUSR1.
(In reply to comment #4)
> (In reply to comment #3)
> >
> > What OS are you using?
>
> It is a hosted virtual server, OS is Suse 9.1 (old, but nearly everything
> running on it continuously manually updated) with a 2.4.20 kernel.
>
> > SIGUSR1 is a reserved signal so it should let SA do what
> > it wants with the signal. A SIGTERM just terminates the process. At least
> > that's my understanding.
>
> I have same understanding. But if I am right terminating the parent process
> after daemonizing is exactly what should be done.
>
> - Parent process gets started
> - Parent process "daemonizes" which results in a child process
> - Parent process will be terminated
> - child process continues running
>
> In spamd code I could not find any special handling for SIGUSR1.

Please run

spamd --username=httpd --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock

from the command line. It should exit to the command line after a minute.

When that's done, what is the output of cat /tmp/spamd_full.pid and ps auxwww | grep spam?
~ # spamd --username=popuser --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
User defined signal 1
~ # ps -f -A |grep "spamd"
root 2798 1 12 18:12 ? 00:00:04 /usr/sbin/spamd --username=popuser --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
popuser 2849 2798 0 18:12 ? 00:00:00 spamd child
popuser 2850 2798 0 18:12 ? 00:00:00 spamd child
root 4113 27614 0 18:13 pts/0 00:00:00 grep spamd
~ # cat /tmp/spamd_full.pid
2794
~ #

2794 is the PID of the parent process before daemonize
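
The manual comparison above (cat the PID file, then look for that PID in ps) could be scripted when the check needs repeating. The following is only a minimal sketch: check_pidfile is a hypothetical helper name, not part of spamd, and it assumes it is run as root or as the owner of the process, because kill -0 also fails when the caller lacks permission to signal the PID.

```shell
#!/bin/sh
# check_pidfile PIDFILE (hypothetical helper, not shipped with spamd)
# Returns 0 if PIDFILE names a process that can be signalled,
#         1 if the recorded PID is not running (stale PID file),
#         2 if PIDFILE is unreadable or does not contain a number.
check_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "cannot read $pidfile" >&2
        return 2
    fi
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*)
            echo "malformed PID in $pidfile" >&2
            return 2
            ;;
    esac
    # kill -0 delivers no signal; it only tests whether the PID exists
    # and whether we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid from $pidfile is running"
    else
        echo "PID $pid from $pidfile is stale" >&2
        return 1
    fi
}
```

With the session above, check_pidfile /tmp/spamd_full.pid would be expected to report a stale PID file, assuming PID 2794 has not been reused by another process in the meantime.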
(In reply to comment #6)
> ~ # spamd --username=popuser --daemonize --max-children 5
> --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
> User defined signal 1
> ~ # ps -f -A |grep "spamd"
> root 2798 1 12 18:12 ? 00:00:04 /usr/sbin/spamd
> --username=popuser --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid
> --socketpath=/tmp/spamd_full.sock
> popuser 2849 2798 0 18:12 ? 00:00:00 spamd child
> popuser 2850 2798 0 18:12 ? 00:00:00 spamd child
> root 4113 27614 0 18:13 pts/0 00:00:00 grep spamd
> ~ # cat /tmp/spamd_full.pid
> 2794
> ~ #
>
> 2794 is the PID of the parent process before daemonize

Did you modify spamd to print the text "User defined signal 1"?
No, this output seems to be printed by the system (Linux).

spamd was the completely unmodified version 3.3.1.
(In reply to comment #8)
> No this output seems to be printed by system / linux.
>
> spamd was completely unmodified version 3.3.1

My only thought is to find what is printing that text and you might find something interfering with the signal.
Thank you for your help. I will continue running my changed spamd version, which uses SIGTERM instead of SIGUSR1. It is working.

According to some Linux / POSIX docs, the default behaviour of SIGUSR1 is to terminate the process (the same as SIGTERM), and spamd does not have its own handler for SIGUSR1.
As per POSIX.1-1990, the default action listed in signal(7) for SIGUSR1 is Term. So iff there is no handler for SIGUSR1, the default action is to terminate the process.

I cannot reproduce this issue.
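
The default action described above can be observed from a shell without touching spamd. The sketch below is an illustration only, not a reproduction of the bug: it runs two throwaway child shells, one with no SIGUSR1 handler and one with a trap installed, and compares their exit statuses. It relies on the common shell convention that a status above 128 means the child was killed by a signal, and on kill -l mapping such a status back to a signal name.

```shell
#!/bin/sh
# Child 1: no handler installed, so SIGUSR1 takes its default action.
status=0
sh -c 'kill -USR1 $$' || status=$?

# A status above 128 conventionally means "killed by a signal";
# kill -l STATUS translates it back to the signal name.
if [ "$status" -gt 128 ]; then
    echo "no handler:   killed by SIG$(kill -l "$status")"
else
    echo "no handler:   exited with status $status"
fi

# Child 2: a trap is installed, so the signal is handled and the
# child exits cleanly through the handler instead of being killed.
handled_status=0
sh -c 'trap "exit 0" USR1; kill -USR1 $$; exit 1' || handled_status=$?
echo "with handler: exit status $handled_status"
```

An interactive shell reports a foreground command killed this way with the signal's description, which would be consistent with the "User defined signal 1" line in the session quoted earlier, although the thread does not confirm that this is what happened on the reporter's machine.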