Bug 6682 - Using daemonize option writes wrong PID (pid of parent process instead of child / daemon).
Status: RESOLVED WORKSFORME
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd
Version: 3.3.1
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2011-10-26 12:16 UTC by a-x-e
Modified: 2018-06-09 13:31 UTC
Description a-x-e 2011-10-26 12:16:20 UTC
Starting spamd with --daemonize option writes wrong PID into pid-file.
Child PID is overwritten by parent PID.  Seems to be a race condition.

Small test script:

test.pl

#!/usr/bin/perl

system("/usr/sbin/spamd --username=popuser --daemonize --helper-home-dir=/var/qmail --max-children 5 --pidfile=/var/run/spamd/spamd_full.pid --socketpath=/tmp/spamd_full.sock");

exit(0);

Sample result (in my environment):

1. Using ps I can see one process "/usr/sbin/spamd" with PID 3840  
2. Second process /usr/sbin/spamd with PID 3878 is started by daemonize()
3. In PID file first I can see the PID 3878 (the correct one!)
4. Shortly after, the PID 3840 is written into PID file (the wrong one)
5. Parent Process with PID 3840 is killed.

=>  daemonized process with PID 3878 is running but PID file contains pid 3840


Potential fix (works in my environment):

In spamd, change the order from "write PID file, then kill parent" to "kill parent, then write PID file":

Old:

# Make the pidfile ...
if (defined $opt{'pidfile'}) {
  if (open PIDF, ">$opt{'pidfile'}") {
    print PIDF "$$\n";
    close PIDF;
  }
  else {
    warn "spamd: cannot write to PID file: $!\n";
  }
}

# now allow waiting processes to connect, if they're watching the log.
# The test suite does this!
info("spamd: server pid: $$\n");
kill("USR1",$originalparent) if ($opt{'daemonize'});


New:

# now allow waiting processes to connect, if they're watching the log.
# The test suite does this!
info("spamd: server pid: $$\n");
kill("USR1",$originalparent) if ($opt{'daemonize'});

# Make the pidfile ...
if (defined $opt{'pidfile'}) {
  if (open PIDF, ">$opt{'pidfile'}") {
    print PIDF "$$\n";
    close PIDF;
  }
  else {
    warn "spamd: cannot write to PID file: $!\n";
  }
}

I am running on a virtual server that is not the fastest, so it may well work correctly on faster machines.

Could someone verify whether my proposed fix is OK?

Thank you.
Comment 1 Kevin A. McGrail 2011-10-26 13:13:23 UTC
I can't replicate on my development box.  Running straight from the command line doesn't appear to show the issue.  I wonder if it has something to do with you using perl to launch a perl program.

What happens if you do the below on your box?

[root@devel root]# spamd --username=httpd --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock          
[root@devel root]# cat /tmp/spamd_full.pid 
12392
[root@devel root]# ps auxww | grep spamd
root     12392  2.9  0.8  34744 32712 ?        Ss   09:12   0:02 /usr/local/bin/spamd --username=httpd --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
httpd    12394  0.0  0.8  34736 32708 ?        S    09:12   0:00 spamd child
httpd    12395  0.0  0.8  34744 32712 ?        S    09:12   0:00 spamd child
root     12403  0.0  0.0   1744   588 pts/0    R+   09:13   0:00 grep spamd
Comment 2 a-x-e 2011-10-26 16:36:56 UTC
I have tried running the spamd command directly, without wrapping it in a Perl script.
Same behaviour.

Found the reason: the problem is the kill command that sends signal "USR1" to the parent process.

kill("USR1",$originalparent) if ($opt{'daemonize'});

Replacing it with "SIGTERM" solves the problem on my server, too.

So it seems it is not a Perl-dependent problem.
Comment 3 Kevin A. McGrail 2011-10-27 12:23:33 UTC
(In reply to comment #2)
> I have tried running the spamd command directly, without wrapping it in a
> Perl script.
> Same behaviour.
> 
> Found the reason: the problem is the kill command that sends signal "USR1" to
> the parent process.
> 
> kill("USR1",$originalparent) if ($opt{'daemonize'});
> 
> Replacing it with "SIGTERM" solves the problem on my server, too.
> 
> So it seems it is not a Perl-dependent problem.

What OS are you using? SIGUSR1 is a reserved signal so it should let SA do what it wants with the signal.  A SIGTERM just terminates the process.  At least that's my understanding.
Comment 4 a-x-e 2011-10-27 12:48:12 UTC
(In reply to comment #3)
> 
> What OS are you using? 

It is a hosted virtual server. The OS is SuSE 9.1 (old, but nearly everything running on it is continuously updated by hand) with a 2.4.20 kernel.
 

> SIGUSR1 is a reserved signal so it should let SA do what
> it wants with the signal.  A SIGTERM just terminates the process.  At least
> that's my understanding.

I have the same understanding. But if I am right, terminating the parent process after daemonizing is exactly what should be done.

- Parent process gets started
- Parent process "daemonizes", which results in a child process
- Parent process will be terminated
- child process continues running


In the spamd code I could not find any special handling for SIGUSR1.
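
To make the point about a missing handler concrete, here is a minimal, generic Perl sketch. It is not taken from spamd, and the helper name signal_child is made up for illustration. It forks a child, sends it SIGUSR1, and reports how the child ended, once with the default disposition and once with a handler installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(SIGUSR1);

$| = 1;    # unbuffered STDOUT, so forked children do not replay pending output

# Hypothetical helper: fork a child, send it SIGUSR1, and return
# (signal number that killed it, exit code).
# If $install_handler is true, the child traps SIGUSR1 and exits cleanly.
sub signal_child {
    my ($install_handler) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        close $reader;
        $SIG{USR1} = $install_handler ? sub { exit 0 } : 'DEFAULT';
        close $writer;    # EOF on the pipe tells the parent we are ready
        sleep 10;         # wait for the signal
        exit 1;           # only reached if no signal arrived
    }

    close $writer;
    my $ready = <$reader>;    # returns at EOF, once the child has set up USR1
    close $reader;
    kill 'USR1', $pid;
    waitpid($pid, 0);
    return ($? & 127, $? >> 8);
}

my ($sig, $code) = signal_child(0);
print "no handler:   killed by signal $sig, exit code $code\n";
($sig, $code) = signal_child(1);
print "with handler: killed by signal $sig, exit code $code\n";
```

Without a handler, the first value should equal SIGUSR1's number on the platform. With the handler, both values should be 0.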
Comment 5 Kevin A. McGrail 2011-10-27 16:03:46 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > 
> > What OS are you using? 
> 
> It is a hosted virtual server. The OS is SuSE 9.1 (old, but nearly everything
> running on it is continuously updated by hand) with a 2.4.20 kernel.
> 
> 
> > SIGUSR1 is a reserved signal so it should let SA do what
> > it wants with the signal.  A SIGTERM just terminates the process.  At least
> > that's my understanding.
> 
> I have the same understanding. But if I am right, terminating the parent
> process after daemonizing is exactly what should be done.
> 
> - Parent process gets started
> - Parent process "daemonizes", which results in a child process
> - Parent process will be terminated
> - child process continues running
> 
> 
> In the spamd code I could not find any special handling for SIGUSR1.

Please run the following from the command line:

spamd --username=httpd --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock

It should exit to the command line after a minute.  When that's done, what is the output of cat /tmp/spamd_full.pid and ps auxwww | grep spam?
Comment 6 a-x-e 2011-10-27 16:13:17 UTC
~ # spamd --username=popuser --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
User defined signal 1
~ # ps -f -A |grep "spamd"
root      2798     1 12 18:12 ?        00:00:04 /usr/sbin/spamd --username=popuser --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
popuser   2849  2798  0 18:12 ?        00:00:00 spamd child
popuser   2850  2798  0 18:12 ?        00:00:00 spamd child
root      4113 27614  0 18:13 pts/0    00:00:00 grep spamd
~ # cat /tmp/spamd_full.pid
2794
~ #


2794 is the PID of the parent process before daemonize
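
A mismatch like this can be checked mechanically. The following is a small sketch, assuming the PID file holds a single decimal PID on its first line. The helper names read_pidfile and pid_is_alive are made up for illustration and are not part of spamd. It uses kill with signal 0, which only tests whether the process can be signalled and delivers nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the PID stored in $path, or undef if the
# file is missing or does not start with a positive integer.
sub read_pidfile {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line && $line =~ /^\s*(\d+)\s*$/ && $1 > 0;
    return $1;
}

# Hypothetical helper: true if a process with this PID exists.
# Signal 0 performs the existence and permission check only.
# EPERM means the process exists but belongs to another user.
sub pid_is_alive {
    my ($pid) = @_;
    return 1 if kill(0, $pid);
    return $!{EPERM} ? 1 : 0;
}

my $path = @ARGV ? $ARGV[0] : '/tmp/spamd_full.pid';
my $pid  = read_pidfile($path);
if (!defined $pid) {
    print "$path: no usable PID\n";
}
elsif (pid_is_alive($pid)) {
    print "$path: PID $pid is running\n";
}
else {
    print "$path: PID $pid is stale (no such process)\n";
}
```

In the situation above, where the file names the parent that has already exited, this would be expected to take the "stale" branch, unless the PID has since been reused by an unrelated process.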
Comment 7 Kevin A. McGrail 2011-10-27 16:23:47 UTC
(In reply to comment #6)
> ~ # spamd --username=popuser --daemonize --max-children 5
> --pidfile=/tmp/spamd_full.pid --socketpath=/tmp/spamd_full.sock
> User defined signal 1
> ~ # ps -f -A |grep "spamd"
> root      2798     1 12 18:12 ?        00:00:04 /usr/sbin/spamd
> --username=popuser --daemonize --max-children 5 --pidfile=/tmp/spamd_full.pid
> --socketpath=/tmp/spamd_full.sock
> popuser   2849  2798  0 18:12 ?        00:00:00 spamd child
> popuser   2850  2798  0 18:12 ?        00:00:00 spamd child
> root      4113 27614  0 18:13 pts/0    00:00:00 grep spamd
> ~ # cat /tmp/spamd_full.pid
> 2794
> ~ #
> 
> 
> 2794 is the PID of the parent process before daemonize

Did you modify spamd to print the text "User defined signal 1"?
Comment 8 a-x-e 2011-10-27 16:33:48 UTC
No, this output seems to be printed by the system (Linux).

spamd was the completely unmodified version 3.3.1.
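
The "User defined signal 1" text is consistent with the shell reporting that its foreground process was terminated by SIGUSR1. The same information is available to a Perl wrapper through $? after system(). Below is a minimal sketch. The helper name describe_status is made up for illustration, and the command is a stand-in child that sends itself SIGUSR1, not spamd.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(SIGUSR1);

# Hypothetical helper: turn the wait status that system() leaves in $?
# into a readable description.
sub describe_status {
    my ($status) = @_;
    return "failed to start" if $status == -1;
    my $signal = $status & 127;    # low 7 bits: terminating signal, if any
    return "killed by signal $signal" if $signal;
    return "exited with code " . ($status >> 8);
}

# Stand-in command (an assumption for this demo): a child perl that resets
# SIGUSR1 to its default disposition and then signals itself.
# The list form of system() avoids an intermediate shell.
my @cmd = ($^X, '-e', '$SIG{USR1} = "DEFAULT"; kill "USR1", $$; sleep 5');
system(@cmd);
my $status = $?;
print describe_status($status), "\n";
```

A wrapper such as test.pl could log this description instead of discarding the return status of system().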
Comment 9 Kevin A. McGrail 2011-10-27 16:39:38 UTC
(In reply to comment #8)
> No, this output seems to be printed by the system (Linux).
> 
> spamd was the completely unmodified version 3.3.1.

My only thought is to find what is printing that text and you might find something interfering with the signal.
Comment 10 a-x-e 2011-10-31 12:14:03 UTC
Thank you for your help.

So I will continue running my modified spamd, which sends SIGTERM instead of SIGUSR1. It is working.
According to some Linux / POSIX docs, the default behaviour of SIGUSR1 is to terminate the process (the same as SIGTERM), and spamd does not have its own handler for SIGUSR1.
Comment 11 Giovanni Bechis 2018-06-09 13:31:36 UTC
As per POSIX.1-1990 (see signal(7)), the default action for SIGUSR1 is Term.
So if there is no handler for SIGUSR1, the process is terminated.
I cannot reproduce this issue.