Bug 4590 - prefork: syswrite(6) failed
Summary: prefork: syswrite(6) failed
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd (show other bugs)
Version: 3.1.0
Hardware: PC Linux
: P3 major
Target Milestone: 3.1.9
Assignee: Justin Mason
URL:
Whiteboard:
Keywords:
: 4666 (view as bug list)
Depends on: 4696 5313
Blocks:
  Show dependency tree
 
Reported: 2005-09-17 18:16 UTC by Rajendra Singh
Modified: 2007-04-03 11:40 UTC (History)
14 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
workaround patch patch None Justin Mason [HasCLA]
Logfile text/plain None Frank Urban [NoCLA]
Logfile2 text/plain None Frank Urban [NoCLA]
log with patch #3284 text/plain None Daryl C. W. O'Shea [HasCLA]
Part of maillog text/plain None Frank Urban [NoCLA]
DOS readable version text/plain None Frank Urban [NoCLA]
spamd restart hack text/plain None peter gervai [NoCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Rajendra Singh 2005-09-17 18:16:29 UTC
Getting this error in maillog:
prefork: syswrite(6) failed, retrying... at /
usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SpamdForkScaling.pm line 554. 

Then all incoming mail doesn't get processed or delivered.
Comment 1 Jeff Martin 2005-10-04 06:27:00 UTC
I'm seeing the same issue. Things work for a while then it'll start failing and
we have to restart spamd.
Comment 2 Michel Meyers 2005-10-28 15:54:15 UTC
Same thing just happened on my Debian etch system here:

Oct 28 01:26:06 server spamd[15848]: prefork: syswrite(6) failed, retrying... at
 /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 554.
Oct 28 01:26:16 server last message repeated 2 times
Comment 3 James Thorne 2005-11-04 12:51:54 UTC
Same thing here on RHEL3:

Nov  4 11:19:45 griffin spamd[28185]: prefork: syswrite(18) failed, retrying... 
at
+/usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SpamdForkScaling.pm line 554.
Nov  4 11:20:20 griffin last message repeated 7 time

Cannot connect to spamd once this message has appeared.  Restarting gets rid of 
the messages for a while.  The options we're passing to spamd are:

SPAMDOPTIONS="-d -i 0.0.0.0 -H /tmp -A 193.128.242.5 -A 193.128.240.22 -A 
193.128.240.33 -A 193.128.240.7 -A 193.128.242.4 -A 193.128.240.9 -A 127.0.0.1 -
-max-children=25 --max-conn-per-child=200 --min-spare=10"
Comment 4 James Thorne 2005-11-07 15:51:21 UTC
This is not happening on the RHEL4 box here that is also running SA.
Comment 5 Brett Derby 2005-11-23 09:51:01 UTC
Same here - using the port on FreeBSD 4.7, SA 3.1.0
Forking frustrating!

error: usage: $io->syswrite(BUF, LEN [, OFFSET]) at
/usr/local/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/SpamdForkScaling.pm line 544
Comment 6 David Anderson 2005-11-29 11:48:30 UTC
Same here - Red Hat 9 / SpamAssassin 3.1.0. Problem began immediately after 
upgrade to 3.1.0. 
 
Nov 29 09:59:07 warfield spamd[25460]: prefork: syswrite(8) failed, retrying... 
at /usr/lib/perl5/site_perl/5.8.0/Mail/SpamAssassin/SpamdForkScaling.pm line 
554. 
 
spamd options are: 
 
/usr/bin/spamd -d -u exim --min-children 1 --max-children 10 --min-spare 1 
--max-spare 3 -H --socketpath=/var/run/spamd.sock 
Comment 7 Justin Mason 2005-11-29 19:47:57 UTC
*** Bug 4666 has been marked as a duplicate of this bug. ***
Comment 8 Justin Mason 2005-11-29 19:49:14 UTC
1. use the "--round-robin" switch to avoid this; it appears to be load-related,
and at that level of load, round-robin is marginally more efficient anyway.

2. if anyone can provide strace or truss output from *both* the parent and child
process when this occurs, I'd be very grateful.  haven't been able to repro it here.
Comment 9 Daryl C. W. O'Shea 2005-11-30 09:47:27 UTC
Here's a prefork log and a strace of both parent and children:

http://cyan.dostech.net/sa-bug4590/prefork.log.syswrite7
http://cyan.dostech.net/sa-bug4590/spamd-strace.syswrite7.trimmed

The busy child was a child that timed out an alarm on an open (WHOIS, "whois
$domain|");

Let me know if you need more of the strace log or the maillog.

Comment 10 Justin Mason 2005-12-01 01:18:08 UTC
Daryl -- thanks!

more info would def be welcome.  what was the PID of the busy child?  30625?
can you post the entire strace somewhere?
where did the whois stuff come from?  another plugin?
Comment 11 Justin Mason 2005-12-01 02:25:59 UTC
OK, I've been looking at
http://cyan.dostech.net/sa-bug4590/spamd-strace.syswrite7.trimmed . for what
it's worth, here's what I can tell so far.  It just doesn't make much sense.


PIDS:

  15882: the spamd parent
  19783: a child PID that continues working OK
  30625: a child PID that stops processing mail and gets hung

FDS:

  15882 fd 4 => syslog

  15882 fd 7 => control channel for 30625, attached to fd 8
                in that process

  15882 fd 8 => control channel for 19783, fd 9


ANNOTATED STRACE:

child, waiting for events:

30625 00:02:31.541351 select(16, [8], NULL, NULL, {300, 0} <unfinished ...>

everything goes well until (event: linenumber):
parent, last good write: 12070

15882 00:05:03.547505 write(7, "P....\n", 6 <unfinished ...>
15882 00:05:03.547564 <... write resumed> ) = 6

The "P....\n" data is a ping, to check to ensure that the child can
process mails.  The child doesn't need to respond to this, it's
a one-way operation.

child, last successful read: 12070 - 12094

After that write from the parent, the child wakes up from its select(), reads
the ping,
and goes back to another select():

30625 00:05:03.547950 <... select resumed> ) = 1 (in [8], left {148, 0})
15882 00:05:03.548020 rt_sigprocmask(SIG_BLOCK, ~[],  <unfinished ...>
30625 00:05:03.548132 read(8,  <unfinished ...>
15882 00:05:03.548168 <... rt_sigprocmask resumed> [33], 8) = 0
30625 00:05:03.548265 <... read resumed> "P....\n", 6) = 6
15882 00:05:03.548334 rt_sigprocmask(SIG_BLOCK, [ALRM],  <unfinished ...>
30625 00:05:03.548440 read(8,  <unfinished ...>
15882 00:05:03.548461 <... rt_sigprocmask resumed> ~[KILL STOP], 8) = 0
30625 00:05:03.548505 <... read resumed> 0x9f81dc0, 6) = -1 EAGAIN (Resource
temporarily unavailable)
15882 00:05:03.548529 rt_sigaction(SIGALRM, {0x80a7be8, [], SA_RESTORER,
0x420276f8},  <unfinished ...>
30625 00:05:03.548591 time( <unfinished ...>
15882 00:05:03.548609 <... rt_sigaction resumed> {0x80a6dcc, [], SA_RESTORER,
0x420276f8}, 8) = 0
30625 00:05:03.548700 <... time resumed> NULL) = 1132463103
15882 00:05:03.548720 rt_sigprocmask(SIG_SETMASK, ~[KILL STOP],  <unfinished ...>
30625 00:05:03.548792 select(16, [8], NULL, NULL, {300, 0} <unfinished ...>

and this is where the problem happens, somewhere between line 12094 and 12854.
parent, first failed write: 12854

15882 00:07:35.737337 write(7, "P....\n", 6) = -1 EAGAIN (Resource temporarily
unavailable)

parent, second failed write: 12874

15882 00:07:41.834433 write(7, "P....\n", 6) = -1 EAGAIN (Resource temporarily
unavailable)

(many more attempts 5 seconds apart, all fail)
child, failed read: 13338 - 13351

30625 00:10:03.561013 <... select resumed> ) = 0 (Timeout)
30625 00:10:03.606427 read(8, 0x9f81dc0, 6) = -1 EAGAIN (Resource temporarily
unavailable)
30625 00:10:03.614481 time(NULL)        = 1132463403
30625 00:10:03.635766 select(16, [8], NULL, NULL, {1, 0}) = 0 (Timeout)
30625 00:10:04.631238 read(8, 0x9f81dc0, 6) = -1 EAGAIN (Resource temporarily
unavailable)
30625 00:10:04.631361 time(NULL)        = 1132463404
30625 00:10:04.818517 rt_sigprocmask(SIG_BLOCK, [PIPE], [33], 8) = 0
30625 00:10:04.822736 rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
30625 00:10:04.822854 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0
30625 00:10:04.822906 rt_sigprocmask(SIG_BLOCK, [PIPE], [33], 8) = 0
30625 00:10:04.822962 rt_sigaction(SIGPIPE, {0x80a7be8, [], SA_RESTORER,
0x420276f8}, {SIG_DFL}, 8) = 0
30625 00:10:04.823016 rt_sigprocmask(SIG_SETMASK, [33], NULL, 8) = 0
30625 00:10:04.945869 select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
30625 00:10:04.953285 write(4, "<20>spamd[30625]: prefork: sysre"..., 149) = 149

and it carries on with both sides retrying and failing, until end of strace.

So the problem happens somewhere between line 12094 and 12854.  However,
there's absolutely nothing of note in those lines -- just the usual idle
signal-twiddling and 2-second sleeping.

If anyone can spot anything, I'd be much obliged!  I'm stumped ;)

Comment 12 Justin Mason 2005-12-01 03:07:11 UTC
Created attachment 3284 [details]
workaround patch

This patch adds some workaround code; instead of retrying indefinitely, it'll
retry 3 times then kill the child and allow it to respawn.

It's in trunk as r350101.

I think the bug will still occur -- it should be a lot less serious with this. 
If anyone has any ideas, I'm all ears. ;)
Comment 13 Daryl C. W. O'Shea 2005-12-01 03:20:21 UTC
Yeah, the whois thing is in a plugin...


  my $oldalarm = 0;

  eval {
    local $SIG{ALRM} = sub { die "__alarm__\n" };

    $oldalarm = alarm 5;

    open (WHOIS, "whois $domain|");

    while (<WHOIS>) {
      $lines++;

	... match stuff ...

    }
    close (WHOIS);
    alarm $oldalarm;
  };


... which will leave a whois zombie process if it times out.  Not sure how I
missed that when I wrote it.



The busy child was 12301.  After the alarm on the whois call, it cotinues and
returns a scanned message.  The next message it accepts it doesn't scan...
"processing message" is the last line maillog for the 12301 child:


Nov 19 11:42:51 cyan spamd[15882]: prefork: child states: II
Nov 19 11:44:42 cyan spamd[12301]: spamd: connection from smtp.film-tech.net
[66.98.221.156] at port 2285
Nov 19 11:44:42 cyan spamd[12301]: spamd: processing message
<0563657271.7475497925@mulespub.com> for dcaldwell@domain.com:501
Nov 19 11:44:43 cyan dccproc[17048]: open(/var/dcc/map): Permission denied
Nov 19 11:44:49 cyan spamd[12301]: __alarm__
Nov 19 11:44:49 cyan spamd[12301]: __alarm__
Nov 19 11:44:49 cyan spamd[12301]: spamd: identified spam (9.7/5.0) for
dcaldwell@domain.com:501 in 7.2 seconds, 1716 bytes.
Nov 19 11:44:49 cyan spamd[12301]: spamd: result: Y  9 -
BAYES_99,RATWARE_RCVD_PF,URIBL_BLACK
scantime=7.2,size=1716,user=dcaldwell@domain.com,uid=501,requ
ired_score=5.0,rhost=smtp.film-tech.net,raddr=66.98.221.156,rport=2285,mid=<0563657271.7475497925@mulespub.com>,bayes=0.993899328941404,autolearn=no
Nov 19 11:44:49 cyan spamd[15882]: prefork: child states: II
Nov 19 11:45:20 cyan spamd[12301]: spamd: connection from smtp.film-tech.net
[66.98.221.156] at port 2298
Nov 19 11:45:20 cyan spamd[12301]: spamd: processing message
<BFA4B2BD.24969%bart@videofest.org> for jspracklen@domain.com:501

The "dccproc[17048]: open(/var/dcc/map): Permission denied" started happening
when I started strace'ing spamd... haven't got around to finding out why.

In the case of 30625 (going from memory, I meant to post this the next morning
and forgot) I believe it became a zombie process (probably after killing the
parent -- I can't remember).  There's nothing abnormal in the maillog about the
last messages it processed.  It successfully returned its last scanned message,
but way before the pings started failing, at Nov 19 16:44:12.  The prefork log
idetifies it (30625) as IDLE the entire time (from 16:44:12 until the last
prefork child state log message at 23:58:03).
Comment 14 Daryl C. W. O'Shea 2005-12-01 03:50:59 UTC
Complete strace (all 647M of fun, compressed to 76M):

http://cyan.dostech.net/sa-bug4590/spamd-strace.syswrite7.gz


Would you prefer I disable my whois plugin and run as is, or additionally apply
the patch to 3.1.0?
Comment 15 Justin Mason 2005-12-01 07:07:11 UTC
I'm not sure what's up with that strace, then -- child 12301 never appears... 

I wouldn't worry about it -- I don't think there'll be any useful data in the
640MB strace; I can't see any myself. feel free to rm it.
Comment 16 Daryl C. W. O'Shea 2005-12-01 07:52:30 UTC
I'm confused... I see thousands of lines in the full strace for child 12301.
Comment 17 Justin Mason 2005-12-01 22:38:41 UTC
hmm, yes, I see 12301 now.  Must have screwed up my searching :(
Comment 18 Justin Mason 2005-12-02 01:37:05 UTC
interesting.  However, child 12301 has nothing to do with the "prefork:
syswrite(N) failed, retrying..." bug; that child is *definitely* 30625,
according to that strace.
Comment 19 Daryl C. W. O'Shea 2005-12-02 05:03:34 UTC
Yeah, it's weird.  12301 is hung up at "processing message" and left marked as
busy, while 30625 successfully returns its scanned message, and is marked idle,
almost 7.5 hours before it stops responding to pings.


BTW, do you think this and the "prefork: sysread(8) failed after 300
secs at /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling
.pm line 501." I'm seeing in bug 4476 is being caused by the same thing?
Comment 20 Frank Urban 2006-01-13 07:50:36 UTC
Hi,

I'm a little bit confused what I have to do now to avoid this bug?
Should I install the patch? Should I use --round-robin? 
Or whatelse can I do?

Frank
Comment 21 Justin Mason 2006-01-13 20:22:21 UTC
Frank --

nobody has reported any results with the patch, so I have no idea :(

has anyone tried the patch?
Comment 22 Daryl C. W. O'Shea 2006-01-14 02:19:06 UTC
I've been running with the patch since you made it, but I've only ever seen this
problem the one time.  It didn't even occur when the server hit a load average
of 900+ earlier in the week.

Frank --

I'd try it out and, please, let us know how it goes.  It looks like it should
fix the problem and in my experience it doesn't cause any problems of it's own,
so it  should be safe to use in production.
Comment 23 Frank Urban 2006-01-16 08:45:48 UTC
OK I installed the patch now. Let us the what will happen.
I had the problem 5 times last week. But there was now high load on the server 
at this time...
Comment 24 Frank Urban 2006-01-17 08:29:04 UTC
Created attachment 3333 [details]
Logfile

I still have problems, but now with a different look (see attached Logfile)
In the night there where some minor problems with seemed to be catched by the
patch. But now at 08:15 the filter crashed again (see second part of the
log.txt)
Frank
Comment 25 Frank Urban 2006-01-17 08:38:50 UTC
the 
"spamd[4339]: prefork: select returned error on server filehandle: Bad file 
descriptor" 
message I can see in my logfile from 03:48 until 08:15 until I restarted the 
spamd. All mailss seemed to be queued until this time. So the patch does not 
work correct?
Frank
Comment 26 Frank Urban 2006-01-17 08:44:54 UTC
Created attachment 3334 [details]
Logfile2

one of my other servers had also problems since this time. There was only one
mail activ at this time. So this was shuer not a problem of the load of the
server.
Frank
Comment 27 Frank Urban 2006-01-19 07:52:12 UTC
I'm still running in this problem every day. So I now installed a cron job 
witch restart the spamd every houre. I hope this will help in a first step.
Frank
Comment 28 Chris Shelton 2006-01-23 02:48:01 UTC
(In reply to comment #12)

> This patch adds some workaround code; instead of retrying indefinitely, it'll
> retry 3 times then kill the child and allow it to respawn.
> 
> It's in trunk as r350101.
> 
> I think the bug will still occur -- it should be a lot less serious with this. 
> If anyone has any ideas, I'm all ears. ;)

I installed this patch a couple of days ago after encountering the exact problem
described in this bug.  The patch seems to have fully resolved the problems that
I was having, where mail was very slow to get processed.

The patch seems reasonable to me, in that it averts a possible endless loop in
the code.
Comment 29 Daryl C. W. O'Shea 2006-01-26 02:41:01 UTC
Created attachment 3347 [details]
log with patch #3284

Justin --

It looks like the patch prevented the loop but failed to "forget" the child (I
haven't really looked at the code).

I've got a 2GB strace log that I can extract a section out of if you want it,
along with complete mail logs and strace if you're interested.
Comment 30 Tom Schulz 2006-01-30 17:33:56 UTC
If your redo or add to the patch, you might want to check the following comment:
# give it 1 second to recover.  we retry indefinitely.
I expect that the 'retry indefinitely' part is wrong.
Comment 31 Justin Mason 2006-01-31 03:23:04 UTC
Tom, thanks -- corrected now in my checkout, I'll commit it RSN.
Comment 32 Justin Mason 2006-02-10 20:21:07 UTC
There are some major bugfixes for the apache-style preforking subsystem in bug
4696.   If you are seeing this issue, please try applying the patch in that bug,
and see if it helps!

(fwiw that minor comment thing is also fixed in the patch in that bug.)
Comment 33 Justin Mason 2006-02-21 18:38:40 UTC
OK, the fix for bug 4696 now includes this fix, too; if you're seeing this
issue, go to that bug and apply the patch immediately ;)
Comment 34 Sidney Markowitz 2006-03-09 19:57:29 UTC
Now that bug 4696 is fixed in the 3.1 branch, I think this can be closed as fixed.
Comment 35 Daryl C. W. O'Shea 2006-03-09 20:04:15 UTC
I don't think this is fixed.  Reopening and targetting at 3.1.2.
Comment 36 Frank Urban 2006-03-20 08:24:37 UTC
Hello,
is there still anyone working on this?
I got the same error again at the weekend now in Version 3.1.1.

Mar 18 05:04:16 mail1 spamd[15430]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
616.
Mar 18 05:04:16 mail1 spamd[15430]: prefork: syswrite(21) to 19194 failed on 
try 2 at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm 
line 593.
....


Mar 18 05:04:17 mail1 spamd[15430]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
616.
Mar 18 05:04:17 mail1 spamd[15430]: prefork: syswrite(21) to 19194 failed on 
try 3 at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm 
line 593.
prefork: retrying syswrite(): Resource temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
616.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: syswrite(21) to 19194 failed on 
try 4 at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm 
line 593.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: giving up 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
595.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: write of ping failed to 19194 
fd=21:  
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
330.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: killing failed child 19194 fd=21 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
127.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: killed child 19194 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
141.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor Mar 18 05:04:18 mail1 spamd[15430]: spamd: handled cleanup 
of child pid 19194 due to SIGCHLD Mar 18 05:04:18 mail1 spamd[15430]: prefork: 
select returned -1! recovering: Bad file descriptor Mar 18 05:04:18 mail1 
postfix/smtpd[13451]: lost connection after DATA from unknown[60.50.129.32] 
Mar 18 05:04:18 mail1 postfix/smtpd[13451]: disconnect from unknown
[60.50.129.32] Mar 18 05:04:19 mail1 spamd[15430]: prefork: select returned -
1! recovering: Bad file descriptor

So I will install againe a cronjob witch restart the spamd every houre...

Frank
Comment 37 Daryl C. W. O'Shea 2006-07-12 01:18:11 UTC
Frank, sorry for the delay in looking at this, but it isn't clear from the above
snippet that this actually causes a failure of the spamd parent.  It only shows
a screwed up child that gets killed off.

Are you still seeing this?  Can you provide more info?
Comment 38 Frank Urban 2006-07-12 05:51:11 UTC
Hi Daryl,
great that you will try to take a look on this...
I think the problem is still there. My cronjob restart the spamd every houre 
and this helps.
Whrn I remeber it right, the spamd has done nothing anymore after the last 
entry in my logfile from Mar 18 05:04:18 and I have taken this "snip" at 
arround 11:00.
So I switched of now again the cronjob. What should I do when the spamd died?

Frank
Comment 39 Daryl C. W. O'Shea 2006-07-13 22:30:04 UTC
(In reply to comment #38)
> So I switched of now again the cronjob. What should I do when the spamd died?

For now I'm just looking to confirm that the following is indeed the last thing
logged by spamd:

[60.50.129.32] Mar 18 05:04:19 mail1 spamd[15430]: prefork: select returned -
1! recovering: Bad file descriptor
Comment 40 Frank Urban 2006-07-14 05:35:12 UTC
Hello,

that should be all I can see in the log. After 
Mar 18 05:04:18 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
I can see only this kind of line expect that what you can see below.

Since I switched of my cronjob yesterday nothing happened until now. Maybe 
there was a change in Version 3.1.3 witch solve the problem? Before I got this 
problem several times a day.

Frank



Mar 18 05:04:02 mail1 spamd[1550]: spamd: result: Y 24 - 
BAYES_00,DCC_CHECK,DIGEST_MULTIPLE,ONLINE_PHARMACY,RAZOR2_CF_RANGE_51_100,RAZOR
2_CF_RANGE_E8_51_100,RAZOR2_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,RCVD_IN_N
JABL_PROXY,RCVD_IN_XBL,SPF_NEUTRAL,URIBL_JP_SURBL,URIBL_SBL,URIBL_SC_SURBL 
scantime=2.1,size=1078,user=filter,uid=1002,required_score=6.0,rhost=localhost,
raddr=127.0.0.1,rport=37447,mid=<000301c64a39
$35ea32ea$64225a40@hmccgw>,bayes=0,autolearn=no
Mar 18 05:04:02 mail1 spamd[15430]: prefork: child states: IBBBI
Mar 18 05:04:16 mail1 spamd[15430]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
616.
Mar 18 05:04:16 mail1 spamd[15430]: prefork: syswrite(21) to 19194 failed on 
try 2 at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm 
line 593.
Mar 18 05:04:17 mail1 spamd[15430]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
616.
Mar 18 05:04:17 mail1 spamd[15430]: prefork: syswrite(21) to 19194 failed on 
try 3 at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm 
line 593.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
616.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: syswrite(21) to 19194 failed on 
try 4 at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm 
line 593.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: giving up 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
595.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: write of ping failed to 19194 
fd=21:  
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
330.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: killing failed child 19194 fd=21 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
127.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: killed child 19194 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
141.
Mar 18 05:04:18 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:04:18 mail1 spamd[15430]: spamd: handled cleanup of child pid 19194 
due to SIGCHLD
Mar 18 05:04:18 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:04:19 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:04:20 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:04:21 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:04:22 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:04:23 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:04:24 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor

...

ar 18 05:09:17 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:09:18 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:09:19 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:09:19 mail1 spamd[1550]: prefork: sysread(10) failed after 300 secs 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
544.
Mar 18 05:09:19 mail1 spamd[20338]: prefork: sysread(7) failed after 300 secs 
at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 
544.
Mar 18 05:09:19 mail1 spamd[15430]: spamd: handled cleanup of child pid 20338 
due to SIGCHLD
Mar 18 05:09:19 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:09:19 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:09:20 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 05:09:21 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor

...

Mar 18 06:07:29 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 06:07:30 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 06:07:30 mail1 spamc[29254]: connect(AF_INET) to spamd at 127.0.0.1 
failed, retrying (#1 of 3):
 Connection timed out
Mar 18 06:07:30 mail1 spamc[29256]: connect(AF_INET) to spamd at 127.0.0.1 
failed, retrying (#1 of 3):
 Connection timed out
Mar 18 06:07:30 mail1 spamc[29258]: connect(AF_INET) to spamd at 127.0.0.1 
failed, retrying (#1 of 3):
 Connection timed out
Mar 18 06:07:31 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 06:07:32 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 06:07:33 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Mar 18 06:07:34 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor
Comment 41 Frank Urban 2006-07-19 07:27:48 UTC
Created attachment 3593 [details]
Part of maillog

now it happened again. Here the newest extract from my logfile 
Frank
SpamAssassin 3.1.3
Comment 42 Frank Urban 2006-07-19 07:29:53 UTC
Created attachment 3594 [details]
DOS readable version 

DOS readable version. changed with unix2dos
Comment 43 peter gervai 2006-08-31 07:40:36 UTC
Repeatedly same here, Debian/Linux sid:

Aug 29 21:26:24 Yikes spamd[8608]: prefork: retrying syswrite(): Resource
temporarily unavailable at usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm
line 623. 
Aug 29 21:26:24 Yikes spamd[8608]: prefork: syswrite(8) to 1652 failed on try 2
at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 600. 
Aug 29 21:26:25 Yikes spamd[8608]: prefork: retrying syswrite(): Resource
temporarily unavailable at
/usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 623. 
Aug 29 21:26:25 Yikes spamd[8608]: prefork: syswrite(8) to 1652 failed on try 3
at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 600. 
Aug 29 21:26:26 Yikes spamd[8608]: prefork: retrying syswrite(): Resource
temporarily unavailable at
/usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 623. 
Aug 29 21:26:26 Yikes spamd[8608]: prefork: syswrite(8) to 1652 failed on try 4
at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 600. 
Aug 29 21:26:26 Yikes spamd[8608]: prefork: giving up at
/usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 602. 
Aug 29 21:26:26 Yikes spamd[8608]: prefork: write of ping failed to 1652 fd=8:
at /usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 333. 
Aug 29 21:26:26 Yikes spamd[8608]: prefork: killing failed child 1652 fd=8 at
/usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 127. 
Aug 29 21:26:26 Yikes spamd[8608]: prefork: killed child 1652 at
/usr/share/perl5/Mail/SpamAssassin/SpamdForkScaling.pm line 141. 
Aug 29 21:26:26 Yikes spamd[8608]: spamd: handled cleanup of child pid 1652 due
to SIGCHLD 
Aug 29 21:26:26 Yikes spamd[8608]: prefork: select returned -1! recovering: Bad
file descriptor 
Aug 29 21:26:27 Yikes spamd[8608]: prefork: select returned -1! recovering: Bad
file descriptor 
Aug 29 21:26:28 Yikes spamd[8608]: prefork: select returned -1! recovering: Bad
file descriptor 
Aug 29 21:26:29 Yikes spamd[8608]: prefork: select returned -1! recovering: Bad
file descriptor 
Aug 29 21:26:30 Yikes spamd[8608]: prefork: select returned -1! recovering: Bad
file descriptor 

until spamd restarted.
# spamd --version
SpamAssassin Server version 3.1.4
  running on Perl 5.8.8
  with SSL support (IO::Socket::SSL 0.998)
Comment 44 peter gervai 2006-09-20 07:17:15 UTC
And happens again. 

Guys, please some kind soul leave a comment whether there is some planned
improvement, or some workaround I could patch in. I tried brifly to examine the
code, but it is so nicely modular that I would have to examine most of SA's
source to come up with a hack, for which I have been lacking the time. 

If I guess right, and this code is in the spamd server, would it be possible to
completely restart the whole mess? (It seems to be dead anyway, so kill
everything and restart cleanly, if there is no better idea. How to do that,
apart from die'ing and leaving extrnal restarter to take care of the restart?)

If this is in a child, would it be possible to suicide and leave the system to
go on? 

[ Some solution would be extremely nice since this is a really major pain in he
ass... it just stops the whole spamd, which stalls all mailqueues, which
basically stops everything smtp around, which is not nice. I hate morning
phonecalls, how about you? :) ]
Comment 45 Scott Lamb 2006-09-20 07:35:54 UTC
Re: comment #44:

Peter, I've been using "--round-robin" ever since it was reported as a workaround way back in comment 
#8. Haven't hit this problem since, and it's been almost a year. If this is causing you grief, I say just throw 
that option in.
Comment 46 peter gervai 2006-09-20 20:01:31 UTC
(In reply to comment #45)

> Peter, I've been using "--round-robin" ever since it was reported as a
workaround way back in comment 

Ermm, the problem is that as far as I remember RR spawns the maximal number of
SA, which is pretty memory-consuming in case of high maximum. Imagine a system
with average 4-5 spamd and the maximum of around 50 or more, where 50 allocates
pretty extensive amount of memory (and it grows substantially over time, doesn't
it for you?), which is usually very rarely needed.

But you're right, maybe this is a workaround. Other workaround is attached,
pretty much an "if you've got a hammer everything looks like a nail" approach. :)

Thanks.
Comment 47 peter gervai 2006-09-20 20:03:34 UTC
Created attachment 3700 [details]
spamd restart hack

Run this as root, fill in admin email and check spamd restart command at the
end. It will restart SA in case of the problem. Hopefully. Not thoroughly
tested, use at your own risk.
Comment 48 Tom Schulz 2006-09-25 13:19:44 UTC
I think that the problem here is that none of the dev's can reliably
reproduce this bug and none of the traces provided allow them to track
down the exact cause of the problem.  What is needed is for someone
who can reproduce this bug to dig in and track down the cause of the bug.
Comment 49 Frank Urban 2006-09-28 06:09:13 UTC
The spamd restart program work great. I had 3 restarts this night.
Can someone give me a hint how I can include the hostname into the 
informationmail?

system "echo 'SA restart on" . qx("hostname") . " at " . scalar
(localtime) . "' | mail $adminemail -s spamd-sux";

is that correct?



is this information helpfull?

mail3:/var/log # grep "spamd\[19736\]" mail
Sep 28 00:15:59 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:17:12 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:17:22 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:17:49 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:18:49 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:19:03 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:19:18 mail3 spamd[19736]: prefork: child states: IBBB
Sep 28 00:19:22 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:20:12 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:20:32 mail3 spamd[19736]: prefork: child states: BBBB
Sep 28 00:20:32 mail3 spamd[19736]: spamd: server successfully spawned child 
process, pid 6176
Sep 28 00:20:32 mail3 spamd[19736]: prefork: child states: BBBIB
Sep 28 00:20:35 mail3 spamd[19736]: prefork: child states: IBBIB
Sep 28 00:20:37 mail3 spamd[19736]: prefork: child states: IBBIB
Sep 28 00:20:38 mail3 spamd[19736]: prefork: child states: IIBIB
Sep 28 00:20:38 mail3 spamd[19736]: spamd: handled cleanup of child pid 6176 
due to SIGCHLD
Sep 28 00:20:38 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:20:42 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:20:56 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:21:10 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:21:15 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:21:41 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:21:43 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:22:00 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:22:09 mail3 spamd[19736]: prefork: child states: IIBB
Sep 28 00:22:11 mail3 spamd[19736]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 
624.
Sep 28 00:22:11 mail3 spamd[19736]: prefork: syswrite(12) to 6120 failed on 
try 2 at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm 
line 601.
Sep 28 00:22:12 mail3 spamd[19736]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 
624.
Sep 28 00:22:12 mail3 spamd[19736]: prefork: syswrite(12) to 6120 failed on 
try 3 at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm 
line 601.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: retrying syswrite(): Resource 
temporarily unavailable 
at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 
624.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: syswrite(12) to 6120 failed on 
try 4 at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm 
line 601.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: giving up 
at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 
603.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: write of ping failed to 6120 
fd=12: at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm 
line 334.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: killing failed child 6120 fd=12 
at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 
128.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: killed child 6120 
at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 
142.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: select returned -1! recovering: 
Bad file descriptor
Sep 28 00:22:13 mail3 spamd[19736]: spamd: handled cleanup of child pid 6120 
due to SIGCHLD
Sep 28 00:22:13 mail3 spamd[19736]: prefork: select returned -1! recovering: 
Bad file descriptor
Sep 28 00:22:14 mail3 spamd[19736]: spamd: server killed by SIGTERM, shutting 
down






Comment 50 Justin Mason 2006-11-02 13:33:18 UTC
--debug output, or strace, would be useful I think...
Comment 51 Kai Bolay 2006-11-02 13:37:27 UTC
related to bug 5149?
Comment 52 Justin Mason 2006-11-02 14:00:39 UTC
no, I don't think they're related.
Comment 53 paulf 2006-11-13 03:17:34 UTC
I have been having this same issue on one of my boxen - SA 3.1.3

We also have clamAV installed - I don't recall this issue happening before we
installed ClamAV.
Comment 54 Justin Mason 2007-03-23 04:50:53 UTC
Frank, Peter, and anyone else seeing messages like this:

Sep 28 00:22:13 mail3 spamd[19736]: prefork: killed child 6120 
at /usr/lib/perl5/site_perl/5.8.7/Mail/SpamAssassin/SpamdForkScaling.pm line 
142.
Sep 28 00:22:13 mail3 spamd[19736]: prefork: select returned -1! recovering: 
Bad file descriptor
Sep 28 00:22:13 mail3 spamd[19736]: spamd: handled cleanup of child pid 6120 
due to SIGCHLD
Sep 28 00:22:13 mail3 spamd[19736]: prefork: select returned -1! recovering: 
Bad file descriptor

I think you're seeing bug 5313:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5313
(sorry Kai, I think you were right after all.)

Please try the patch in that bug and see if it helps.
Comment 55 Robert Blayzor 2007-03-26 08:02:29 UTC
Seeing this problem on FreeBSD 4.11, SpamAssassin 3.1.8 with PERL 5.8.5.

Seems to happen at completely random times.  The only common thing I can tell is
that I see some of the child prefork status in "S" just after this happens.

When this happens it doesn't seem to effect spamd as a whole, just the child,
which I assume gets restarted?

Mar 26 13:05:44 scrub1-b spamd[69396]: prefork: child states: IIIBIIIIII 
Mar 26 13:05:45 scrub1-b spamd[69396]: prefork: child states: BIIIIIIIII 
Mar 26 13:05:47 scrub1-b spamd[69396]: prefork: child states: IBBBIIIIII 
Mar 26 13:05:48 scrub1-b spamd[69396]: prefork: child states: IIBBIIIIII 
Mar 26 13:05:49 scrub1-b spamd[69396]: prefork: child states: BIIBIIIIII 
Mar 26 13:05:50 scrub1-b spamd[69396]: prefork: child states: BIBBBIIIII 
Mar 26 13:05:51 scrub1-b spamd[69396]: prefork: child states: IIBBBIIIII 
Mar 26 13:05:51 scrub1-b spamd[69396]: prefork: child states: IIIBBIIIII 
Mar 26 13:05:53 scrub1-b spamd[69396]: prefork: child states: BBIBIIIIII 
Mar 26 13:05:53 scrub1-b spamd[69396]: prefork: child states: BBIIIIIIII 
Mar 26 13:05:54 scrub1-b spamd[69396]: prefork: child states: BIBBIIIIII 
Mar 26 13:05:56 scrub1-b spamd[69396]: prefork: child states: BBBIBIIIII 
Mar 26 13:05:56 scrub1-b spamd[69396]: prefork: child states: IBBIBIIIII 
Mar 26 13:05:57 scrub1-b spamd[69396]: prefork: child states: BBIIBIIIII 
Mar 26 13:05:58 scrub1-b spamd[69396]: prefork: child states: IBBIBIIIII 
Mar 26 13:05:59 scrub1-b spamd[69396]: prefork: child states: IBBIBIIIII 
Mar 26 13:05:59 scrub1-b spamd[70957]: prefork: sysread(16) failed after 300
secs at /usr/
local/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 575. 
Mar 26 13:05:59 scrub1-b spamd[70956]: prefork: sysread(16) failed after 300
secs at /usr/
local/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 575. 
Mar 26 13:05:59 scrub1-b spamd[70952]: prefork: sysread(16) failed after 300
secs at /usr/
local/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 575. 
Mar 26 13:05:59 scrub1-b spamd[69396]: prefork: child states: BBBIBIIIII 
Mar 26 13:05:59 scrub1-b spamd[69396]: prefork: child states: BBBIBIIIII 
Mar 26 13:05:59 scrub1-b spamd[69396]: prefork: child states: BBBIBIIIII 
Mar 26 13:05:59 scrub1-b spamd[69396]: prefork: child states: BBBIBIIISS 
Mar 26 13:05:59 scrub1-b spamd[69396]: prefork: child states: BBBIBIIIIS 
Mar 26 13:05:59 scrub1-b spamd[69396]: prefork: child states: BBBIBIIIII 
Mar 26 13:06:00 scrub1-b spamd[69396]: prefork: child states: BBBBIIIIII 
Mar 26 13:06:00 scrub1-b spamd[69396]: prefork: child states: BIBBIIIIII 
Mar 26 13:06:00 scrub1-b spamd[69396]: prefork: child states: BIIBIIIIII 
Mar 26 13:06:02 scrub1-b spamd[69396]: prefork: child states: BBIIIIIIII 

Comment 56 Justin Mason 2007-03-26 08:23:18 UTC
Robert, that's not this bug.
Comment 57 Frank Urban 2007-03-26 23:30:38 UTC
I applied the patch from bug 5313 yesterday morning.
Since this time I didn't see this problem again. Until yesterday if have seen 
it 3 times a day and was only able to stay alive with the "spamd restart 
hack ".

Looks good now...

Frank
Comment 58 Justin Mason 2007-03-29 12:53:32 UTC
I'd appreciate it if anyone else running into prefork-related errors, try out
the latest patch at bug 5313 and report success/failure....
Comment 59 Justin Mason 2007-04-03 11:40:15 UTC
OK, I'm marking this resolved, as (as far as I can tell) all the outstanding
reports are actually instances of bug 5313, which is now resolved ;)

reminder:

Mar 18 06:07:34 mail1 spamd[15430]: prefork: select returned -1! recovering: 
Bad file descriptor

if you are seeing that message, please go to bug 5313, not this one.

in fact, any further *new* prefork bugs should probably be opened as new bugs
rather than comments here, to avoid confusion...