Bug 4476 - prefork: sysread(8) failed after 300 secs (no children responding)
Summary: prefork: sysread(8) failed after 300 secs (no children responding)
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: Other Linux
: P3 normal
Target Milestone: 3.1.2
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on: 4696
Blocks:
  Show dependency tree
 
Reported: 2005-07-10 22:28 UTC by Daryl C. W. O'Shea
Modified: 2006-02-27 18:52 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status

Note You need to log in before you can comment on or make changes to this bug.
Description Daryl C. W. O'Shea 2005-07-10 22:28:19 UTC
It appears that the two children became zombies and weren't replaced.

"prefork: sysread(8) failed after 300 secs" is the last thing logged.




SpamAssassin version 3.1.0-pre3-r202465
  running on Perl version 5.8.3


Jul 10 03:56:11 cyan spamassassin[32355]: prefork: child states: II
Jul 10 03:58:35 cyan spamassassin[24483]: spamd: connection from
smtp.film-tech.net [66.98.221.156] at port 4742
Jul 10 03:58:35 cyan spamassassin[24483]: spamd: processing message
<200507100758.j6A7w1929850@dragon.redballoon.net> for user@domain.com:501
Jul 10 03:58:38 cyan spamassassin[24483]: spamd: identified spam (27.6/5.0) for
user@domain.com:501 in 2.5 seconds, 2880 bytes.
Jul 10 03:58:38 cyan spamassassin[24483]: spamd: result: Y 27 -
BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,FUZZY_REFINANCE,HTML_40_50,HTML_MESSAGE,MIME_BOUND_NEXTPART
,MORTGAGE_RATES,MSGID_FROM_MTA_ID,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK,SARE_MILLIONSOF,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL,URIB
L_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL,URI_REDIRECTOR,WRONGMX
scantime=2.5,size=2880,user=user@domain.com,uid=501,required_score=5.0,rhost=smtp.film-tech.n
et,raddr=66.98.221.156,rport=4742,mid=<200507100758.j6A7w1929850@dragon.redballoon.net>,bayes=1,autolearn=spam
Jul 10 03:58:38 cyan spamassassin[32355]: prefork: child states: II
Jul 10 04:02:05 cyan spamassassin[24483]: spamd: connection from
cyan.dostech.net [127.0.0.1] at port 52663
Jul 10 04:02:05 cyan spamassassin[24483]: spamd: processing message
<3F5430226.7052858@codeweavers.com> for dos@csolve.net:501
Jul 10 04:02:10 cyan spamassassin[24483]: spamd: identified spam (41.6/5.0) for
dos@csolve.net:501 in 4.9 seconds, 3783 bytes.
Jul 10 04:02:10 cyan spamassassin[24483]: spamd: result: Y 41 -
BAYES_99,DOMAIN_AGE_0_7_DAYS,DOMAIN_RATIO,FORGED_YAHOO_RCVD,HTML_BACKHAIR_8,HTML_FONT_BIG,HTML
_FONT_TINY,HTML_MESSAGE,HTML_TINY_FONT,HTML_TITLE_LONG,HTML_TITLE_SUBJ_DIFF,MIME_HTML_ONLY,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SARE_HTML_FSIZE_1ALL,SARE_H
TML_INV_TAG,SARE_HTML_USL_OBFU,SARE_SUB_MSGSUB,SARE_SUB_MSG_SUBJ,UNPARSEABLE_RELAY,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SC_SURBL
sca
ntime=4.9,size=3783,user=dos@csolve.net,uid=501,required_score=5.0,rhost=cyan.dostech.net,raddr=127.0.0.1,rport=52663,mid=<3F5430226.7052858@codeweavers.com>,
bayes=1,autolearn=spam
Jul 10 04:02:10 cyan spamassassin[32355]: prefork: child states: II
Jul 10 04:24:08 cyan spamassassin[24483]: spamd: connection from
cyan.dostech.net [127.0.0.1] at port 52736
Jul 10 04:24:11 cyan spamassassin[24483]: spamd: processing message
<97e63073ad81ba8f16ac54a162b57839@abbott-jxihny5u> for dos@csolve.net:501
Jul 10 04:24:29 cyan spamassassin[24483]: spamd: identified spam (15.1/5.0) for
dos@csolve.net:501 in 20.7 seconds, 29897 bytes.
Jul 10 04:24:29 cyan spamassassin[24483]: spamd: result: Y 15 -
AWL,BAYES_99,DCC_CHECK,DIGEST_MULTIPLE,HTML_80_90,HTML_FONT_BIG,HTML_MESSAGE,MAILTO_TO_REMOVE,
MONEY_BACK,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CHECK,RM_bpoem_InstantDL,SENT_IN_COMPLIANCE,URIBL_BLACK
scantime=20.7,size=29897,user=dos@c
solve.net,uid=501,required_score=5.0,rhost=cyan.dostech.net,raddr=127.0.0.1,rport=52736,mid=<97e63073ad81ba8f16ac54a162b57839@abbott-jxihny5u>,bayes=1,autolea
rn=no
Jul 10 04:24:29 cyan spamassassin[32355]: prefork: child states: II
Jul 10 04:31:39 cyan spamassassin[32366]: prefork: sysread(8) failed after 300
secs at /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling
.pm line 501.
Jul 10 04:31:41 cyan spamassassin[24483]: prefork: sysread(8) failed after 300
secs at /usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling
.pm line 501.



[dos@cyan dos]$ ps -aux | grep spamd
root     31602  0.0  0.4 26428 1144 ?        S    Jul06   0:01
/usr/local/bin/perl -T -w ../spamd/spamd -D -x -s stderr -C log/test_rules_copy
--siteconfigpat
h log/localrules.tmp -p 64357 -L
root     31606  0.0  0.4 26724 1140 ?        S    Jul06   0:00 spamd child
root     31607  0.0  0.4 26428 1056 ?        S    Jul06   0:00 spamd child
root     32355  0.0  0.3 41900  916 ?        S    Jul06   0:06
/usr/local/bin/spamd -d -m 5 -q -x -u spamd -i 0.0.0.0 -A 66.98.221.156 -A 127.0.0.1
spamd    32366  0.0  0.0     0    0 ?        Z    Jul06   0:16 [spamd <defunct>]
spamd    24483  0.0  0.0     0    0 ?        Z    01:12   0:26 [spamd <defunct>]




Linux cyan.dostech.net 2.4.20-28.9 #1 Thu Dec 18 13:46:42 EST 2003 i686 athlon
i386 GNU/Linux
Comment 1 Justin Mason 2005-07-12 16:36:31 UTC
can you get "strace -f" output of the parent and/or the kids?
Comment 2 Daryl C. W. O'Shea 2005-07-12 16:49:18 UTC
I was, um, quite distracted when it happened Saturday night.

I didn't strace the children since they were zombies, and I forgot about the parent.

The weekly cron jobs, starting at 4:22, were running at the time (makewhatis &
updatedb).  I wonder if it was load related...
Comment 3 Justin Mason 2005-08-04 17:09:56 UTC
has this reoccurred?
Comment 4 Justin Mason 2005-08-04 17:26:29 UTC
lowering pri
Comment 5 Daryl C. W. O'Shea 2005-08-04 22:21:52 UTC
Subject: Re:  prefork: sysread(8) failed after 300 secs (no children
 responding)

> has this reoccurred?

Unfortunately no.

For what it's worth, these two children:

root     31606  0.0  0.4 26724 1140 ?        S    Jul06   0:00 spamd child
root     31607  0.0  0.4 26428 1056 ?        S    Jul06   0:00 spamd child

had to be killed off manually.  'killproc spamd' didn't take care off 
them, but it did take care of the defunct processes.

I'm guessing that when the two children became zombies and the new 
children were created they were never kept track of (why I don't know), 
but did make spamd happy about the number of idle child processes.  It 
almost sounds like spamd mixed up the PIDs of the new and zombie children.

I'm hoping that this happens to someone when RC1 is released since I 
can't reproduce it -- even under insane load.

Comment 6 Daryl C. W. O'Shea 2005-08-11 00:05:17 UTC
I wondering if this is related to bug 4370.  I don't have time to really look
into it (I'm currently working 100+ hour weeks), but the comments in that bug's
patch make it sounds like a posibility.
Comment 7 Justin Mason 2005-08-11 09:17:35 UTC
yeah, it's very likely -- that would indeed result in this symptom.
Comment 8 Duncan Findlay 2005-08-27 17:45:32 UTC
Marking as duplicate of bug 4370, since it can't be reproduced now that it is
fixed, and it seems it may be the same bug. Reopen if necessary.

*** This bug has been marked as a duplicate of 4370 ***
Comment 9 Daryl C. W. O'Shea 2005-09-25 02:13:16 UTC
I just had this happen again on the same machine with 3.1.0.  A strace results
in no output.

Like last time, it happened during my Sunday morning weekly cron processing,
this time at 4:38 am.

Comment 10 Daryl C. W. O'Shea 2005-11-13 04:45:11 UTC
Just last night I thought, "this hasn't happened in a while".

Well, it happened again this afternoon.  Nothing from a strace of the parent,
and I wasn't able to get the defunt children killed to make the parent do something.
Comment 11 Justin Mason 2005-12-02 06:28:26 UTC
note: if you're seeing something like this from other spamd processes around the
same time:

Nov 20 00:07:36 cyan spamd[15882]: prefork: syswrite(7) failed, retrying... at
/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm line
554. 

it's probably bug 4590; otherwise, probably not.
Comment 12 Daryl C. W. O'Shea 2005-12-02 06:50:37 UTC
Not in my case.  I've only seen a single failure (at line 501) for each
(suddenly defunct) child and then nothing:


Nov 12 10:26:08 cyan spamd[21524]: prefork: child states: IB
Nov 12 10:26:09 cyan spamd[31065]: spamd: connection from smtp.film-tech.net
[66.98.221.156] at port 1039
Nov 12 10:26:17 cyan spamd[31065]: spamd: processing message
<EiJZzzHa7vK8Lj2Q5iWfftBDw@azoairport.com> for newsroom@domain.com:501
Nov 12 10:26:31 cyan spamd[31065]: spamd: identified spam (44.1/5.0) for
newsroom@domain.com:501 in 22.4 seconds, 22303 bytes.
Nov 12 10:26:32 cyan spamd[31065]: spamd: result: Y 44 -
BAYES_99,...,URIBL_XS_SURBL
scantime=22.4,size=22303,user=newsroom@domain.com,uid=501,
required_score=5.0,rhost=smtp.film-tech.net,raddr=66.98.221.156,rport=1039,mid=<EiJZzzHa7vK8Lj2Q5iWfftBDw@azoairport.com>,bayes=1,autolearn=spam

... lots of messages processed by child 18483 ...

Nov 12 10:53:52 cyan spamd[21524]: prefork: child states: II
Nov 12 10:54:05 cyan spamd[18483]: spamd: connection from cyan.dostech.net
[127.0.0.1] at port 39193
Nov 12 10:54:05 cyan spamd[18483]: spamd: processing message
<200511121554.jACFs4RA021125@d141-166-138.home.cgocable.net> for dos@csolve.net:501
Nov 12 10:54:06 cyan spamd[18483]: spamd: identified spam (24.6/5.0) for
dos@csolve.net:501 in 1.4 seconds, 3096 bytes.
Nov 12 10:54:06 cyan spamd[18483]: spamd: result: Y 24 -
BAYES_99,...,RCVD_IN_XBL
scantime=1.4,size=3096,user=dos@csolve.net,uid=501,required_score=5.0,rhost=cyan.dostech.net,raddr=127.0.0.1,rport=39193,mid=<200511121554.jACFs4RA021125@d141-166-138.home.cgocable.n
et>,bayes=1,autolearn=spam

Nov 12 10:54:06 cyan spamd[21524]: prefork: child states: II
Nov 12 11:02:43 cyan spamd[18483]: prefork: sysread(8) failed after 300 secs at
/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm lin
e 501.
Nov 12 11:02:44 cyan spamd[31065]: prefork: sysread(8) failed after 300 secs at
/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm lin
e 501.
Comment 13 Pete Pinter 2005-12-04 06:08:33 UTC
Don't know if it's related, but I thought I'd mention that since yum upgraded
our FC4 based server to perl-5.8.6-18 a few days ago, our spamassassin-3.1.0-1
has been solid. Prior to the perl upgrade, I was seeing this error daily.
Comment 14 Justin Mason 2006-02-10 20:21:18 UTC
There are some major bugfixes for the apache-style preforking subsystem in bug
4696.   If you are seeing this issue, please try applying the patch in that bug,
and see if it helps!
Comment 15 Justin Mason 2006-02-22 23:50:25 UTC
is anyone still running into this issue, or can it be closed?
Comment 16 Daryl C. W. O'Shea 2006-02-23 01:00:24 UTC
It may have happened to me again this morning (something hung spamd up).  I
don't have access to the box right now though... I'll have to check later.
Comment 17 Daryl C. W. O'Shea 2006-02-28 03:52:37 UTC
Looks like this could very well be fixed by bug 4696 (closing as fixed).  This
wasn't my problem last week... which was:

Feb 21 22:40:29 cyan spamd[15096]: prefork: sysread(8) failed after 300 secs at
/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm line 544.
Feb 21 22:40:30 cyan spamd[10292]: prefork: sysread(10) failed after 300 secs at
/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm line 544.
Feb 21 22:40:30 cyan spamd[23929]: spamd: handled cleanup of child pid 15096 due
to SIGCHLD
Feb 21 22:40:30 cyan spamd[23929]: prefork: syswrite(6) failed, retrying... at
/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm line 597.
Feb 21 22:40:30 cyan spamd[23929]: spamd: handled cleanup of child pid 10292 due
to SIGCHLD
Feb 21 22:40:30 cyan spamd[23929]: prefork: syswrite(6) failed, retrying... at
/usr/local/lib/perl5/site_perl/5.8.3/Mail/SpamAssassin/SpamdForkScaling.pm line 597.