Bug 6761 - Intermittent fetch failures from daryl.dostech.ca
Summary: Intermittent fetch failures from daryl.dostech.ca
Status: RESOLVED DUPLICATE of bug 6838
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-update
Version: 3.3.2
Hardware: PC FreeBSD
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-20 09:03 UTC by Jeremy Chadwick
Modified: 2012-09-25 15:48 UTC



Description Jeremy Chadwick 2012-02-20 09:03:09 UTC
I thought I got done dealing with this type of problem back in Bug 6511, but apparently something else is up/wonky now.

For the past week or so, we've been seeing intermittent failures of different kinds when sa-update runs and tries to download content from http://daryl.dostech.ca/sa-update/asf/

Below are the failures we've seen sa-update spit out, as well as the dates/times when the failures occurred, and what the failure messages were per perl LWP.  Note that the failures are not all the same kind.  We run sa-update once a day.


Date: Tue, 14 Feb 2012 00:43:22 -0800 (PST)
http: GET http://daryl.dostech.ca/sa-update/asf/1243434.tar.gz request failed: 500 Can't connect to daryl.dostech.ca:80 (timeout):
Can't connect to daryl.dostech.ca:80 (timeout) LWP::Protocol::http::Socket: connect: timeout at
/usr/local/lib/perl5/site_perl/5.12.4/LWP/Protocol/http.pm line 51.


Date: Wed, 15 Feb 2012 00:43:50 -0800 (PST)
http: GET http://daryl.dostech.ca/sa-update/asf/1243828.tar.gz request failed: 500 Can't connect to daryl.dostech.ca:80 (timeout):
Can't connect to daryl.dostech.ca:80 (timeout) LWP::Protocol::http::Socket: connect: timeout at
/usr/local/lib/perl5/site_perl/5.12.4/LWP/Protocol/http.pm line 51.


Date: Mon, 20 Feb 2012 00:43:19 -0800 (PST)
http: GET http://daryl.dostech.ca/sa-update/asf/1290969.tar.gz request failed: 500 read timeout: read timeout at
/usr/local/lib/perl5/site_perl/5.12.4/LWP/Protocol/http.pm line 433.


sa-update apparently succeeded despite these failures (it returned exit code 0), so I imagine one of the other mirrors was used. All the mirrors have the same weight, so maybe that is why the problem appears intermittent?

# cat MIRRORED.BY
# test mirror: zone, cached via Coral
#http://buildbot.spamassassin.org.nyud.net:8090/updatestage/
http://daryl.dostech.ca/sa-update/asf/ weight=5
http://www.sa-update.pccc.com/ weight=5
http://sa-update.secnap.net/ weight=5
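
To see why equal weights could make the failures look intermittent: assuming sa-update picks its first mirror at random in proportion to the weight= values (an assumption about its selection logic, not something confirmed in this report), each of the three mirrors above would be tried first about a third of the time. A minimal shell sketch that computes those shares from a MIRRORED.BY file follows; mirror_shares is a hypothetical helper, and the default weight of 1 for lines without weight= is also an assumption.

```shell
# Hypothetical helper: print each mirror's share of the total weight.
# Assumes lines of the form "URL weight=N"; lines starting with '#' are comments.
mirror_shares() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }      # skip comments and blank lines
        {
            w = 1                           # assumed default when weight= is absent
            for (i = 2; i <= NF; i++)
                if ($i ~ /^weight=[0-9]+$/) { split($i, kv, "="); w = kv[2] }
            url[++n] = $1; weight[n] = w; total += w
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %.1f%%\n", url[i], 100 * weight[i] / total
        }
    ' "$1"
}
```

Running mirror_shares MIRRORED.BY against the file above would list the three mirrors at 33.3% each, so under that assumption roughly one run in three would hit daryl.dostech.ca first.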


So in this case, Daryl C. W. O'Shea needs to look at his server and/or surrounding network to see what is going on.  I'm sorry that I don't have traceroutes or anything else to go on -- I can set up periodic traceroutes to daryl.dostech.ca if folks think it's a network layer problem (I'm well aware that the Internet is constantly broken :-) ).  Or maybe it's box maintenance; I don't know.

If you need any other information, let me know.
Comment 1 Kevin A. McGrail 2012-02-20 14:16:24 UTC
Thanks.  We have scripts in place that notify us of the outages.  So this is a known issue but considered non-priority since sa-update will automatically use one of the other update providers if a different one is down.

Regards,
KAM
Comment 2 Jeremy Chadwick 2012-02-21 09:46:48 UTC
Kevin, thanks for the insights.  I'm not so sure this should be set as RESOLVED FIXED however, as it obviously isn't fixed:

Date: Tue, 21 Feb 2012 00:43:06 -0800 (PST)
http: GET http://daryl.dostech.ca/sa-update/asf/1291150.tar.gz request failed: 500 Can't connect to daryl.dostech.ca:80 (timeout):
Can't connect to daryl.dostech.ca:80 (timeout) LWP::Protocol::http::Socket: connect: timeout at
/usr/local/lib/perl5/site_perl/5.12.4/LWP/Protocol/http.pm line 51.

Possibly RESOLVED WONTFIX would be more appropriate?

Also: do you know if this issue is specific to certain times of the day (e.g. should I change my cronjob to run sa-update at a different time)?  How do I squelch the LWP error output (sa-update has no --quiet equivalent) aside from piping it to the equivalent of grep -v?

If this kind of transient error is considered acceptable, then I would advocate that sa-update should trap the LWP error condition either through $ref->is_error() verification or using eval {}; (if needed) and not complain on these types of errors (but make sure to complain in the case that all mirrors fail, of course!).
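
The behaviour proposed here (stay quiet on a single mirror failure, complain only when all mirrors fail) can be sketched outside sa-update as a shell wrapper. This is only an illustration of the idea, not sa-update's code: fetch_from_mirrors and the FETCH variable are hypothetical names, and the curl default is an assumed stand-in for whatever downloader is used.

```shell
# Hypothetical wrapper: try each mirror in turn, stay quiet on individual
# failures, and report the collected errors only if every mirror fails.
# FETCH names the download command; the curl default is an assumption.
fetch_from_mirrors() {
    path=$1; shift
    errors=""
    for mirror in "$@"; do
        # Capture stderr only: 2>&1 points stderr at the capture pipe,
        # then >/dev/null discards the downloader's stdout.
        if err=$(${FETCH:-curl -fsS -o /dev/null} "${mirror}${path}" 2>&1 >/dev/null); then
            return 0                # first success wins; earlier errors are dropped
        fi
        errors="${errors}${mirror}: ${err}
"
    done
    printf 'all mirrors failed:\n%s' "$errors" >&2
    return 1
}
```

Called with a file name followed by the mirror URLs from MIRRORED.BY, a timeout on the first mirror would produce no output as long as a later mirror answers, while a total failure still prints every collected error and returns a non-zero status.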
Comment 3 Kevin A. McGrail 2012-02-21 18:08:26 UTC
(In reply to comment #2)
> Kevin, thanks for the insights.  I'm not so sure this should be set as RESOLVED
> FIXED however, as it obviously isn't fixed:
> 
> Date: Tue, 21 Feb 2012 00:43:06 -0800 (PST)
> http: GET http://daryl.dostech.ca/sa-update/asf/1291150.tar.gz request failed:
> 500 Can't connect to daryl.dostech.ca:80 (timeout):
> Can't connect to daryl.dostech.ca:80 (timeout) LWP::Protocol::http::Socket:
> connect: timeout at
> /usr/local/lib/perl5/site_perl/5.12.4/LWP/Protocol/http.pm line 51.
> 
> Possibly RESOLVED WONTFIX would be more appropriate?

Agreed.

 
> Also: do you know if this issue is specific to certain times of the day (e.g.
> should I change my cronjob to run sa-update at a different time)?  

I don't, sorry.

> How do I
> squelch the LWP error output (sa-update has no --quiet equivalent) aside from
> piping it to the equivalent of grep -v?

sa-update > /dev/null 2>&1 would work.
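
A note on redirection order: POSIX shells apply redirections left to right, so 2>&1 > /dev/null leaves stderr attached to the terminal (or cron mail) and discards only stdout, whereas > /dev/null 2>&1 discards both. A small sketch, with noisy as a hypothetical stand-in for a command such as sa-update that writes to both streams:

```shell
# Hypothetical stand-in for a command that writes to both streams.
noisy() { echo "to stdout"; echo "to stderr" >&2; }

noisy 2>&1 >/dev/null    # stderr still gets through; only stdout is discarded
noisy >/dev/null 2>&1    # both streams are discarded
noisy 2>/dev/null        # only stderr is discarded
```

Discarding stderr entirely would also hide the error printed when every mirror fails, so checking the command's exit status is still worthwhile.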


> If this kind of transient error is considered acceptable, then I would advocate
> that sa-update should trap the LWP error condition either through
> $ref->is_error() verification or using eval {}; (if needed) and not complain on
> these types of errors (but make sure to complain in the case that all mirrors
> fail, of course!).

I agree. I run sa-update -D and don't care about the copious information.  Are you running sa-update more than once a day?  If so, that's largely unnecessary.

Low priority but this would be good polish.
Comment 4 Jeremy Chadwick 2012-02-22 12:26:05 UTC
Keeping my reply simple: we run sa-update once a day, at 00:40 Pacific Time.
Comment 5 Bernhard Schmidt 2012-09-21 08:30:30 UTC
For a couple of days now, daryl.dostech.ca seems to be lagging behind in updates. On almost every run I get a message like:

http: GET http://daryl.dostech.ca/sa-update/asf/1387911.tar.gz request failed: 404 Not Found: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>404 Not Found</title> </head><body> <h1>Not Found</h1> <p>The requested URL /sa-update/asf/1387911.tar.gz was not found on this server.</p> <hr> <address>Apache/2.2.6 (Fedora) Server at daryl.dostech.ca Port 80</address> </body></html> 

Not fatal, since sa-update tries the other mirrors too, but something someone should have a look at.
Comment 6 Kevin A. McGrail 2012-09-21 15:07:36 UTC
The system is designed to try other mirrors automatically as its main resiliency mechanism, so the key concern is that you are still getting the update.  However, you might try running your cron job later in the day to help alleviate the issue, and I'll reach out to the mirror operator.
Comment 7 Darxus 2012-09-25 15:48:24 UTC
Marking duplicate of bug 6838 since there is more discussion of the same issue there, in case anybody comes across this one later.

*** This bug has been marked as a duplicate of bug 6838 ***