Bug 6083 - sa-update should periodically validate MIRRORED.BY files
Summary: sa-update should periodically validate MIRRORED.BY files
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-update
Version: unspecified
Hardware: Other All
Importance: P1 normal
Target Milestone: 3.3.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-12 08:16 UTC by Theo Van Dinter
Modified: 2009-09-15 14:41 UTC
Description Theo Van Dinter 2009-03-12 08:16:18 UTC
I noticed that after the initial update installation, sa-update will not try to update MIRRORED.BY files unless there is an update to download. This means that clients of channels w/ infrequent updates will be using potentially outdated/changed MIRRORED.BY files. That can then lead to a situation where none of the old mirrors function anymore, and updates will fail.

For example:

1- sa-update run completes, gets MIRRORED.BY file from mirrors.[channel] DNS pointer, stores it.  entries are for serverX, serverY, and serverZ.
2- time goes on, no channel updates are published, but the previous servers are replaced by serverA, serverB, and serverC and the MIRRORED.BY file is updated.
3- eventually an update is published and DNS updated.
4- client machines see they have a cached MIRRORED.BY file, so they try to download updates from serverX, serverY, and serverZ, all of which fail. At this point, sa-update fails the channel as no mirrors are available.


So I would suggest inserting a step 3.5: if the MIRRORED.BY timestamp is old (say, more than 30 days?), an attempt is made to update it using the mirrors (the current method), with fallback to mirrors.[channel] if necessary.

It would probably also be useful to try this if the channel download fails because all mirrors have failed.
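The proposed "step 3.5", plus the retry when every mirror fails, could look roughly like the Python sketch below. sa-update itself is Perl, so this is only an illustration: the function and parameter names are hypothetical, the 30-day default is just the value floated above, and `refresh_mirby` stands in for "re-download MIRRORED.BY via the mirrors, falling back to the mirrors.[channel] DNS pointer".

```python
from typing import Callable, List, Optional

# Hypothetical threshold, taken from the ">30d?" suggestion above.
MAX_MIRBY_AGE_DAYS = 30


def download_update(
    cached_mirrors: List[str],
    mirby_age_days: float,
    refresh_mirby: Callable[[], List[str]],
    fetch: Callable[[str], Optional[bytes]],
    max_age_days: float = MAX_MIRBY_AGE_DAYS,
) -> Optional[bytes]:
    """Try to download a channel update, refreshing the mirror list when needed.

    refresh_mirby: re-fetches MIRRORED.BY and returns the new mirror URLs
                   (hypothetical stand-in for the real download logic).
    fetch:         returns the update bytes from one mirror, or None on failure.
    """
    mirrors = cached_mirrors
    refreshed = False

    # Step 3.5: replace a stale cached mirror list before trying any mirror.
    if mirby_age_days > max_age_days:
        mirrors = refresh_mirby()
        refreshed = True

    for url in mirrors:
        data = fetch(url)
        if data is not None:
            return data

    # Every mirror failed: refresh once (unless already done) and retry.
    if not refreshed:
        for url in refresh_mirby():
            data = fetch(url)
            if data is not None:
                return data
    return None
```

With this shape, the scenario above (a fresh-looking cache listing serverX/Y/Z) still recovers: all three fail, the list is refreshed once, and serverA/B/C are tried.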
Comment 1 Karsten Bräckelmann 2009-05-16 04:04:41 UTC
Target Milestone 3.3; this has been reported on the users list twice already.
Comment 2 Karsten Bräckelmann 2009-05-16 16:33:54 UTC
Unfortunately, this is *not* merely a potential issue, as suggested in comment 0. There are outdated MIRRORED.BY files (even for SA 3.2.5) out there that list only a single mirror, which, coincidentally, was removed earlier this year. See the users list as of yesterday/today.

Of course, we can't do anything about those that exist already, since a fix release will result in a new, versioned update dir and fresh mirrors. The only cure for those is to rm the MIRRORED.BY file.

Granted, with 2 mirrors currently, this is less likely to occur with future installs. Still, it is not a potential issue but a real-life problem.
Comment 3 Adam Katz 2009-05-20 14:48:13 UTC
Why does SA even use a mirrors file instead of direct TXT records?  Here's a live example with a dozen round-robin TXT entries, including the longest real-life mirror I could find plus a ridiculously long example URL.  Am I missing some piece of the RFC that prohibits this?  We're already delayed by the propagation time of the versions, so that doesn't affect anything...

$ host -t txt mirrors.testtxt.khopesh.com. |sort |perl -pe 's/^.*"([^"]+)"$/$1/'
http://abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.museum/abcdefghijklmnopqrstuvwxyz/abcdefghijklmnopqrstuvwxyz/this-is-a-ridiculously-long-sa-update-channel-name-with-tons-and-tons-of-text.cf weight=99999999999999999
http://daryl.dostech.ca/sa-update/sare/72_sare_redirect_post3.0.0.cf weight=500
http://mirror-03.example.com/testtxt weight=5
http://mirror-04.example.com/testtxt weight=4
http://mirror-05.example.com/testtxt weight=4
http://mirror-06.example.com/testtxt weight=4
http://mirror-07.example.com/testtxt weight=4
http://mirror-08.example.com/testtxt weight=2
http://mirror-09.example.com/testtxt weight=2
http://mirror-10.example.com/testtxt weight=2
http://mirror-11.example.com/testtxt weight=1
http://mirror-12.example.com/testtxt weight=1
$ 

Every time there is an update, the mirrors should probably be re-cached.  sa-update should probably also spit out a warning when there is only one mirror.
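The TXT records above mirror the `URL weight=N` lines of a MIRRORED.BY file. A hypothetical Python sketch of parsing such lines and picking one mirror with probability proportional to its weight follows; it is not sa-update's implementation, and defaulting a missing or malformed weight to 1 is an assumption made here.

```python
import random
from typing import List, Optional, Tuple


def parse_mirror_lines(lines: List[str]) -> List[Tuple[str, int]]:
    """Parse 'URL weight=N' lines; blank lines and '#' comments are skipped.

    A missing or malformed weight defaults to 1 (assumption for this sketch).
    """
    mirrors = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        url, weight = fields[0], 1
        for opt in fields[1:]:
            key, _, value = opt.partition("=")
            if key == "weight" and value.isdigit():
                weight = int(value)
        mirrors.append((url, weight))
    return mirrors


def pick_mirror(mirrors: List[Tuple[str, int]], rng: random.Random) -> Optional[str]:
    """Choose one mirror with probability proportional to its weight."""
    candidates = [(u, w) for u, w in mirrors if w > 0]
    if not candidates:
        return None
    urls, weights = zip(*candidates)
    return rng.choices(urls, weights=weights, k=1)[0]
```

Note that under plain proportional weighting, an entry like `weight=99999999999999999` would win essentially every draw, so a single bogus or oversized weight can effectively reduce the list to one mirror.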
Comment 4 Theo Van Dinter 2009-05-20 16:34:05 UTC
I don't recall all of the details at the moment, but I think the main concern I had was that the length of the information in the mirby (MIRRORED.BY) file would cause lots of TCP DNS queries. The plan was DNS for small/quick things and HTTP for larger bits.

sure enough:

$ host -t txt mirrors.testtxt.khopesh.com
;; Truncated, retrying in TCP mode.
mirrors.testtxt.khopesh.com descriptive text "http://daryl.dostech.ca/sa-update/sare/72_sare_redirect_post3.0.0.cf weight=500"
mirrors.testtxt.khopesh.com descriptive text "http://abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.museum/abcdefghijklmnopqrstuvwxyz/abcdefghijklmnopqrstuvwxyz/this-is-a-ridiculously-long-sa-update-channel-name-with-tons-and-tons-of-text.cf weight=99999999999999999"
[...]



(In reply to comment #3)
> Why does SA even use a mirrors file instead of direct TXT records?  Here's a
> live example with a dozen round-robin TXT entries, including the longest
> real-life mirror I could find plus a ridiculously long example URL.  Am I
> missing some piece of the RFC that prohibits this?  We're already delayed by
> the propagation time of the versions, so that doesn't affect anything...
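The "Truncated, retrying in TCP mode" result can be anticipated with a rough size estimate. The Python sketch below applies the RFC 1035 wire format (12-byte header, question section, per-record overhead, TXT data split into character-strings of at most 255 bytes) against the classic 512-byte UDP limit. It assumes no EDNS0, a 2-byte compressed owner name for every answer, and no authority/additional records; real servers may differ, and the function names are hypothetical.

```python
from typing import List

# Classic DNS-over-UDP message limit (RFC 1035), assuming no EDNS0.
UDP_LIMIT = 512


def encoded_name_length(name: str) -> int:
    """Wire length of an uncompressed domain name: one length byte per label, plus the root byte."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label.encode("ascii")) for label in labels) + 1


def txt_rdata_length(text: str) -> int:
    """TXT RDATA length: text split into <=255-byte character-strings, each with a length byte."""
    data = text.encode("utf-8")
    chunks = max(1, -(-len(data) // 255))  # ceiling division; empty text is one empty string
    return len(data) + chunks


def estimate_txt_response_size(qname: str, records: List[str]) -> int:
    """Approximate size of a TXT answer (compressed owner names, no extra sections)."""
    size = 12                                # header
    size += encoded_name_length(qname) + 4   # question: QNAME + QTYPE + QCLASS
    for rec in records:
        size += 2 + 2 + 2 + 4 + 2            # NAME pointer, TYPE, CLASS, TTL, RDLENGTH
        size += txt_rdata_length(rec)
    return size
```

For example, twelve short `http://mirror-NN.example.com/testtxt weight=4` records for `mirrors.testtxt.khopesh.com.` already come to about 741 bytes, well past 512, before any long URLs are added.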
Comment 5 Justin Mason 2009-06-29 04:26:35 UTC
Would this need to be fixed before 3.3.0? If so, set pri to P1.
Comment 6 Karsten Bräckelmann 2009-08-19 04:34:52 UTC
Priority 1, not an enhancement. This currently bites again due to an expired third-party mirror domain.
Comment 7 Justin Mason 2009-08-19 05:28:25 UTC
(In reply to comment #6)
> Priority 1, not an enhancement. This currently bites again due to an expired
> third-party mirror domain.

btw, bear in mind that the expired domain (sa-updates.com) is not being used for updates.spamassassin.org, just for third-party rulesets (Daryl's own SARE updates). So not quite so urgent.

(I made that mistake myself but caught it before I filed the bug ;)
Comment 8 Justin Mason 2009-08-19 05:29:27 UTC
also, fwiw, I suggest we re-get MIRRORED.BY if it's older than 7 days.
Comment 9 tm 2009-08-19 05:43:07 UTC
I suggest publishing the location of known-good mirror files in DNS via TXT records, so you do one DNS query to find out where the MIRRORED.BY file is, then retrieve it (e.g. via HTTP), and go from there.
Comment 10 Karsten Bräckelmann 2009-08-19 05:53:44 UTC
(In reply to comment #7)
> btw, bear in mind that the expired domain (sa-updates.com) are not being used
> for updates.spamassassin.org, just for third-party rulesets (Daryl's own SARE
> updates).  so not quite so urgent.  

I know, and I clearly stated that in comment 6, didn't I? ;)

The reason for Priority 1 (and a normal Severity, mind you) is just according to comment 5, and the fact that this is the second instance where lots of MIRRORED.BY files got invalid and need manual admin intervention.
Comment 11 Justin Mason 2009-08-19 06:10:38 UTC
(In reply to comment #9)
> I suggest to publish the location of known-good mirror files by DNS via TXT
> records, so you do one DNS query to find out where the MIRRORED.BY file is,
> then retrieve it (eg. via HTTP), and go from there.

that, in fact, is exactly what it does right now. ;)  It just doesn't re-retrieve it after it's been retrieved once.
Comment 12 John Hardin 2009-08-19 06:45:01 UTC
(In reply to comment #8)
> also, fwiw, I suggest we re-get MIRRORED.BY if it's older than 7 days.

Minor objection: I have modified my MIRRORED.BY files to use the Coral cache network; having sa-update do this would force me to revisit the MIRRORED.BY files weekly, or set up a cron job to touch them and keep them "fresh" (which would break the welcome self-repairing aspect of this suggestion).

I've added Bug 6181 to make this less of a problem.
Comment 13 Justin Mason 2009-09-15 14:41:11 UTC
some flight time gave me a chance to do this ;)

: 234...; svn commit -m "bug 6083: re-download MIRRRORED.BY files at least once a week, or if 'sa-update --refreshmirrors' switch is used" sa-update.raw 
Sending        sa-update.raw
Transmitting file data .
Committed revision 815500.
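The committed rule (re-download at least once a week, or whenever `--refreshmirrors` is given) reduces to a staleness check along the lines of the Python sketch below. Measuring age by the file's modification time is an assumption about how such a check could work, not a description of the committed Perl code, and the function name is hypothetical.

```python
import os
import time
from typing import Optional

ONE_WEEK = 7 * 24 * 60 * 60  # seconds


def mirby_needs_refresh(path: str, force: bool = False,
                        max_age: float = ONE_WEEK,
                        now: Optional[float] = None) -> bool:
    """Return True if the cached MIRRORED.BY at `path` should be re-downloaded.

    `force` plays the role of a --refreshmirrors style override; a missing
    file always needs fetching. Age is taken from the file's mtime.
    """
    if force or not os.path.exists(path):
        return True
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) > max_age
```

Under an mtime-based check like this, a cron job that runs `touch` on a hand-edited MIRRORED.BY keeps it from being replaced, which is exactly the trade-off raised in comment 12.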