Bug 4941 - [review] sa-update doesn't clean up after itself on failure
Summary: [review] sa-update doesn't clean up after itself on failure
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: sa-update (show other bugs)
Version: 3.1.1
Hardware: Other other
: P5 major
Target Milestone: 3.1.4
Assignee: SpamAssassin Developer Mailing List
URL: http://bugs.debian.org/cgi-bin/bugrep...
Whiteboard: can be commited
Keywords:
Depends on:
Blocks: 4997
  Show dependency tree
 
Reported: 2006-06-04 21:50 UTC by spamassassin
Modified: 2006-07-24 08:10 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
fix patch None Justin Mason [HasCLA]
additional fix patch None Justin Mason [HasCLA]
suggested patch patch None Theo Van Dinter [HasCLA]
suggested patch patch None Theo Van Dinter [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description spamassassin 2006-06-04 21:50:09 UTC
It looks like if sa-update fails, it doesn't clean up the directories it
created. spamassassin then looks in this empty directory for its rules files,
and of course, finds none.

(See bug at provided URL for more detail)
Comment 1 Justin Mason 2006-07-21 08:56:31 UTC
a good explanation from bug 4997:

'The problem is that on the first sa-update run, the new local state directory is
made at the start to hold the temp files, and then when the uncompress fails,
sa-update dies leaving an empty local state dir.  fwiw. :)'
Comment 2 Justin Mason 2006-07-21 14:03:51 UTC
Created attachment 3602 [details]
fix

here's a patch.  it

(a) defers creating the dir until everything's downloaded, gpg-verified, linted
etc.

(b) also attempts a "test rename" to ensure the dir can be written to, uid has
permissions, etc.; if that fails, it rolls back changes to the dir and aborts
the upgrade

(c) makes some "write failed" error cases from try-again warnings into fatal
errors, which they should be

(d) caught a missing closedir().
Comment 3 Justin Mason 2006-07-21 14:05:08 UTC
setting review state for 3.1.x.  checked in as r424325 for trunk.
Comment 4 Theo Van Dinter 2006-07-21 15:07:54 UTC
This could just be my "not quite awake" state, but ...  UPDTmp is under UPDDir:

  my $UPDDir = "$opt{'updatedir'}/$nicechannel";
  my $UPDTmp = "$opt{'updatedir'}/$nicechannel.tmp";

So since UPDTmp is still created at the beginning of the process, $UPDDir will
therefore also be created, and so the problem will still exist.
Comment 5 Justin Mason 2006-07-21 15:22:16 UTC
nope -- UPDTmp ("/path/foo") is parallel to UPDDir ("/path/foo.tmp").  right?
Comment 6 Theo Van Dinter 2006-07-21 15:28:08 UTC
(In reply to comment #5)
> nope -- UPDTmp ("/path/foo") is parallel to UPDDir ("/path/foo.tmp").  right?

Except the problem is that /path (generally called the local state dir) exists. ;)  
Comment 7 Justin Mason 2006-07-21 16:45:10 UTC
oh wait a sec -- I get your point.  UPDTmp is under 'updatedir', rather than
UPDDir, and that's the dir that Mail::SpamAssassin checks for -- so even with
this additional code, it'll still try to use the empty dir (even with just temp
files in it).

good point. :(  ok -- we need an additional fix.

I suggest using '__local_state_dir__/spamassassin/__version__.tmp/$nicechannel'
as the temp dir for '__local_state_dir__/spamassassin/__version__/$nicechannel'.
 sound good?
Comment 8 Theo Van Dinter 2006-07-21 18:02:41 UTC
(In reply to comment #7)
> oh wait a sec -- I get your point.  UPDTmp is under 'updatedir', rather than
> UPDDir, and that's the dir that Mail::SpamAssassin checks for -- so even with
> this additional code, it'll still try to use the empty dir (even with just temp
> files in it).

Yes!  Right, sorry -- wasn't quite awake yet so that previous comment was a bit
off, but the idea got through! :)

> good point. :(  ok -- we need an additional fix.
> 
> I suggest using '__local_state_dir__/spamassassin/__version__.tmp/$nicechannel'
> as the temp dir for '__local_state_dir__/spamassassin/__version__/$nicechannel'.
>  sound good?

I think you mean:

__local_state_dir__.tmp/$nicechannel

since __local_state_dir__ includes "spamassassin/__version__".  :)


I've been thinking about this for a bit (just haven't had a chance to code up a
patch) and what I ended up with is something like this:

- If exists, read in MIRRORED.BY and track the mtime.  Otherwise download a new
one and flag that it was downloaded.
- Create UPDTmp under File::Spec->tmpdir()
- Unarchive the update to UPDTmp, test it, and remove the directory.
- If error, throw error and go onto next channel.
- Clean out $UPDDir or make it if necessary (at this point local_state_dir exists)
- Unarchive the update to UPDDir
- If not downloaded, check for a new MIRRORED.BY file
- Write out MIRRORED.BY file

I think this minimizes the possibility of this problem occuring.
Comment 9 Justin Mason 2006-07-21 18:24:13 UTC
Created attachment 3603 [details]
additional fix

no, here's what I mean ;)

basically it uses '$opt{'updatedir'}.tmp/$nicechannel', which in effect means:

[17852] dbg: channel: update directory
/var/lib/spamassassin/3.002000/updates_spamassassin_org
[17852] dbg: channel: update tmp directory
/var/lib/spamassassin/3.002000.tmp/updates_spamassassin_org

when M:SA goes looking for rules in '/var/lib/spamassassin/3.002000', they now
won't be there until they're ready.
Comment 10 Justin Mason 2006-07-21 18:26:14 UTC
btw, I should point out -- there's a reason I'm attempting to keep the tmpdir as
near as possible to the "finished" location.  It's an attempt to ensure that the
two dirs are on the same filesystem, which minimises the possibility of issues
during the final "mv" step.
Comment 11 Theo Van Dinter 2006-07-21 18:29:24 UTC
(In reply to comment #9)
> basically it uses '$opt{'updatedir'}.tmp/$nicechannel', which in effect means:
> 
> [17852] dbg: channel: update directory
> /var/lib/spamassassin/3.002000/updates_spamassassin_org
> [17852] dbg: channel: update tmp directory
> /var/lib/spamassassin/3.002000.tmp/updates_spamassassin_org
> 
> when M:SA goes looking for rules in '/var/lib/spamassassin/3.002000', they now
> won't be there until they're ready.

Hrm.  My only issue with that is that when UPDTmp is removed,
/var/lib/spamassassin/3.002000.tmp is leftover.  I'd rather just use that as the
temp dir, the heck with $nicechannel.  :)
Comment 12 Justin Mason 2006-07-21 19:04:55 UTC
that's a good point. hmm.  however, if we were to use just
/var/lib/spamassassin/3.002000.tmp, that'd mean that only one "sa-update" can
run at the same time, even for multiple channels.
Comment 13 Theo Van Dinter 2006-07-21 19:19:01 UTC
(In reply to comment #12)
> that's a good point. hmm.  however, if we were to use just
> /var/lib/spamassassin/3.002000.tmp, that'd mean that only one "sa-update" can
> run at the same time, even for multiple channels.

Hrm.  I hadn't thought of people running sa-update separately for multiple
channels, since sa-update can already handle that in a single run (it's serial,
but ...)

I still like my method from comment 8 which fixes the issue, even if you're
running multiple sa-update commands at the same time. :)
Comment 14 Theo Van Dinter 2006-07-22 03:42:14 UTC
(In reply to comment #13)
> I still like my method from comment 8 which fixes the issue, even if you're
> running multiple sa-update commands at the same time. :)

Ok, I hope you don't mind but I ended up spending a few hours this evening
making a bunch of changes to sa-update.  It implements the method from comment
8, strips out some now unused code, and cuts out a bunch of temp files.

I'm making some finishing touches to it and will commit it to 3.2, then make a
patch available here backporting the 3.2 version to 3.1.4.
Comment 15 Theo Van Dinter 2006-07-22 04:20:51 UTC
ok, patch committed to 3.2:

Sending        lib/Mail/SpamAssassin/Util.pm
Sending        sa-update.raw
Transmitting file data ..
Committed revision 424521.

it fixes the issue here, and also another I found which caused update .pre files
to be ignored.  Patch for 3.1 shortly.
Comment 16 Theo Van Dinter 2006-07-22 04:21:30 UTC
Created attachment 3604 [details]
suggested patch
Comment 17 Justin Mason 2006-07-22 11:25:25 UTC
wow!

+1 ;)

there's one issue -- namely that if something stops the files being extracted in
the second taint_safe_archive_extract() call (such as an out-of-space
condition), those files will be left behind in an incomplete state in $UPDDir. 
I'd like to see those rolled back too, for safety.

Also, IMO, it should simply copy the files from the tmpdir, instead of
re-extracting them from the tarball again -- which is inefficient. 
cross_fs_rename() would be useful to keep for that job.

But both of those are icing on the cake, and best considered a separate issue, 
I think... good patch ;)
Comment 18 Theo Van Dinter 2006-07-22 18:17:59 UTC
(In reply to comment #17)
> wow!
> +1 ;)

:)

> there's one issue -- namely that if something stops the files being extracted in
> the second taint_safe_archive_extract() call (such as an out-of-space
> condition), those files will be left behind in an incomplete state in $UPDDir. 
> I'd like to see those rolled back too, for safety.

Fair enough.  I'll put up a new version in a minute which covers this and a few
other related things.  I'm also swapping around "$dir/$file" with
File::Spec->catfile($dir, $file) since we really ought to do that more often. ;)

> Also, IMO, it should simply copy the files from the tmpdir, instead of
> re-extracting them from the tarball again -- which is inefficient. 
> cross_fs_rename() would be useful to keep for that job.

I could go either way I guess.  It seems to be easier, though requires more work
and disk space, to just unarchive again than to go through the process of moving
files from the temp dir.  I liked getting rid of File::Copy as a requirement
though. ;)

BTW: why cross_fs_rename() instead of File::Copy::move() ?

> But both of those are icing on the cake, and best considered a separate issue, 
> I think... good patch ;)

:)  Thanks!  Sorry for coming up from behind and redoing it on you.  This is one
of those bugs that have been floating around in my head for a while, and I just
needed to sit down and focus on getting a patch out.
Comment 19 Theo Van Dinter 2006-07-22 18:28:55 UTC
Created attachment 3605 [details]
suggested patch

A continuation of the last patch -- address JM's issue of cleaning out the
update directory if the extraction fails.  I also cleaned up a few other things
that have been annoying me about sa-update, etc.  Enjoy. :)
Comment 20 Justin Mason 2006-07-23 10:33:39 UTC
'BTW: why cross_fs_rename() instead of File::Copy::move() ?'

ah, missed that ;)
Comment 21 Justin Mason 2006-07-23 10:35:04 UTC
+1 on the newest patch, too
Comment 22 Doc Schneider 2006-07-23 20:33:19 UTC
+1 on the new patch
Comment 23 Theo Van Dinter 2006-07-24 15:10:29 UTC
Sending        lib/Mail/SpamAssassin/Util.pm
Sending        sa-update.raw
Transmitting file data ..
Committed revision 425085.

:)