|
SA Bugzilla – Full Text Bug Listing |
Summary: | New DCC.pm improves performance, is more resilient. | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Michael Scheidell <michael> |
Component: | Plugins | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | blocker | CC: | apache, kmcgrail, sidney, vjs |
Priority: | P2 | ||
Version: | 3.3.2 | ||
Target Milestone: | 3.4.0 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
Bug Depends on: | |||
Bug Blocks: | 6699 | ||
Attachments: |
/tmp/DCCpatch.txt and DCC.pm
the committed patch (i.e.: svn diff -c1201384) use -Q instead of -Qmany on incoming mail with existing X-DCC headers |
Description
Michael Scheidell
2011-11-09 21:47:24 UTC
This is quite a hefty change. I updated it for the 3.4 (added the few changes that accumulated between 3.3.1 and the current trunk, plus few minor details), and it seems to run fine at our site. I haven't tested all possible scenarios though. The 3.4.0 is probably the best time to accept such a change (if ever) from the author himself, so I'm tentatively setting the target to 3.4.0. If others agree, I'd like to commit it to trunk, making it easier for everybody to test. Are there any CLA concerns? (In reply to comment #1) > This is quite a hefty change. I updated it for the 3.4 (added the few changes > that accumulated between 3.3.1 and the current trunk, plus few minor details), > and it seems to run fine at our site. I haven't tested all possible scenarios > though. The 3.4.0 is probably the best time to accept such a change (if ever) > from the author himself, so I'm tentatively setting the target to 3.4.0. > > If others agree, I'd like to commit it to trunk, making it easier for > everybody to test. Are there any CLA concerns? A) I think this is perfect for 3.4.0 B) There *are* CLA concerns in my opinion. I will email Vernon. Regards, KAM I respect Vernon's unhappiness with the documentation's statement that DCC is not "open source" but I disagree with his replacement wording for why the plugin is not enabled by default. I propose the following instead. I don't want to upload an entirely new patch. I don't want to make the change in the existing source code for DCC.pm because that would just cause difficulties when it becomes time to apply this extensive patch. The portion of his patch that I disagree with: -Note that DCC is disabled by default in C<init.pre> because it is not -open source. See the DCC license for more details. +See http://www.dcc-servers.net/dcc/ for more information about DCC. -See http://www.rhyolite.com/anti-spam/dcc/ for more information about -DCC. +DCC is disabled by default in C<confdir/v310.pre> because it is free for +only some but not all commercial users. See the DCC license for the free +version at http://www.dcc-servers.net/dcc/LICENSE. The wording that I would put there: Note that DCC is disabled by default in C<init.pre> because its license does not meet the Apache Software Foundation's licensing criteria for third-party software. See the DCC license at http://www.dcc-servers.net/dcc/LICENSE for more details and the ASF criteria at http://www.apache.org/legal/resolved.html#criteria I missed Vernon's change from init.pre to v310.pre, which is more correct. As the comments in the pre files say, a line for a plugin goes in the pre file for the version in which it was first released. This also means that the wording in the comment in v310.pre should be changed to match whatever ends up in DCC.pm I've been convinced by some off-list emails that it is not be true to say that just because the optional DCC.pm plugin requires the dcc software to work after it is enabled, that enabling the plugin in SpamAssassin without distributing the dcc software would violate the ASF third-party software criteria. The ASF policy that the criteria relate to says http://www.apache.org/legal/resolved.html#optional "Can Apache projects rely on components whose licensing affects the Apache product? Apache projects cannot distribute any such components. However, if the component is only needed for optional features, a project can provide the user with instructions on how to obtain and install the non-included work. Optional means that the component is not required for standard use of the product or for the product to achieve a desirable level of quality. The question to ask yourself in this situation is: 'Will the majority of users want to use my product without adding the optional components?'" That says we cannot distribute the dcc software. We can distribute DCC.pm and make it an optional feature. However, I agree with not enabling the plugin by default. It is just plain ugly to have it enabled when it cannot work because required software is not installed. Here is another proposed wording: Note that DCC is disabled by default in C<v310.pre> because its use requires software that is not distributed with SpamAssassin and that has license restrictions for certain commercial uses. See the DCC license at http://www.dcc-servers.net/dcc/LICENSE for details. Mark, a CLA has been registered for Vernon, so the patch is ready for you to commit. Vernon agreed, in an email, to my wording in comment #5, so please use that. If you want to keep in a line about "for more information about DCC" in addition to the link to the license for DCC, make sure that you use the URL http://www.dcc-servers.net/dcc/ I'm agnostic about whether you really need to include that link, since it will be obvious to anyone given the license at http://www.dcc-servers.net/dcc/LICENSE that they can find more information on dcc in the parent URL. On the other hand, including that link doesn't hurt, so I would agree with either way. > Mark, a CLA has been registered for Vernon, so the patch is ready for you to > commit. Great, thanks! > Vernon agreed, in an email, to my wording in comment #5, so please use that. Ok, done. > If you want to keep in a line about "for more information about DCC" in > addition to the link to the license for DCC, make sure that you use the URL > http://www.dcc-servers.net/dcc/ > I'm agnostic about whether you really need to include that link, since it will > be obvious to anyone given the license at > http://www.dcc-servers.net/dcc/LICENSE that they can find more information on > dcc in the parent URL. On the other hand, including that link doesn't hurt, so > I would agree with either way. I left the ref to http://www.dcc-servers.net/dcc/ in, although I wouldn't mind either way. trunk: Bug 6698: New DCC.pm improves performance, is more resilient Sending lib/Mail/SpamAssassin/Plugin/DCC.pm Committed revision 1201384. Please don't close just yet, I'd still like to make a closer look, although it appears alright. Additional testing and code review is welcome if anyone feels like it. Created attachment 5011 [details]
the committed patch (i.e.: svn diff -c1201384)
The diff that was committed - for ease of inspection if anyone
cares. With minor changes from the original proposed patch (updates
for 3.3.1 -> 3.4, some paragraph alignment, and some trivialities).
There is a minor issue with the committed change. Line 523 is now $pgmpath = '/var/dcc/' . $pgm; I think it should be the value that can see in http://www.dcc-servers.net/dcc/dcc-tree/misc/DCC.pm $pgmpath = '/usr/local/dcc/' . $pgm; Or the /var/dcc/ line could be kept and these 2 lines added $pgmpath = '/usr/local/dcc/' . $pgm; return $pgmpath if -x $pgmpath; The intent in using /usr/local/dcc was to look in a common choice for the DCC `./configure --bindir=DIR` value. The default value is /usr/local/bin I apologize for not noticing this issue before the change was committed. A separate issue is what should be done with the copy of DCC.pm that has been in versions of DCC for the last 18 months. It's always asking for trouble to have 2 copies of a value, not to mention code. I could remove the copy in the DCC source after the SpamAssassin version is released. When might that be? I expect to make a DCC release in the next few months. Deleting the copy of DCC.pm in the DCC source would inconvenience installations that are looking in the DCC tarball for DCC.pm. Some of them are using the fact that the DCC version of DCC.pm is actually generated by the DCC ./configure script. See http://www.dcc-servers.net/dcc/dcc-tree/misc/DCC.pm.in Do you have any suggestions? > There is a minor issue with the committed change. Line 523 is now > $pgmpath = '/var/dcc/' . $pgm; > I think it should be the value that can see in > http://www.dcc-servers.net/dcc/dcc-tree/misc/DCC.pm > $pgmpath = '/usr/local/dcc/' . $pgm; > Or the /var/dcc/ line could be kept and these 2 lines added > $pgmpath = '/usr/local/dcc/' . $pgm; > return $pgmpath if -x $pgmpath; > > The intent in using /usr/local/dcc was to look in a common > choice for the DCC `./configure --bindir=DIR` value. > The default value is /usr/local/bin I left in both, added some debugging, and streamlined a bit. Please see what came out (svn diff -c1201835) : trunk: Bug 6698#9: New DCC.pm improves performance, is more resilient' Sending lib/Mail/SpamAssassin/Plugin/DCC.pm Committed revision 1201835. > A separate issue is what should be done with the copy of DCC.pm > that has been in versions of DCC for the last 18 months. > It's always asking for trouble to have 2 copies of a value, Agree. As the DCC.pm plugin is only usable with SpamAssassin, and has been distributed with it since I remember, I see no point of having another copy of this plugin distributed with dcc. > not to mention code. I could remove the copy in the DCC source > after the SpamAssassin version is released. When might that be? Soon ;) > I expect to make a DCC release in the next few months. > Deleting the copy of DCC.pm in the DCC source would inconvenience > installations that are looking in the DCC tarball for DCC.pm. > Some of them are using the fact that the DCC version of DCC.pm > is actually generated by the DCC ./configure script. See > http://www.dcc-servers.net/dcc/dcc-tree/misc/DCC.pm.in > Do you have any suggestions? I think it is mostly up to you to decide. I'd ditch the DCC.pm.in . > I left in both, added some debugging, and streamlined a bit.
> Please see what came out (svn diff -c1201835) :
Because
$conf->{dcc_home}
can be undefined, I think you really need something like
my $home = $conf->{dcc_home} || '/var/dcc';
and then using $home instead of $conf->{dcc_home} in line 516.
If the user has not set $conf->{dcc_home}, won't you at best
get undesired results from such as
($conf->{dcc_home}.'/bin'
There is a similar prlbme with the use of $conf->{dcc_libexec}
in line 516 that will cause surprises or perl interpreter
croaking in line 518 if $conf->{dcc_libexec} is not defined by
the user.
I don't understand how some of the changes to regular
expressions or replacing
if ($x) {
with
if (defined $x && $x ne '') {
qualifies as streamlining, but there's no accounting for
tastes and it's now your code.
Never mind what I just wrote about $conf->{dcc_home} and $conf->{dcc_libexec} possibly being undefined. I now think they're always set to something in ck_dir(). Making this a blocker for 3.4.0 as it contains code that needs a CLA which is in progress. (in reply to comment #13) Kevin, we have a CLA for Vernon. An acknowledgment of receipt was sent to the PMC, and Vernon's name appears on the committers page in the list of non-committers who have sent a CLA. Is there any other CLA that this patch depends on? (In reply to comment #14) > (in reply to comment #13) > > Kevin, we have a CLA for Vernon. An acknowledgment of receipt was sent to the > PMC, and Vernon's name appears on the committers page in the list of > non-committers who have sent a CLA. Is there any other CLA that this patch > depends on? You are correct but there are two CLAs in the works and I believe we are still waiting on the Corporate CLA for Rhyolite. I corresponded w/ Craig Russel at 11:51AM EST today (about 5 hours ago) and he was waiting on a corrected version as it was on the wrong form. I emailed Vernon to ping him as well. If you review emails to PMC you'll note Subject 1: Re: Individual Contributor License Agreement from Vernon Schryver vs Subject 2: Re: Individual Contributor License Agreement from Rhyolite Software regards, KAM Please also forget my comment about !$x versus defined $x && $x ne '' I now realize that the intent was probably to allow files and directories named '0'. (In reply to comment #16) > Please also forget my comment about !$x versus > defined $x && $x ne '' > I now realize that the intent was probably to > allow files and directories named '0'. That's a good edge case actually but you can also have a variable defined but empty. So this is a test to see is it defined and is it empty. Unfortunately, with parameter sanitization, this case comes up more than it should. (in reply to comment #15) Ah, I interpreted the two emails as that he first sent an Individual CLA form filled out for himself with his DBA business name, then corrected it by sending an ICLA form filled out as an individual. You are saying that we also need a corporate CLA from Rhyolite Software. (In reply to comment #18) > (in reply to comment #15) > > Ah, I interpreted the two emails as that he first sent an Individual CLA form > filled out for himself with his DBA business name, then corrected it by sending > an ICLA form filled out as an individual. You are saying that we also need a > corporate CLA from Rhyolite Software. Yes. And that second CLA is winging it's way through the process now. (in reply to comment #19) And it has arrived. All CLA issues are now completed. (In reply to comment #20) > (in reply to comment #19) > > And it has arrived. All CLA issues are now completed. I've updated Vernon's CLA status in bugzilla. Closing this bug as resolved/fixed but feel free to reopen if I missed more things in discussion. > I think you really need something like > my $home = $conf->{dcc_home} || '/var/dcc'; > and then using $home instead of $conf->{dcc_home} in line 516. Sure, if you want. I removed the "my $home=..." from your code as it was not used anywhere. > I don't understand how some of the changes to regular > expressions or replacing > if ($x) { > with > if (defined $x && $x ne '') { > qualifies as streamlining I was referring to 5 unrolled copies of: $pgmpath = ... return $pgmpath if (-x $pgmpath); being replaced by a foreach loop. > Never mind what I just wrote about $conf->{dcc_home} > and $conf->{dcc_libexec} possibly being undefined. > I now think they're always set to something in ck_dir(). I didn't check this. If we are not sure we can put back the test for defined() back. I only checked that $conf->{dcc_home} already is untainted by the config parsing code, so the call to untaint_file_path was redundant. > Please also forget my comment about !$x versus > defined $x && $x ne '' > I now realize that the intent was probably to > allow files and directories named '0'. Exactly! This is one of my pet peeves! Around me it is not uncommon that some throwaway/test program is kept on a file named "0". Surprisingly often this reveals a bug in some program. In the same vein we had a bug in SpamAssassin, where a "Subject: 0" was marked as a missing Subject header field! This is described under Bug 5965: - header field with a value of "0" (zero) invisible; - do not treat user data as perl booleans (a string "0" is a false); - continue work on avoiding user data to be tested as perl booleans, instead test for defined or for an empty string as appropriate; Created attachment 5024 [details]
use -Q instead of -Qmany on incoming mail with existing X-DCC headers
This patch originally from Herbert J. Skuhra fixes a significant problem
seen with mail with pre-existing X-DCC headers.
> Created attachment 5024 [details] > use -Q instead of -Qmany on incoming mail with existing X-DCC headers > This patch originally from Herbert J. Skuhra fixes a significant problem > seen with mail with pre-existing X-DCC headers. Thanks, committed: Bug 6698 #23 - use -Q instead of -Qmany on incoming mail with existing X-DCC headers Sending lib/Mail/SpamAssassin/Plugin/DCC.pm Committed revision 1224826. Fixed the annoying uninitialized dcc_libexec when cdcc binary is not in path. All the path search stuff here is too confusing to make a more "elegant" fix. Also a pointless $dcc_libexec removed. --- DCC.pm (revision 1307734) +++ DCC.pm (working copy) @@ -425,7 +425,6 @@ sub find_dcc_home { my ($self) = @_; - my $dcc_libexec; # just once return if defined $self->{dcc_version}; @@ -515,6 +514,7 @@ # try dcc_home/bin, dcc_libexec, and some desperate last attempts foreach my $dir ($conf->{dcc_home}.'/bin', $conf->{dcc_libexec}, '/usr/local/bin', '/usr/local/dcc', '/var/dcc') { + next unless defined $dir; $pgmpath = $dir . '/' . $pgm; if (-x $pgmpath) { dbg("dcc: dcc_pgm_path, found %s in %s: %s", $pgm,$dir,$pgmpath); Sending DCC.pm Transmitting file data . Committed revision 1308124. There is a bug in this version of DCC.pm that is fixed with this change: --- DCC.pm.old 2013-09-04 14:15:32.000000000 +0000 +++ DCC.pm 2013-09-04 14:15:14.000000000 +0000 @@ -825,8 +825,10 @@ $opts = $self->{dccifd_report_options} } else { $opts = $self->{dccifd_lookup_options}; - # only query if there is an X-DCC header - $opts =~ s/grey-off/& query/ if defined $permsgstatus->{dcc_raw_x_dcc}; + if defined $permsgstatus->{dcc_raw_x_dcc} { + # only query if there is an X-DCC header + $opts =~ s/grey-off/grey-off query/ + } } $sock->print($opts) or die "failed write options\n"; $sock->print($client . "\n") or die "failed write SMTP client\n"; (In reply to Vernon Schryver from comment #26) > There is a bug in this version of DCC.pm that is fixed with this change: Thanks for noticing and the fix! Applied. trunk: Bug 6698#26: DCC dcc_raw_x_dcc fixed Sending lib/Mail/SpamAssassin/Plugin/DCC.pm Committed revision 1520091. |