Bug 6200 - Replace use of Digest::SHA1 by Digest::SHA
Replace use of Digest::SHA1 by Digest::SHA
Status: RESOLVED FIXED
Product: Spamassassin
Classification: Unclassified
Component: Libraries
3.3.0
All All
: P5 enhancement
: 3.3.0
Assigned To: SpamAssassin Developer Mailing List
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2009-09-14 12:30 UTC by Mark Martinec
Modified: 2009-11-11 09:42 UTC (History)
3 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
A patch to Razor agents, mentioned in Comment 4 patch None Mark Martinec [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Martinec 2009-09-14 12:30:03 UTC
- perl module Digest::SHA covers and extends the functionality of Digest::SHA1,
  providing SHA hashing for some additional bit sizes (224, 256, 384, 512),
  besides the 160 bit SHA-1; its usage is compatible with Digest::SHA1;

- with perl 5.10.0 the module Digest::SHA became a perl base module, i.e.
  it comes installed with a perl itself (Digest::SHA1 is not a base module,
  and never was);

- the DKIM signature verification requires support of SHA-256 for a compliant
  implementation; module Mail::DKIM (as used by the DKIM plugin) needs
  to use Digest::SHA for this reason, so Digest::SHA is already 'in';

- the following SA modules currently use Digest::SHA1:
  BayesStore/BDB.pm, BayesStore/SQL.pm, BayesStore/DBM.pm,
  Plugin/Bayes.pm, Plugin/Hashcash.pm, (and Util/DependencyInfo.pm)

So, how about switching from Digest::SHA1 to Digest::SHA for a 3.3.0 ?
The required change is a trivial syntactical change.
Comment 1 Warren Togami 2009-09-14 15:34:59 UTC
I wonder how this would effect spamassassin's perl version compatibility?

Older distributions of perl didn't ship with Digest::SHA.  I recently built Digest::SHA for EPEL5 in order to support Mail::DKIM and it seems to be working fine for DKIM at least.  I didn't test it with RHEL-4 yet, which has perl-5.8.5 and the oldest distribution that we possibly would care for spamassassin.
Comment 2 Mark Martinec 2009-09-14 18:18:32 UTC
> I wonder how this would effect spamassassin's perl version compatibility?
> 
> Older distributions of perl didn't ship with Digest::SHA.  I recently built
> Digest::SHA for EPEL5 in order to support Mail::DKIM and it seems to be
> working fine for DKIM at least.  I didn't test it with RHEL-4 yet, which
> has perl-5.8.5 and the oldest distribution that we possibly would care
> for spamassassin.

With versions of perl before the 5.10.0, one has to install one or
the other module anyway, as it does not come with the perl package.
I do not doubt that Digest::SHA installs and works just fine with 5.8.x,
as this is not a new module. What I don't know is whether it is more
difficult to install Digest::SHA compared to Digest::SHA1 on systems
that you mention. On FreeBSD it is very straightforward. The Solaris
CSW packages also have both modules available.
Comment 3 Mark Martinec 2009-10-16 08:39:41 UTC
So, how about replacing the existing:

use Digest::SHA1 qw(sha1);

with:

BEGIN {
  eval { require Digest::SHA1; import Digest::SHA1 qw(sha1); 1 }
  or do { require Digest::SHA; import Digest::SHA qw(sha1) }
}


(or the other way around, trying SHA first and falling back to SHA1)


This way we could lose hard dependency on Digest::SHA1 (which is not a
base module), but retain compatibility with sites which will disable
the DKIM plugin and refuse to install Digest::SHA on pre-10.0 perls.
Comment 4 Mark Martinec 2009-10-30 17:45:58 UTC
I went through code and tests and seems to have fished out all cases.
The only external perl module that I'm aware of that is needed by SA
and still depends on Digest::SHA1 is Razor. I send a patch (similar to
my suggestion in #3) to its maintainer, although I'm not holding breath
to receive any response.

Running it now on our server (with patched razor) without Digest::SHA1.
Comment 5 Mark Martinec 2009-11-06 11:55:37 UTC
Just in case people would worry about a change in performance, I did some
benchmarking. Generally, the sha1 in Digest::SHA has a higher call cost,
but then it runs faster on data, so this means that Digest::SHA is
better for digesting long strings (like generating message id from half
of the body text), while Digest::SHA1 is better at digesting many small
strings, like tokenization in bayes.

Probably the heaviest consumer of sha1 digesting is converting tokens to
hashes in Bayes.pm, so I instrumented it with some statistics gathering
and let it run on real mail, pouring into our mailer. Typically for one
mail message it needs to process 120..200 tokens, and occasionally more.
Typical average token length is between 9 and 14 characters.

The loop that goes through all the tokens and hashes them typically
takes about 1 ms per message, and a difference between Digest::SHA and
Digest::SHA1 is about 0.2 ms per message. So, nothing to worry about,
hardly measurable. If one would want to squeeze out the last drop,
a MD5 hashing should have been be used.
Comment 6 Mark Martinec 2009-11-09 09:38:48 UTC
Bug 6200: let SA use either Digest::SHA or Digest::SHA1,
whichever is available (the Digest::SHA is now a base
module since perl 5.10.0)
Sending        INSTALL
Sending        build/sha1sum.pl
Sending        lib/Mail/SpamAssassin/BayesStore/BDB.pm
Sending        lib/Mail/SpamAssassin/BayesStore/DBM.pm
Sending        lib/Mail/SpamAssassin/BayesStore/SQL.pm
Sending        lib/Mail/SpamAssassin/Plugin/Bayes.pm
Sending        lib/Mail/SpamAssassin/Plugin/Hashcash.pm
Sending        lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Sending        masses/corpora/fuzzy-hash-maildir
Sending        masses/rule-dev/seek-phrases-in-log
Sending        sa-update.raw
Sending        t/mimeparse.t
Sending        t/rule_names.t
Sending        t/sha1.t
Transmitting file data ..............
Committed 0.
Comment 7 Justin Mason 2009-11-09 10:02:50 UTC
(In reply to comment #6)
> Bug 6200: let SA use either Digest::SHA or Digest::SHA1,
> whichever is available (the Digest::SHA is now a base
> module since perl 5.10.0)

hmm, may still need some work, as Hudson is failing to build.
Comment 8 Mark Martinec 2009-11-09 10:04:19 UTC
Bug 6200 cont'd: fixed DependencyInfo.pm
Sending        lib/Mail/SpamAssassin/Util/DependencyInfo.pm
Committed 0.
Comment 9 Mark Martinec 2009-11-11 09:32:42 UTC
Closing. Please reopen if any issues or concerns pop up.
Comment 10 Mark Martinec 2009-11-11 09:42:19 UTC
Created attachment 4570 [details]
A patch to Razor agents, mentioned in Comment 4

For anyone wanting to loose a dependency on Digest::SHA1 in Razor2,
attached is a patch to razor-agents, and below is my mail that I sent to
Richard Soderberg on 2009-10-26 (who seemed to be a current maintainer,
but apparently is not, as I got no response).



Richard,

I wonder, are you still the one maintaining razor-agents?

Considering the:

  https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6200

it seems Razor2 is about the only remaining module used by SpamAssassin 3.3
still depending on a module Digest::SHA1. The rest can live with Digest::SHA,
which is compatible with Digest::SHA1, but provides additional functionality
(e.g. needed by DKIM signature validation/generation).  With perl 5.10.0
the Digest::SHA became a perl base module (while Digest::SHA1 is not a base
module, it needs to be installed separately).

I'm attaching a patch against razor-agents-2.84, which lets it use either
the Digest::SHA or the Digest::SHA1, whichever happens to be available.
I wonder if this can be incorporated into the next release.


On a related note, examining the use of SHA1 one thing pops out and
appears to me like a bug. The code in Signature::Whiplash::whiplash :

        my $sha1 = Digest::SHA1->new();

        $sha1->add($host);
        $sig = substr $sha1->hexdigest, 0, 12;

        $sha1->add($corrected_length);
        $sig .= substr $sha1->hexdigest, 0, 4;

is equivalent to my:

         $sig = substr sha1_hex($host), 0, 12;
         $sig .= substr sha1_hex($corrected_length), 0, 4;


I wonder if this was really what was meant. Seems like the
intention was to do a:

        my $sha1 = Digest::SHA1->new();

        $sha1->add($host);
        $sig = substr $sha1->clone->hexdigest, 0, 12;

        $sha1->add($corrected_length);
        $sig .= substr $sha1->hexdigest, 0, 4;

(note the use of ->clone), which would be equivalent to:

         $sig = substr sha1_hex($host), 0, 12;
         $sig .= substr sha1_hex($host, $corrected_length), 0, 4;

It seems to me that whoever did the coding forgot that the hexdigest
method (and other related methods) destroy the digests state.

If this indeed a bug, fixing it will probably cause incompatibility
in queries between fixed and unfixed clients.

Regards
   Mark