Bug 5661 - [review] speed up SQL queries by utilizing indexes
Summary: [review] speed up SQL queries by utilizing indexes
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd (show other bugs)
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
: P5 normal
Target Milestone: 3.2.4
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard: go
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-28 07:38 UTC by Micah
Modified: 2007-12-16 13:30 UTC (History)
0 users



Attachment Type Modified Status Actions Submitter/CLA Status
proposed patch to 3.2.3 patch None Mark Martinec [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Micah 2007-09-28 07:38:26 UTC
The SQL query in /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pm line
243 calculates the expire delta, but in a way that can't use
an index:

   my $sql = "SELECT count(*)
                FROM bayes_token
               WHERE id = ?
                AND (? - atime) > ?";



Changing this as follows would make it so it can utilize the index, thus
examining less rows (which could be locked). Mysql can't look up $newest_atime -
atime from the index because that value is made up, so it uses the "id" portion
of the index on (id, atime)... if it doesn't have to calculate the value for
$newest_atime + $something for every row and only needs to check the index then
it would be much faster.

Index: lib/Mail/SpamAssassin/BayesStore/SQL.pm
===================================================================
--- lib/Mail/SpamAssassin/BayesStore/SQL.pm     (revision 579950)
+++ lib/Mail/SpamAssassin/BayesStore/SQL.pm     (working copy)
@@ -241,7 +241,7 @@
   my $sql = "SELECT count(*)
                FROM bayes_token
               WHERE id = ?
-                AND (? - atime) > ?";
+                AND atime < ?";

   my $sth = $self->{_dbh}->prepare_cached($sql);

@@ -251,7 +251,7 @@
   }

   for (my $i = 1; $i <= $max_expire_mult; $i<<=1) {
-    my $rc = $sth->execute($self->{_userid}, $newest_atime, $start * $i);
+    my $rc = $sth->execute($self->{_userid}, $newest_atime - $start * $i);

     unless ($rc) {
       dbg("bayes: calculate_expire_delta: SQL error:
".$self->{_dbh}->errstr());

Thanks to Mark Martinec and Nils for the patch and analysis.
Comment 1 Mark Martinec 2007-09-28 11:52:10 UTC
Committed to trunk:
  $ svn -m 'simplified SELECT FROM bayes_token to be able
            to use index, see bug 5661' ci
Sending lib/Mail/SpamAssassin/BayesStore/SQL.pm
Committed revision 580455.

This seems simple enough, tentatively setting target to 3.2.4.
Time for a review?
Comment 2 Justin Mason 2007-10-02 09:52:38 UTC
Mark, could you create a patch against 3.2.4, and attach it?  I'm guessing it's
the one that's inline in the first comment, but it's better to be explicit.
Comment 3 Mark Martinec 2007-10-02 10:07:20 UTC
Created attachment 4141 [details]
proposed patch to 3.2.3
Comment 4 Justin Mason 2007-10-07 11:23:51 UTC
+1
Comment 5 Daryl C. W. O'Shea 2007-11-06 13:30:19 UTC
+1
Comment 6 Justin Mason 2007-12-16 13:30:16 UTC
committed to 3.2.x: r604713