Bug 2037 - RFE: SQL efficiency: keep persistent DB connections (in code with a good license)
Summary: RFE: SQL efficiency: keep persistent DB connections (in code with a good lice...
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: spamc/spamd (show other bugs)
Version: unspecified
Hardware: Other All
: P5 enhancement
Target Milestone: Future
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on: 3097
Blocks: 4560
  Show dependency tree
 
Reported: 2003-06-09 00:32 UTC by Preston A. Elder
Modified: 2019-07-08 10:26 UTC (History)
3 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
My Persistent Database connection plugin application/octet-stream None Zhang Shunchang [NoCLA]
the plugin code has been updated application/octet-stream None Zhang Shunchang [NoCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Preston A. Elder 2003-06-09 00:32:40 UTC
Every spamd instance (fork) requires a new connection to be created to the SQL
server, then a query happens, and then the session is disconnected.  This is
quite unnecessary and inefficient.

Spamd should actually connect on startup, disconnect on shutdown, and just do
queries through that open connection.  Though, I realise this is hard when spamd
is forking all over the place (oh for perl to have threads!).

One possible way to do this would be instead would be to fork a new process just
for SQL communications, and then when you start a new spamd process, get it to
communicate with the SQL thread for its communications.  That, or fetch user
preferences BEFORE the fork is performed.

In any case, this can lead to significant load on a system processing a lot of
mail, due to the load not only of a bunch of spamd processes running (as I said,
oh for threads), but also due to a process or thread being launched by the SQL
server to accomodate these new connections.  Then when you bring in table
locking (for MySQL databases), this can indeed slow things down.

Something to consider for future revisions.  I like the fact that spamassassin
intergrates with SQL databases, I find it much easier to manage users via. a SQL
database than using files on disk - especially when my users manage spamassassin
configurations via. a web-based mail client (so everything in this case is
running as the 'www' user), so it saves using FTP to transport around files, or
duplicating configurations, leaving 'flag' files about the place, etc.
Comment 1 Michael Parker 2004-03-05 17:08:46 UTC
Just adding a comment before I forget about it.  In the absence of a pre-fork
server that can have persistent database connections DBD::Proxy might be an
option.  I did a very small about of playing with it today and it at least
worked, so it might be fairly easy to get it to cache some db connections for
us.  It has a pre-fork/threaded server mode already built in and knows how to
handle cached_connect calls.
Comment 2 Theo Van Dinter 2004-04-13 20:30:23 UTC
spamd is now uses a prefork model (as of r10000). :)
Comment 3 Preston A. Elder 2004-05-17 09:55:57 UTC
I noticed in the new 3.0.0 version, there are two 'SQL conf' files, namely:
http://spamassassin.rediris.es/full/3.0.x/dist/lib/Mail/SpamAssassin/ConfSourceSQL.pm
and
http://spamassassin.rediris.es/full/3.0.x/dist/lib/Mail/SpamAssassin/Conf/SQL.pm

What I find interesting is neither stores the DB handle in something like
$self->{dbh}, and then later checks to see if its 'active'.

Whereas, the SQL support for the bayesan stuff here:
http://spamassassin.rediris.es/full/3.0.x/dist/lib/Mail/SpamAssassin/BayesStore/SQL.pm
seems to do precisely that.

All it would conceivably take would be to replace:
my $dbh = DBI->connect($dsn, $dbuser, $dbpass, {'PrintError' => 0});

with:
if (!defined($self->{dbh}))
{
  $self->{dbh} = DBI->connect($dsn, $dbuser, $dbpass, {'PrintError' => 0});
}

And then replace any reference to $dbh with $self->{dbh}.

And of course removing the disconnect.  Even better would be to decouple the DB
connection and disconnection routines alltogether (as is done in the bayesan
stuff).  Even better STILL would be to have all DB functions in the same
'pre-forked process' use the same DB handle (regardless of whether its for
bayesan, configuration, or other purposes).  ie. put the dbh in the 'main'
section (which is where the SQL config stuff is read from anyway).

But the bare minimum of stopping the continual re-connects would be a huge
benefit anyway.
Comment 4 Theo Van Dinter 2004-05-29 09:55:34 UTC
moving to 3.1
Comment 5 Duncan Findlay 2004-12-01 17:32:16 UTC
sql -> assigning to michael
Comment 6 Daniel Quinlan 2005-03-30 01:09:03 UTC
move bug to Future milestone (previously set to Future -- I hope)
Comment 7 Michael Parker 2005-12-14 09:19:47 UTC
I've written a plugin that handles persistent database connection and it can be
found here:

http://wiki.apache.org/spamassassin/DBIPlugin

Please note that it requires SpamAssassin 3.1+ to work correctly.

Closing, this will hopfully eventually make it into the main SpamAssassin codebase.
Comment 8 Justin Mason 2006-04-21 18:18:43 UTC
reopening -- we could still do with a plugin that doesn't use DBI due to its
licensing problems.
Comment 9 Justin Mason 2006-05-26 10:18:49 UTC
This was a suggested idea for the Google Summer of Code 2006;
I'm adding it to the bugzilla for future use, and in case anyone feels
like implementing it.

Subject ID: spamassassin-persistent-db-conns
Keywords: perl, databases, sql
Description: http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2037 :
persistent database connections for SpamAssassin's Bayes subsystem. Michael:
'This exists, but is not an ASL friendly license.  So a "clean room"
implementation might be cool.'
Possible Mentors: Michael Parker (parkerm -at- pobox.com)
Comment 10 Michael Parker 2006-05-29 02:09:20 UTC
Back to dev
Comment 11 Zhang Shunchang 2007-08-20 18:27:56 UTC
Created attachment 4105 [details]
My Persistent Database connection plugin

This is a plugin written during the period of Google Summer Code 2007. I have
done the benchmark with MySQL and PostgreSQL.
Comment 12 Michael Parker 2007-08-25 15:10:12 UTC
The only real thing this needs is to turn the $CONNECT variable into a hash so
that we can have it handle multiple connection strings, but that is a very minor
change.
Comment 13 Zhang Shunchang 2007-08-30 05:43:49 UTC
Created attachment 4111 [details]
the plugin code has been updated
Comment 14 Henrik Krohns 2019-07-08 10:26:46 UTC
Closing old stale bug. Not sure if SQL is already persistent to some degree, but Redis should be used for anything high volume anyway.