|
SA Bugzilla – Full Text Bug Listing |
Summary: | RFE: pluginize the Bayes subsystem | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Justin Mason <jm> |
Component: | Libraries | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | ||
Priority: | P5 | ||
Version: | SVN Trunk (Latest Devel Version) | ||
Target Milestone: | 3.3.0 | ||
Hardware: | Other | ||
OS: | other | ||
Whiteboard: | |||
Bug Depends on: | |||
Bug Blocks: | 5686 | ||
Attachments: | svn diff for review |
Description
Justin Mason
2007-01-11 06:09:56 UTC
I'm going to do this, since I'm planning to investigate an alternative pluggable classifier for 3.3.0... Would you mind posting your hooks and plans as you go along (a full blow spec if you feel like it :))? I'm very keen on this as well. Thanks (In reply to comment #2) > Would you mind posting your hooks and plans as you go along (a full blow spec if > you feel like it :))? I'm very keen on this as well. full-blown spec! not a hope ;) here's the hooks I've got so far, though: sanity_check_bayes_is_untied learn_message forget_message learner_sync learner_expire_old_training learner_is_scan_available learner_dump_database learner_get_implementation They match up with the methods in Bayes.pm, pretty much. So it's a very high-level pluginization; I figure if different learner backends wind up sharing code, it'll be easier to refactor and do that "behind the scenes", than try to design it upfront into the plugin hooks, without having the benefit of writing an alternative backend to learn from. Also, in my experience writing the first few drafts of Bayes.pm, the storage requirements for different backends were radically different, so code sharing may just not be possible :( some notes on those hooks: 'sanity_check_bayes_is_untied': I'm not sure if this should still be exposed as a hook, to be honest. it probably needs a better name anyway. 'learner_get_implementation': this is definitely a hack; it's needed for the test suite to work. it probably doesn't need to be a public hook, since it can be kept undocumented ;) 'learner_expire_old_training', 'learner_sync': backends for which this makes no sense, can of course just "return 1" for these. ok, check out https://svn.apache.org/repos/asf/spamassassin/branches/bug-5293-pluginized-bayes -- I've just checked in a first blast at this. All the bayes*.t tests pass, so it seems functional... regarding the memory-footprint reduction when the Bayes plugin is off: here's ps axuww output with bayes on: jm 7651 40732:36992 0 pts/7 0:00 spamd child and with the Bayes plugin commented: jm 7678 39500:35812 0 pts/7 0:00 spamd child not exactly massive. still, there's probably ways to reduce that further... I've now merged this up to current SVN trunk's HEAD, and added documentation to Plugin.pm for the new APIs. There are a few minor updates: - 'sanity_check_bayes_is_untied': I renamed this to the much clearer 'learner_close', which is, it turns out, what it should be used for. ;) It's just our basic impl that optimises things to such an extent that it's actually used as just a sanity check. - 'learner_get_implementation' is less thorny now, since it is only used in the bayes-specific test cases. - the {bayes} member in the Mail::SpamAssassin::BayesStore classes is now a ref to a Mail::SpamAssassin::Plugin::Bayes object, not a Mail::SpamAssassin::Bayes. - there are duplicates of the Mail::SpamAssassin::BayesStore classes under Mail/SpamAssassin/OSBF/Store, used to provide storage for the experimental OSBF classifier (bug 5686). if we get OSBF working ok, it would be nice to refactor to share more code. no biggie. - there are config settings for the new, experimental OSBF code commented out in Conf.pm. Otherwise it's unchanged. I propose to merge this into trunk and get rid of this branch... speak up if you're not keen ;) I'll attach the changes in patch form for review. Created attachment 4203 [details] svn diff for review this is a (slightly rearranged) copy of svn diff https://svn.apache.org/repos/asf/spamassassin/trunk https://svn.apache.org/repos/asf/spamassassin/branches/bug-5293-pluginized-bayes I'd forgotten about this -- I thought it was already in 3.3.0! so I've merged up to latest trunk HEAD, again, as of r613117; and then merged the Bayes-only parts of that, omitting the bug 5686 (OSBF) parts, into trunk. That's now in trunk as : jm 130...; svn commit -m "bug 5923: pluginize the Bayes subsystem, allowing replacement of our probabilistic classifier with alternative learning plugins. Merged from the bug-5293-pluginized-bayes branch" Sending lib/Mail/SpamAssassin/Bayes.pm Sending lib/Mail/SpamAssassin/BayesStore/DBM.pm Sending lib/Mail/SpamAssassin/BayesStore/SQL.pm Sending lib/Mail/SpamAssassin/BayesStore.pm Sending lib/Mail/SpamAssassin/Conf.pm Sending lib/Mail/SpamAssassin/PerMsgLearner.pm Sending lib/Mail/SpamAssassin/PerMsgStatus.pm Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm Sending lib/Mail/SpamAssassin/Plugin/WLBLEval.pm Sending lib/Mail/SpamAssassin/Plugin.pm Sending lib/Mail/SpamAssassin.pm Sending masses/bayes-testing/bayes-10pcv-driver Sending rules/23_bayes.cf Sending t/bayesdbm.t Sending t/bayesdbm_flock.t Sending t/bayessdbm.t Sending t/bayessdbm_seen_delete.t Sending t/bayessql.t Transmitting file data .................. Committed revision 613124. |