Bug 6452 - __HK_LOTTO_BALLOT too broad
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.3.1
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2010-06-15 15:23 UTC by Andrew Daviel
Modified: 2011-10-31 19:19 UTC

Attachments:
Attachment 4778: Election notice from APS (application/octet-stream), submitted by Andrew Daviel [NoCLA]
Attachment 4781: Election notice from APS (application/octet-stream), submitted by Andrew Daviel [NoCLA]

Description Andrew Daviel 2010-06-15 15:23:12 UTC
HK_LOTTO currently scores 3.599 2.755 2.993 3.599
One component is __HK_LOTTO_BALLOT
which matches "on-line ballot" in the body

This is way too high a score for such a broad pattern.
It falsely scored an election announcement from American Physical Society
Comment 1 AXB 2010-06-15 16:31:42 UTC
(In reply to comment #0)
> HK_LOTTO currently scores 3.599 2.755 2.993 3.599
> One component is __HK_LOTTO_BALLOT
> which matches "on-line ballot" in the body
> 
> This is way too high a score for such a broad pattern.
> It falsely scored an election announcement from American Physical Society

Please attach a raw sample message, including the full SA report header
(munge the recipient address).
Comment 2 Andrew Daviel 2010-06-15 21:06:05 UTC
Created attachment 4778 [details]
Election notice from APS
Comment 3 John Hardin 2010-06-15 21:15:06 UTC
Just FYI, that rule may also overlap some things I'm doing in LOTSA_MONEY...

Andrew, do you have any objections to putting that sample into my ham corpus?
Comment 4 Andrew Daviel 2010-06-15 21:51:00 UTC
(In reply to comment #3)
> Just FYI, that rule may also overlap some things I'm doing in LOTSA_MONEY...
> 
> Andrew, do you have any objections to putting that sample into my ham corpus?

For myself, no, though perhaps further obfuscate triumf.ca to example.com or xxxx.

Perhaps one should ask Ken Cole at APS. I am unfamiliar with standard practice in these cases. It's obviously not a personal message but one widely distributed to APS members and I can't see that it contains anything sensitive, except perhaps the contact address.
Comment 5 Karsten Bräckelmann 2010-06-16 09:00:48 UTC
(In reply to comment #3)
> Andrew, do you have any objections to putting that sample into my ham corpus?

Attaching the sample to this bug report already made it public. What objection could there possibly be to use it in a local, non-published corpus?
Comment 6 John Hardin 2010-06-16 09:21:22 UTC
(In reply to comment #5)
> (In reply to comment #3)
> > Andrew, do you have any objections to putting that sample into my ham corpus?
> 
> Attaching the sample to this bug report already made it public. What objection
> could there possibly be to use it in a local, non-published corpus?

It would actually go into my uploaded corpus. I currently don't do local checks and upload just the results. That's why I asked.
Comment 7 Karsten Bräckelmann 2010-06-16 09:26:24 UTC
(In reply to comment #6)
> It would actually go into my uploaded corpus. I currently don't do local checks
> and upload just the results. That's why I asked.

That still is not public (unlike this bug report and its attachments!), and access is strictly limited to SA devs.
Comment 8 Andrew Daviel 2010-06-17 16:50:05 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #3)
> > > Andrew, do you have any objections to putting that sample into my ham corpus?
> > 
> > Attaching the sample to this bug report already made it public. What objection
> > could there possibly be to use it in a local, non-published corpus?
> 
> It would actually go into my uploaded corpus. I currently don't do local checks
> and upload just the results. That's why I asked.

As has been pointed out, I have already made it public (or more public than on a wide mailout), so I'd say go ahead, if debating the issue will hold things up.
The only reason (other than carelessness) that I had not redacted more PII is that some tools like the Razor plugin depend on an unmodified message body.
Comment 9 John Hardin 2010-06-18 11:22:17 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > Just FYI, that rule may also overlap some things I'm doing in LOTSA_MONEY...
> > 
> > Andrew, do you have any objections to putting that sample into my ham corpus?
> 
> For myself, no, though perhaps further obfuscate triumf.ca to example.com or
> xxxx.

I note you've replaced the member IDs and codes with descriptive text or "xxxxx". Those sorts of things in a message can also be rule fodder. Could I ask you to re-sanitize the original message and, instead of what you did, just change the codes while retaining their format? For example, if the APS member ID is a string of numbers and letters, change the numbers to different numbers and the letters to different letters.

If you're willing to do that, thanks; if not, I understand.
Comment 10 Andrew Daviel 2010-06-18 17:46:58 UTC
Created attachment 4781 [details]
Election notice from APS

Sanitised as follows, keeping length of original fields:
Replace recipient uid with "userxx"
Replace APS member ID with different digits
Replace APS PI code with different letters, digits
Replace APS executive personal name with "John Doe"
Replace APS executive personal address with "someuserxx"
Replace APS executive phone number with 555-1234
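
The format-preserving replacement requested in comment 9 could be automated roughly as follows. This is only a sketch under stated assumptions: the function name `scramble_preserving_format` and the sample ID are hypothetical, and nothing here claims to be how the attachment was actually sanitised.

```python
import random
import string


def scramble_preserving_format(value, rng=None):
    """Replace each ASCII digit with a different digit and each ASCII letter
    with a different letter of the same case; keep everything else as-is.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    out = []
    for ch in value:
        if ch in string.digits:
            alphabet = string.digits
        elif ch in string.ascii_uppercase:
            alphabet = string.ascii_uppercase
        elif ch in string.ascii_lowercase:
            alphabet = string.ascii_lowercase
        else:
            # Punctuation, spaces and non-ASCII characters are left unchanged
            # so the overall layout of the field is preserved.
            out.append(ch)
            continue
        # Pick a replacement from the same class, excluding the original.
        out.append(rng.choice([c for c in alphabet if c != ch]))
    return "".join(out)


# Invented example value; not a real APS member ID.
original = "AB-12345x"
sanitised = scramble_preserving_format(original, random.Random(0))
print(original, "->", sanitised)
```

The result has the same length and the same digit/letter/punctuation layout as the input, so rules that key on the shape of an ID still see realistic data.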
Comment 11 John Hardin 2010-06-18 21:30:00 UTC
Thanks. Uploaded to nightly masscheck ham corpora.
Comment 12 Kevin A. McGrail 2011-10-31 19:19:24 UTC
Scores for the overall meta rule are now being generated much lower, so this can be considered resolved.
72_scores.cf:score HK_LOTTO_NAME                         0.999 0.042 0.999 0.042
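
For readers unfamiliar with the four-number form: when a SpamAssassin `score` line lists four values, they apply to score sets 0 to 3 (Bayes off/network off, Bayes off/network on, Bayes on/network off, Bayes on/network on). A small sketch that splits such a line, where the helper name `parse_score_line` is hypothetical and the input is the line quoted above:

```python
# Score set descriptions, indexed by score set number 0..3.
SCORE_SETS = (
    "Bayes disabled, network tests disabled",
    "Bayes disabled, network tests enabled",
    "Bayes enabled, network tests disabled",
    "Bayes enabled, network tests enabled",
)


def parse_score_line(line):
    """Parse a four-value 'score RULE a b c d' line.

    Accepts an optional 'filename:' prefix as produced by grep.
    Hypothetical helper; it only handles the four-value form.
    """
    first_token = line.split(None, 1)[0]
    if ":" in first_token:
        # Drop the grep-style 'filename:' prefix.
        line = line.split(":", 1)[1]
    keyword, rule, *values = line.split()
    if keyword != "score" or len(values) != 4:
        raise ValueError("expected 'score RULE' followed by four values")
    return rule, [float(v) for v in values]


rule, scores = parse_score_line(
    "72_scores.cf:score HK_LOTTO_NAME                         0.999 0.042 0.999 0.042"
)
for description, value in zip(SCORE_SETS, scores):
    print(f"{rule}: {value} ({description})")
```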