6315 – Detect spammy words like drug promos in From: headers

Status:	RESOLVED WONTFIX

Alias:	None

Product:	Spamassassin
Classification:	Unclassified
Component:	Rules
Version:	3.2.5
Hardware:	All All

Importance:	P5 enhancement
Target Milestone:	Undefined
Assignee:	SpamAssassin Developer Mailing List

URL:
Whiteboard:
Keywords:

Depends on:	6319
Blocks:

Reported:	2010-01-31 04:20 UTC by /cbx
Modified:	2018-08-22 04:26 UTC
CC List:	3 users

Attachment	Type	Modified	Status	Actions	Submitter/CLA Status
sample message containing From: spam	application/octet-stream			None	/cbx

Description /cbx 2010-01-31 04:20:42 UTC

Created attachment 4664
sample message containing From: spam

Recently we have been receiving increasing amounts of messages that pass SA unharmed because the promo message is hidden inside the From: header. It is, however, displayed clearly in the MUA message list. I suggest adding From: header checks to 20_drugs.cf. Perhaps something like:

header FROM_VIAGRA_URL         From =~ /viagra|cialis|levitra|xanax|tamiflu/i
describe FROM_VIAGRA_URL       From: contains drugs in envelope sender name

Comment 1 Karsten Bräckelmann 2010-02-01 11:07:45 UTC

While not the same request, this is about the very same recent pattern as bug 6317. Candidate for DUPE.

Comment 2 Adam Katz 2010-02-01 15:09:54 UTC

Okay, let's separate the two bugs.

Bug 6315 primarily focuses on spammy text in the From field.
Old name: "New spam type with drugs promo in envelope From: string"
New name: "Detect spammy words like drug promos in From: headers"

Bug 6317 primarily focuses on uri patterns in the From field.
Old name: "Enhancement: include sender text in the message body so body and uri tests can scan it"
New name: "Enable URI testing in From: headers"

That puts half of the scope of bug 6317 into this bug without really affecting this bug's nature.

As to this bug, I'm pretty sure Bayes can handle this, as noted in bug 6317, comment 1, since (unless somebody corrects me) it tokenizes the subject and sender items with special prefixes so as to differentiate between them and the body.  Therefore, it will be an annoyance at first, but the system will eventually learn that those are spam.  Just make sure you're training on those messages.

Then again, we seem to find these rules worthwhile in the body (where they're more common anyway), so perhaps this has merit after all.  It should certainly be noted that the rules in 20_drugs.cf are *very* specific and quite careful to avoid drugs that are spelled correctly and in decent case.  The rule proposed by comment 0 heads straight into that trouble.
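
To illustrate that concern, here is a minimal Python sketch that applies the same alternation as the rule from comment 0 to a few From: values. The sample headers and the helper name from_matches are made up for illustration; this is not how SpamAssassin evaluates rules, only a quick way to see what the pattern would hit.

```python
import re

# Same alternation as the rule proposed in comment 0; the /i flag
# is expressed here as re.IGNORECASE.
FROM_DRUG_RE = re.compile(r"viagra|cialis|levitra|xanax|tamiflu", re.IGNORECASE)


def from_matches(from_header):
    """Return True if the From: value contains any of the listed drug names."""
    return FROM_DRUG_RE.search(from_header) is not None


# Hypothetical From: values, invented for this example.
samples = {
    "spam": "VIAGRA -80% today <promo@example.com>",
    "ham_drug_name": '"Tamiflu stock desk" <orders@example.org>',
    "ham_plain": "Alice Example <alice@example.net>",
}

results = {label: from_matches(value) for label, value in samples.items()}
for label, hit in results.items():
    print(label, hit)
```

The pattern fires on any correctly spelled drug name regardless of context, so the second, presumably legitimate, sender is hit just like the first.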

Comment 3 Karsten Bräckelmann 2010-02-01 15:25:01 UTC

(In reply to comment #2)
> As to this bug, I'm pretty sure Bayes can handle this, as noted in
> bug 6317, comment 1  since (unless somebody corrects me) it tokenizes the
> subject and sender items with special prefixes so as to differentiate between
> them and the body.

Correct.  Even if one by profession frequently and legitimately discusses certain drugs, the appearance of such words in headers like From is unaffected by that as far as Bayes is concerned.

Comment 4 Adam Katz 2010-02-01 16:04:10 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > As to this bug, I'm pretty sure Bayes can handle this, as noted in
> > bug 6317, comment 1  since (unless somebody corrects me) it tokenizes the
> > subject and sender items with special prefixes so as to differentiate between
> > them and the body.
> 
> Correct.  Even if one by profession frequently and legitimately discusses
> certain drugs, the appearance of such words in headers like From is unaffected
> by that as far as Bayes is concerned.

Now I'm confused.  That answer was ambiguous.

Example:  an incoming message has "From: Viagra Jones <vjones@example.com>" as a header. Bayes is configured to tokenize that, though those tokens are fully independent from tokens collected from other sources like the body or the subject.  The message gets taught as spam, and then later, another incoming message has "From: Victoria Viagra <victoria@example.net>" in its headers.  Bayes then notices that "Viagra" is there and biases accordingly.  That message is also taught as spam.  A third incoming message comes in, this time with "From: Daryl's Drugs <daryl@example.biz>" and "Subject: Viagra tastes great with oatmeal" in its headers.  Bayes does not factor its knowledge of Viagra into the equation because (in this example) it does not exist in the *subject* token list.

Right?
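
The model in that example can be sketched in Python as follows. This is only an illustration of the assumption being asked about, not SpamAssassin's tokenizer: the "H:<header>:" prefix format, the function names, and the use of plain counts instead of Bayes probabilities are all invented here. Note that the next comment, citing bug 6319, reports that SpamAssassin apparently does not behave this way.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[A-Za-z']+")


def tokenize(headers, body=""):
    """Split headers and body into tokens, tagging header tokens by source.

    The "H:<header>:" prefix is a made-up format for this sketch.
    """
    tokens = []
    for name, value in headers.items():
        for word in WORD_RE.findall(value):
            tokens.append("H:%s:%s" % (name, word.lower()))
    tokens.extend(word.lower() for word in WORD_RE.findall(body))
    return tokens


# Number of spam-trained messages each token has appeared in.
spam_counts = Counter()


def learn_spam(headers, body=""):
    spam_counts.update(set(tokenize(headers, body)))


# The first two messages from the example are taught as spam.
learn_spam({"From": "Viagra Jones <vjones@example.com>"})
learn_spam({"From": "Victoria Viagra <victoria@example.net>"})

# The third message carries the word only in its Subject header.
third = tokenize({
    "From": "Daryl's Drugs <daryl@example.biz>",
    "Subject": "Viagra tastes great with oatmeal",
})

print(spam_counts["H:From:viagra"])      # seen in both learned From: headers
print("H:Subject:viagra" in third)       # present in the third message
print(spam_counts["H:Subject:viagra"])   # but never learned under Subject
```

Under this model, the "viagra" token learned from From: headers says nothing about the same word in a Subject header, which is the independence the example describes.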

Comment 5 Adam Katz 2010-02-02 11:17:39 UTC

(In reply to comment #4)
> Example:  an incoming message has "From: Viagra Jones <vjones@example.com>" as
> a header. Bayes is configured to tokenize that, though those tokens are fully
> independent from tokens collected from other sources like the body or the
> subject.  The message gets taught as spam, and then later, another incoming
> message has "From: Victoria Viagra <victoria@example.net>" in its headers. 
> Bayes then notices that "Viagra" is there and biases accordingly. ...
> 
> Right?

Apparently wrong due to bug 6319 which explicitly states this is not the case.  I'm marking that bug as a dependency of this one.

Comment 6 Dave Jones 2018-01-28 18:31:09 UTC

This is 8 years old and is now easily blocked by default SA scores and rulesets with Bayesian DB training.

Comment 7 Kevin A. McGrail 2018-08-22 04:26:54 UTC

Considered a Bayes or ruleset issue long since addressed