SA Bugzilla – Bug 6315
Detect spammy words like drug promos in From: headers
Last modified: 2018-08-22 04:26:54 UTC
Created attachment 4664
sample message containing From: spam

Recently we have been receiving increasing numbers of messages that pass SA unharmed because the promo message is hidden inside the From: header. It is, however, displayed clearly in the MUA message list.

I suggest adding From: header checks to 20_drugs.cf. Perhaps something like:

    header FROM_VIAGRA_URL From =~ /viagra|cialis|levitra|xanax|tamiflu/i
    describe FROM_VIAGRA_URL From: contains drugs in envelope sender name
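For illustration only, here is a minimal Python sketch of what the pattern in the proposed rule would and would not match. It assumes Python's `re` module behaves like Perl's regex engine for this simple case-insensitive alternation. The function name `from_header_hits` and all sample header values are hypothetical and are not taken from the attached message.

```python
import re

# Pattern copied from the proposed rule; Python's re stands in for Perl here (assumption).
PROPOSED = re.compile(r"viagra|cialis|levitra|xanax|tamiflu", re.IGNORECASE)

def from_header_hits(from_value):
    """Return True if the proposed pattern matches anywhere in the From: value."""
    return PROPOSED.search(from_value) is not None

# Hypothetical From: values, made up for this sketch.
samples = [
    "Buy VIAGRA now <promo@example.com>",
    "Alice Example <alice@example.org>",
    # "Specialist" contains the substring "cialis", so this one matches too.
    "Specialist Pharmacy <info@specialists.example>",
]

for value in samples:
    print(from_header_hits(value), value)
```

Because the pattern is unanchored, it matches substrings as well as whole words, so a sender name such as "Specialist" would hit. Word boundaries (`\b`) would be one way to narrow that, though they would not address correctly spelled drug names in legitimate mail.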
While not the same request, this is about the very same recent pattern as bug 6317. Candidate for DUPE.
Okay, let's separate the two bugs.

Bug 6315 primarily focuses on spammy text in the From field.
Old name: "New spam type with drugs promo in envelope From: string"
New name: "Detect spammy words like drug promos in From: headers"

Bug 6317 primarily focuses on URI patterns in the From field.
Old name: "Enhancement: include sender text in the message body so body and uri tests can scan it"
New name: "Enable URI testing in From: headers"

That puts half of the scope of bug 6317 into this bug without really affecting this bug's nature.

As to this bug, I'm pretty sure Bayes can handle this, as noted in bug 6317, comment 1 since (unless somebody corrects me) it tokenizes the subject and sender items with special prefixes so as to differentiate between them and the body. Therefore, it will be an annoyance at first, but the system will eventually learn that those are spam. Just make sure you're training on those messages.

Then again, we seem to find these rules worthwhile in the body (where they're more common anyway), so perhaps this has merit after all. It should certainly be noted that the rules in 20_drugs.cf are *very* specific and quite careful to avoid drugs that are spelled correctly and in decent case. The rule proposed by comment 0 heads straight into that trouble.
(In reply to comment #2)
> As to this bug, I'm pretty sure Bayes can handle this, as noted in
> bug 6317, comment 1 since (unless somebody corrects me) it tokenizes the
> subject and sender items with special prefixes so as to differentiate between
> them and the body.

Correct. Even if one by profession frequently and legitimately discusses certain drugs, the appearance of such words in headers like From is unaffected by that as far as Bayes is concerned.
(In reply to comment #3)
> (In reply to comment #2)
> > As to this bug, I'm pretty sure Bayes can handle this, as noted in
> > bug 6317, comment 1 since (unless somebody corrects me) it tokenizes the
> > subject and sender items with special prefixes so as to differentiate between
> > them and the body.
>
> Correct. Even if one by profession frequently and legitimately discusses
> certain drugs, the appearance of such words in headers like From is unaffected
> by that as far as Bayes is concerned.

Now I'm confused. That answer was ambiguous.

Example: an incoming message has "From: Viagra Jones <vjones@example.com>" as a header. Bayes is configured to tokenize that, though those tokens are fully independent from tokens collected from other sources like the body or the subject.

The message gets taught as spam, and then later, another incoming message has "From: Victoria Viagra <victoria@example.net>" in its headers. Bayes then notices that "Viagra" is there and biases accordingly. That message is also taught as spam.

A third incoming message comes in, this time with "From: Daryl's Drugs <daryl@example.biz>" and "Subject: Viagra tastes great with oatmeal" in its headers. Bayes does not factor its knowledge of Viagra into the equation because (in this example) it does not exist in the *subject* token list.

Right?
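To make the three-message example concrete, here is a minimal Python sketch of the per-header token model described in this comment. It illustrates the model being asked about, not SpamAssassin's actual Bayes tokenizer. The `from:` and `subject:` prefixes, the function names, and the in-memory `spam_tokens` counter are all assumptions made for this sketch.

```python
import re
from collections import Counter

# Hypothetical in-memory stand-in for a Bayes token database.
spam_tokens = Counter()

def tokenize(headers):
    """Split each header value into words and tag every word with its header name."""
    tokens = []
    for name, value in headers.items():
        for word in re.findall(r"[a-z0-9]+", value.lower()):
            tokens.append(name.lower() + ":" + word)
    return tokens

def learn_spam(headers):
    """Record every prefixed token of a message that was taught as spam."""
    spam_tokens.update(tokenize(headers))

def known_spam_tokens(headers):
    """Return the tokens of a new message that were already seen in spam."""
    return [t for t in tokenize(headers) if spam_tokens[t] > 0]

# First message is taught as spam.
learn_spam({"From": "Viagra Jones <vjones@example.com>"})

# Second message: "viagra" in From: was already learned as a from: token.
second = {"From": "Victoria Viagra <victoria@example.net>"}
print(known_spam_tokens(second))
learn_spam(second)

# Third message: "viagra" appears only in Subject:, a separate token namespace.
third = {
    "From": "Daryl's Drugs <daryl@example.biz>",
    "Subject": "Viagra tastes great with oatmeal",
}
print(known_spam_tokens(third))
```

Under this model the second message is recognized through `from:viagra`, while the third message's `subject:viagra` is unknown because only the From: variant was ever learned.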
(In reply to comment #4)
> Example: an incoming message has "From: Viagra Jones <vjones@example.com>" as
> a header. Bayes is configured to tokenize that, though those tokens are fully
> independent from tokens collected from other sources like the body or the
> subject. The message gets taught as spam, and then later, another incoming
> message has "From: Victoria Viagra <victoria@example.net>" in its headers.
> Bayes then notices that "Viagra" is there and biases accordingly. ...
>
> Right?

Apparently wrong due to bug 6319, which explicitly states this is not the case. I'm marking that bug as a dependency of this one.
This is 8 years old and is now easily blocked by default SA scores and rulesets with Bayesian DB training.
Considered a Bayes or ruleset issue that has long since been addressed.