Bug 5491 - [review] MIME_QP_LONG_LINE triggering on valid email
Status: RESOLVED FIXED
Product: Spamassassin
Classification: Unclassified
Component: Rules
Version: 3.2.0
Hardware: Other other
Importance: P5 normal
Target Milestone: 3.3.2
Assigned To: SpamAssassin Developer Mailing List
Whiteboard: ready to commit
Keywords:
Depends on:
Blocks:
Reported: 2007-05-31 19:32 UTC by Jason Haar
Modified: 2011-05-06 00:19 UTC (History)

Attachments:
  email demonstrating bad QP (type: text/plain; status: None; submitter: Jason Haar [NoCLA])
  Proposed patch to change a QP line limit from 76 to 78 (type: patch; status: None; submitter: Mark Martinec [HasCLA])

Description Jason Haar 2007-05-31 19:32:02 UTC
I know the RFCs say that any email using quoted-printable (QP) encoding must keep
lines to 76 characters, but it appears to me that a lot of email application
developers don't read the RFCs.

I have noticed this rule triggering a lot here: 6% of email classified as HAM by
SA triggered the MIME_QP_LONG_LINE rule.

Attached is a mail message that scored 5.7/5, of which 1.8 points came from
MIME_QP_LONG_LINE.

Even though the RFC states 76 chars, would raising the limit to (say) 80 chars
stop the rule from hitting spam? That is, are the spammers this rule catches the
sort of morons who write one loooooong QP line, or are they also producing
78-char lines?

Thanks

Jason
Comment 1 Jason Haar 2007-05-31 19:32:38 UTC
Created attachment 3955 [details]
email demonstrating bad QP
Comment 2 Ben Lentz 2007-06-22 07:48:34 UTC
I concur, I've had several FPs at my site, with this rule being the straw that
breaks the camel's back. This is mostly happening on newsletter-type emails
(Boston Globe, for example), scoring a combination of MIME_HTML_ONLY +
MIME_QP_LONG_LINE + DCC/Razor2/Pyzor (because it's bulk, but it's not spam).

I'm thinking maybe the point value should be adjusted down, rather than changing
the length, if the existing length is based on a particular RFC defining QP.
Comment 3 Theo Van Dinter 2007-06-25 11:40:44 UTC
FWIW, I wrote up a quoted-printable length function similar to the base64 length
function.  Here are some results:

OVERALL%  SPAM%     HAM%      S/O   RANK   SCORE  NAME
  0.352   0.4118   0.0100    0.976   1.00    0.00  T_QP_LENGTH_84_85
  0.253   0.2971   0.0000    1.000   0.92    0.00  T_QP_LENGTH_82_83
  0.314   0.3666   0.0100    0.973   0.88    0.00  T_QP_LENGTH_83_84
  2.567   2.9953   0.0903    0.971   0.79    0.00  T_QP_LENGTH_81_82
  4.209   4.8874   0.2911    0.944   0.71    0.00  T_QP_LENGTH_79_80
 14.397  16.4744   2.3989    0.873   0.53    0.00  MIME_QP_LONG_LINE
  4.975   5.7196   0.6725    0.895   0.50    0.00  T_QP_LENGTH_90_INF
  5.701   6.4911   1.1342    0.851   0.46    0.00  T_QP_LENGTH_78_79
  0.142   0.1668   0.0000    1.000   0.45    0.00  T_QP_LENGTH_89_90
  2.755   3.1656   0.3814    0.892   0.42    0.00  T_QP_LENGTH_80_81
 10.542   9.4239  17.0029    0.357   0.25    0.00  T_QP_LENGTH_77_78
  0.230   0.2571   0.0703    0.785   0.24    0.00  T_QP_LENGTH_87_88
  0.267   0.2902   0.1305    0.690   0.15    0.00  T_QP_LENGTH_86_87
  0.243   0.2624   0.1305    0.668   0.03    0.00  T_QP_LENGTH_85_86
  0.095   0.0990   0.0703    0.585   0.00    0.00  T_QP_LENGTH_88_89


So there's no clear winner here, though 81-85 may be interesting.
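
Assuming the columns follow the usual SpamAssassin hit-frequency layout (overall %, spam %, ham %, S/O, rank, score, rule name), the S/O column can be read as the spam hit rate divided by the sum of the spam and ham hit rates. That reading is an assumption, but it reproduces the listed values. A minimal Python sketch (the function name `s_o_ratio` is made up for illustration and is not part of SpamAssassin):

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that land on spam, from per-corpus hit rates."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit nothing in either corpus")
    return spam_pct / total


# (spam %, ham %) pairs copied from the results above
rows = {
    "MIME_QP_LONG_LINE": (16.4744, 2.3989),
    "T_QP_LENGTH_77_78": (9.4239, 17.0029),
    "T_QP_LENGTH_81_82": (2.9953, 0.0903),
}

for name, (spam, ham) in rows.items():
    print(f"{name}: S/O = {s_o_ratio(spam, ham):.3f}")
```

A value near 1.0 means the rule hits almost only spam; the 77-78 bucket sits well below 0.5, which is consistent with that length range being dominated by ham.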
Comment 4 Dennis Pearson 2008-08-27 14:33:19 UTC
76 is the max, BUT this is supposed to exclude the trailing CR/LF (if used), so a 76-character line takes up to 78 characters once the CR/LF is counted. Testing for either 76 or 78 seems acceptable.

Comment 5 James Ralston 2009-08-20 13:34:20 UTC
I can confirm this behavior as of 2009-07-07.

This mailer:

    X-Mailer: Apple Mail (2.935.3)

has an error in the way it performs QP-encoding. Specifically, if the last character in a line is a raw "=" character, it doesn't seem to include the length of the expansion caused by encoding ("=" -> "=3D") in the line length calculation, which means that it will generate a QP-line that is 2 characters too long.

For example, this 66-character line:

blah blah blah blah blah blah blah blah blah foo foo foo foo =====

gets QP-encoded to this 77-character line:

blah blah blah blah blah blah blah blah blah foo foo foo foo =3D=3D=3D=3D=3D=

This trips the MIME_QP_LONG_LINE test.

I suspect "Apple Mail (2.935.3)" could produce a 78-character line as well, depending on the column in which the final raw "=" falls.

Can we change the maximum length for the MIME_QP_LONG_LINE test from 76 characters to 78 characters, please? That should stop this test from erroneously hitting on ham "Apple Mail (2.935.3)" mail...
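
The arithmetic in the example above can be checked with a short Python sketch. It only encodes raw "=" characters, which is enough for this ASCII-only line; the helper name `qp_encode_equals` is hypothetical, and this is not how Apple Mail or SpamAssassin is implemented.

```python
def qp_encode_equals(text: str) -> str:
    """Encode only raw "=" as "=3D"; sufficient for this ASCII-only example."""
    return text.replace("=", "=3D")


# The 66-character line from the example
raw = "blah " * 9 + "foo " * 4 + "=" * 5

# The trailing "=" is the QP soft line break that ends the encoded line
encoded = qp_encode_equals(raw) + "="

print(len(raw))       # 66
print(len(encoded))   # 77
print(len(encoded) > 76, len(encoded) > 78)  # True False
```

Each of the five "=" characters grows by two characters (10 in total), and the soft line break adds one more, giving 66 + 10 + 1 = 77. That is over the 76-character limit but within a 78-character one.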
Comment 6 Dale Blount 2010-05-17 16:20:36 UTC
Status? I'm still seeing problems with this hitting on HAM, though granted, I'm still running 3.2.5.
Comment 7 Dale Blount 2010-05-18 13:20:40 UTC
I got a note from the user that the mail that triggered this was composed using Outlook Web Access.
Comment 8 Mark Martinec 2010-06-03 12:49:12 UTC
Created attachment 4766 [details]
Proposed patch to change a QP line limit from 76 to 78

-        if (length > 77) {
+        # RFC 5322: Each line SHOULD be no more than 78 characters,
+        #           excluding the CRLF
+        # RFC 2045: The Quoted-Printable encoding REQUIRES that
+        #           encoded lines be no more than 76 characters long.
+        # Bug 5491: 6% of email classified as HAM by SA triggered the
+        #           MIME_QP_LONG_LINE rule. Apple Mail can generate a QP-line
+        #           that is 2 chars too long. Same goes for Outlook Web Access.
+        # lines include one trailing \n character
+      # if (length > 76+1) {  # conforms to RFC 5322 and RFC 2045
+        if (length > 78+1) {  # conforms to RFC 5322 only, not RFC 2045


trunk:
  Bug 5491: MIME_QP_LONG_LINE triggering on valid email
  change a QP line limit from 76 to 78
Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Committed revision 951065.
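
As a rough illustration of what the changed limit means, here is a Python sketch of a long-line check with a configurable limit. It is not the MIMEEval.pm code: the function name `has_long_qp_line` and the `limit` parameter are assumptions made for this example. It strips the line ending and compares against `limit`, which for newline-terminated lines is equivalent to the patch's `length > limit+1`.

```python
def has_long_qp_line(body: str, limit: int = 78) -> bool:
    """Return True if any line, excluding its line ending, exceeds `limit`."""
    return any(len(line.rstrip("\r\n")) > limit for line in body.split("\n"))


# 77-character QP line of the kind reported for Apple Mail in this bug
apple_mail_line = "blah " * 9 + "foo " * 4 + "=3D" * 5 + "="
body = "short line\n" + apple_mail_line + "\n"

print(has_long_qp_line(body, limit=76))  # True: strict RFC 2045 limit
print(has_long_qp_line(body, limit=78))  # False: relaxed limit from the patch
```

With the relaxed limit, a line has to reach 79 characters before the check fires, so the 77- and 78-character lines described in this bug no longer trigger it.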
Comment 9 AXB 2010-06-03 12:55:39 UTC
(In reply to comment #8)
> Created an attachment (id=4766) [details]
> Proposed patch to change a QP line limit from 76 to 78
> 
> -        if (length > 77) {
> +        # RFC 5322: Each line SHOULD be no more than 78 characters,
> +        #           excluding the CRLF
> +        # RFC 2045: The Quoted-Printable encoding REQUIRES that
> +        #           encoded lines be no more than 76 characters long.
> +        # Bug 5491: 6% of email classified as HAM by SA triggered the
> +        #           MIME_QP_LONG_LINE rule. Apple Mail can generate a QP-line
> +        #           that is 2 chars too long. Same goes for Outlook Web Access.
> +        # lines include one trailing \n character
> +      # if (length > 76+1) {  # conforms to RFC 5322 and RFC 2045
> +        if (length > 78+1) {  # conforms to RFC 5322 only, not RFC 2045
> 
> 
> trunk:
>   Bug 5491: MIME_QP_LONG_LINE triggering on valid email
>   change a QP line limit from 76 to 78
> Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
> Committed revision 951065.

+1
Comment 10 Kevin A. McGrail 2010-06-03 16:57:37 UTC
Before we commit this, can we create a test rule that gives us an idea of the change in HAM/SPAM %'s that hit the rule when it's changed from 76 to 78?

In short, the code looks great but my concern is that by NOT enforcing the RFC, is the rule going to be pointless.
Comment 11 Kevin A. McGrail 2011-05-05 20:08:42 UTC
(In reply to comment #10)
> Before we commit this, can we create a test rule that gives us an idea of the
> change in HAM/SPAM %'s that hit the rule when it's changed from 76 to 78?
> 
> In short, the code looks great but my concern is that by NOT enforcing the RFC,
> is the rule going to be pointless.

I'll change my vote to a +1.  To delay the change is almost as pointless.

KAM
Comment 12 Mark Martinec 2011-05-06 00:19:02 UTC
branch 3.3:
  Bug 5491: MIME_QP_LONG_LINE triggering on valid email
  Sending lib/Mail/SpamAssassin/Plugin/MIMEEval.pm
Committed revision 1100005.

Closing.