Bug 6708 - Message get_decoded_body_text_array() injects empty MIME-part
Summary: Message get_decoded_body_text_array() injects empty MIME-part
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Libraries (show other bugs)
Version: 3.3 SVN branch
Hardware: All All
: P2 blocker
Target Milestone: 3.4.2
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-25 23:39 UTC by Karsten Bräckelmann
Modified: 2018-06-30 06:13 UTC (History)
2 users (show)



Attachment Type Modified Status Actions Submitter/CLA Status
Email example patch None Giovanni Bechis [HasCLA]

Note You need to log in before you can comment on or make changes to this bug.
Description Karsten Bräckelmann 2011-11-25 23:39:15 UTC
M::SA::Message get_decoded_body_text_array() injects an empty (chunk of a) MIME-part between all textual MIME-parts (or their chunks) into the returned array of strings. Used for rawbody rules.

See this line in the function.

  push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} );

I didn't find any purpose this might serve. However, rawbody rules will actually be applied against these almost empty strings. No big impact, but needless RE matching nonetheless.

Moreover, these empty rawbody lines prevent short-body size checks from working as expected. Non-intuitively, the following rule matches any message with 2 textual MIME-parts, regardless how long.

  rawbody __MIMEPART_LE_200  /^.{0,200}$/s

It matches the injected almost empty string. To prevent that, the minimum needs to be 2 rather than 0.

There is one caveat: Plugin/BodyEval.pm check_blank_line_ratio(). The ratio of empty lines is affected by this, since we introduce some empty lines ourself. Probably not intended, but it will be effected by eliminating these empty lines between MIME-parts.
Comment 1 Kevin A. McGrail 2011-11-26 13:48:49 UTC
I'll target this for 3.4.1.

It needs more testing IMO.  Plus that loop hurts my brain to unroll ;-)
Comment 2 Karsten Bräckelmann 2011-11-26 22:20:04 UTC
(In reply to comment #1)
> It needs more testing IMO.  Plus that loop hurts my brain to unroll ;-)

Referring to split_into_array_of_short_paragraphs() I assume? Agreed. However, that's just breaking large-ish MIME-parts up into chunks, and can be ignored. It used to be one string pushed onto the array per MIME-part, now it's one or more.

The loop in question for this bug, in get_decoded_body_text_array():

  # Go through each part
  for(my $pt = 0 ; $pt <= $#parts ; $pt++ ) {

    push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} );
    push(@{$self->{text_decoded}},
         split_into_array_of_short_paragraphs($parts[$pt]->decode()));
  }

That's simply going through all relevant MIME-parts, and pushing them onto the text_decoded array -- with an additional "\n" string pushed onto the array, if it evaluates true in scalar context, or simply, if there already has been content pushed onto the array before. Aka inject an empty line between MIME-parts.

Anyone know why this has been done? Probably left-over from the old days, where the content for the rawbody rules actually has been lines -- rather than MIME-parts, or chunks thereof since 3.3.


Grepping for this function name shows it really is used only for rawbody rules.

Well, and the BodyEval check_blank_line_ratio(), which likely should not count artificially injected blank lines between MIME-parts anyway. That's not content...
Comment 3 Kevin A. McGrail 2015-04-13 22:07:38 UTC
Sorry this got overlooked again.  Making it a blocker for 3.4.2 so it gets looked at more.
Comment 4 Giovanni Bechis 2018-02-15 08:08:27 UTC
Created attachment 5529 [details]
Email example

I do not think that 
"push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} );"
is needed, IMHO injecting a blank line between mime parts let us analyze an email that is less equal to the one that has been submitted.
Comment 5 Giovanni Bechis 2018-06-30 06:13:12 UTC
Fix committed in r1834722 for both trunk and 3.4.