SA Bugzilla – Bug 6708
Message get_decoded_body_text_array() injects empty MIME-part
Last modified: 2018-06-30 06:13:12 UTC
M::SA::Message get_decoded_body_text_array() injects an empty (chunk of a) MIME-part between all textual MIME-parts (or their chunks) into the returned array of strings. Used for rawbody rules. See this line in the function. push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} ); I didn't find any purpose this might serve. However, rawbody rules will actually be applied against these almost empty strings. No big impact, but needless RE matching nonetheless. Moreover, these empty rawbody lines prevent short-body size checks from working as expected. Non-intuitively, the following rule matches any message with 2 textual MIME-parts, regardless how long. rawbody __MIMEPART_LE_200 /^.{0,200}$/s It matches the injected almost empty string. To prevent that, the minimum needs to be 2 rather than 0. There is one caveat: Plugin/BodyEval.pm check_blank_line_ratio(). The ratio of empty lines is affected by this, since we introduce some empty lines ourself. Probably not intended, but it will be effected by eliminating these empty lines between MIME-parts.
I'll target this for 3.4.1. It needs more testing IMO. Plus that loop hurts my brain to unroll ;-)
(In reply to comment #1) > It needs more testing IMO. Plus that loop hurts my brain to unroll ;-) Referring to split_into_array_of_short_paragraphs() I assume? Agreed. However, that's just breaking large-ish MIME-parts up into chunks, and can be ignored. It used to be one string pushed onto the array per MIME-part, now it's one or more. The loop in question for this bug, in get_decoded_body_text_array(): # Go through each part for(my $pt = 0 ; $pt <= $#parts ; $pt++ ) { push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} ); push(@{$self->{text_decoded}}, split_into_array_of_short_paragraphs($parts[$pt]->decode())); } That's simply going through all relevant MIME-parts, and pushing them onto the text_decoded array -- with an additional "\n" string pushed onto the array, if it evaluates true in scalar context, or simply, if there already has been content pushed onto the array before. Aka inject an empty line between MIME-parts. Anyone know why this has been done? Probably left-over from the old days, where the content for the rawbody rules actually has been lines -- rather than MIME-parts, or chunks thereof since 3.3. Grepping for this function name shows it really is used only for rawbody rules. Well, and the BodyEval check_blank_line_ratio(), which likely should not count artificially injected blank lines between MIME-parts anyway. That's not content...
Sorry this got overlooked again. Making it a blocker for 3.4.2 so it gets looked at more.
Created attachment 5529 [details] Email example I do not think that "push(@{$self->{text_decoded}}, "\n") if ( @{$self->{text_decoded}} );" is needed, IMHO injecting a blank line between mime parts let us analyze an email that is less equal to the one that has been submitted.
Fix committed in r1834722 for both trunk and 3.4.