James Mime4j
  1. James Mime4j
  2. MIME4J-138

DecoderUtil.decodeEncodedWords() fails if encoded-text starts with an equals sign

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.6.1, 0.7
    • Component/s: None
    • Labels:
      None

      Description

      For example "=?utf-8?Q?=20test?=" should be decoded as " test". Instead it does not get decoded at all.

      The problem is that DecoderUtil.decodeEncodedWords() falsely identifies =?utf-8?Q?= as the encoded word which cannot be decoded.

      1. DecoderUtil.diff
        13 kB
        Aron Wieck
      2. DecoderUtil.java
        11 kB
        Aron Wieck
      3. mime4j-138.patch
        8 kB
        Markus Wiederkehr
      4. RegexDecoderUtil.java
        4 kB
        Aron Wieck

        Activity

        Markus Wiederkehr created issue -
        Markus Wiederkehr made changes -
        Field Original Value New Value
        Assignee Markus Wiederkehr [ wmax ]
        Hide
        Aron Wieck added a comment -

        Proposed fix using Regular Expression. The class RegexDecoderUtil only contains the 3 changed methods.

        Show
        Aron Wieck added a comment - Proposed fix using Regular Expression. The class RegexDecoderUtil only contains the 3 changed methods.
        Aron Wieck made changes -
        Attachment RegexDecoderUtil.java [ 12416865 ]
        Hide
        Bernd Fondermann added a comment -

        Hi Aron,

        is there any change you can provide this as a diff patch?

        You can create patches with "svn diff" on the command line or by using your favorite IDEs functionality.

        Thanks!

        Show
        Bernd Fondermann added a comment - Hi Aron, is there any change you can provide this as a diff patch? You can create patches with "svn diff" on the command line or by using your favorite IDEs functionality. Thanks!
        Hide
        Aron Wieck added a comment - - edited

        Attached svn diff and full DecoderUtil.java with the changes.

        This differs from my previously uploaded RegexDecoderUtil.java in the following points:

        • the regular expression pattern has been changed to be more tolerant on wrong encoding data. It now assumes that no "?" sign may be in the encoded word other than these four: =?charset?enc?text?=
        • the log output now includes the encoded word again
        • added comments
        Show
        Aron Wieck added a comment - - edited Attached svn diff and full DecoderUtil.java with the changes. This differs from my previously uploaded RegexDecoderUtil.java in the following points: the regular expression pattern has been changed to be more tolerant on wrong encoding data. It now assumes that no "?" sign may be in the encoded word other than these four: =?charset?enc?text?= the log output now includes the encoded word again added comments
        Aron Wieck made changes -
        Attachment DecoderUtil.diff [ 12416869 ]
        Attachment DecoderUtil.java [ 12416870 ]
        Hide
        Bernd Fondermann added a comment -

        thanks, looks like a proper patch. makes reviewing really easy. (others will judge whether to include it into the codebase, though.)

        Show
        Bernd Fondermann added a comment - thanks, looks like a proper patch. makes reviewing really easy. (others will judge whether to include it into the codebase, though.)
        Hide
        Markus Wiederkehr added a comment -

        I've been working on a regex based patch myself since yesterday.

        Today I have incorporated some of Aron's ideas into my code. The result is mime4j-138.patch.

        @All: please review

        @Aron: contributions are always welcome of course, many thanks!

        Show
        Markus Wiederkehr added a comment - I've been working on a regex based patch myself since yesterday. Today I have incorporated some of Aron's ideas into my code. The result is mime4j-138.patch. @All: please review @Aron: contributions are always welcome of course, many thanks!
        Markus Wiederkehr made changes -
        Attachment mime4j-138.patch [ 12416903 ]
        Hide
        Aron Wieck added a comment -

        Great, your regex seems to be even more strict than my suggestion.

        You do not use matcher.appendReplacement and matcher.appendTail, which seems to be better than using them for everything except the separator like I did.

        So thumbs up for the patch from me.

        Show
        Aron Wieck added a comment - Great, your regex seems to be even more strict than my suggestion. You do not use matcher.appendReplacement and matcher.appendTail, which seems to be better than using them for everything except the separator like I did. So thumbs up for the patch from me.
        Markus Wiederkehr made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Nevin added a comment - - edited

        There is an issue in the fix.
        158 if ((!previousWasEncoded) || (!CharsetUtil.isWhitespace(separator)))
        159 sb.append(separator);
        160 matcher.appendReplacement(sb, decoded);
        161 previousWasEncoded = true;

        in funciton call matcher.appendReplacement( ), if the decoded contains "$", an exception will be thrown:
        java.lang.IllegalArgumentException: Illegal group reference
        at java.util.regex.Matcher.appendReplacement(Matcher.java:706)
        at org.apache.james.mime4j.codec.DecoderUtil.decodeEncodedWords(DecoderUtil.java:160)

        As described in java api doc, public Matcher appendReplacement(StringBuffer sb, String replacement):
        Note that backslashes () and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.

        Here is the temporary fix between line 159 and line 160:
        decoded = decoded.replace("
        ", "\\\\");
        decoded = decoded.replace("$", "
        $");

        Show
        Nevin added a comment - - edited There is an issue in the fix. 158 if ((!previousWasEncoded) || (!CharsetUtil.isWhitespace(separator))) 159 sb.append(separator); 160 matcher.appendReplacement(sb, decoded); 161 previousWasEncoded = true; in funciton call matcher.appendReplacement( ), if the decoded contains "$", an exception will be thrown: java.lang.IllegalArgumentException: Illegal group reference at java.util.regex.Matcher.appendReplacement(Matcher.java:706) at org.apache.james.mime4j.codec.DecoderUtil.decodeEncodedWords(DecoderUtil.java:160) As described in java api doc, public Matcher appendReplacement(StringBuffer sb, String replacement): Note that backslashes () and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string. Here is the temporary fix between line 159 and line 160: decoded = decoded.replace(" ", "\\\\"); decoded = decoded.replace("$", " $");
        Hide
        Stefano Bagnara added a comment -

        Nevin, there is no "appendReplacement" method call in the committed fix, so I guess you are referring to an older fix. If you have a string that show the problem I can add it to the test suite to prove that the bug does not exist anymore.

        That said, FYI in Java 5 you can use Pattern.quote() to get an escaped version of the string (but there is no more need for escaping in the new code, so this is just for your interest).

        Show
        Stefano Bagnara added a comment - Nevin, there is no "appendReplacement" method call in the committed fix, so I guess you are referring to an older fix. If you have a string that show the problem I can add it to the test suite to prove that the bug does not exist anymore. That said, FYI in Java 5 you can use Pattern.quote() to get an escaped version of the string (but there is no more need for escaping in the new code, so this is just for your interest).
        Hide
        Markus Wiederkehr added a comment -

        @Nevin: a unit test has already been committed to prove that this works. See MIME4J-142 and testEncodedTextMayContainDollarSign() in DecoderUtilTest.

        Show
        Markus Wiederkehr added a comment - @Nevin: a unit test has already been committed to prove that this works. See MIME4J-142 and testEncodedTextMayContainDollarSign() in DecoderUtilTest.
        Hide
        Nevin added a comment -

        I have tried the latest version from the svn, my testcases are passed now.

        The problem originated from the wrong DecoderUtil.java(old version) attached in this page.

        I guess MEMERJ-142 is also originated from the attached DecoderUtil.java.

        Show
        Nevin added a comment - I have tried the latest version from the svn, my testcases are passed now. The problem originated from the wrong DecoderUtil.java(old version) attached in this page. I guess MEMERJ-142 is also originated from the attached DecoderUtil.java.
        Norman Maurer made changes -
        Fix Version/s 0.6.1 [ 12316030 ]
        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        12d 9h 41m 1 Markus Wiederkehr 29/Aug/09 22:35

          People

          • Assignee:
            Markus Wiederkehr
            Reporter:
            Markus Wiederkehr
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development