Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.6
    • Fix Version/s: 0.7
    • Component/s: parser (core)
    • Labels:
      None

      Description

      If the name of a nested (sub) boundary is the name of the enclosing boundary plus extra trailing characters, the attachment part cannot be retrieved.

      Proposed change: modify BufferedLineReaderInputStream.indexOf(final byte[] pattern, int off, int len)

      See the attached screenshot.

      boundary="NextPart__5980.1307607747"
      boundary="NextPart__5980.13076077471"

      A search for 'NextPart__5980.1307607747' also matches 'NextPart__5980.13076077471', because the first name is a prefix of the second.
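
The prefix match described above can be reproduced with a plain byte-pattern search. The following is only a minimal sketch: `PrefixBoundaryDemo` and `naiveIndexOf` are hypothetical names for illustration and are not the mime4j implementation of `BufferedLineReaderInputStream.indexOf`.

```java
import java.nio.charset.StandardCharsets;

public class PrefixBoundaryDemo {

    // Hypothetical helper: plain byte-pattern search.
    // Returns the first index >= from where pattern occurs in buf, or -1.
    public static int naiveIndexOf(byte[] buf, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= buf.length - pattern.length; i++) {
            int j = 0;
            while (j < pattern.length && buf[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The body starts with the INNER delimiter line.
        byte[] body = ("--NextPart__5980.13076077471\r\n"
                + "Content-Type: text/plain\r\n\r\n"
                + "--NextPart__5980.13076077471--\r\n"
                + "--NextPart__5980.1307607747\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        // Searching for the OUTER delimiter.
        byte[] outer = "--NextPart__5980.1307607747"
                .getBytes(StandardCharsets.US_ASCII);

        // Prints 0: the outer delimiter is found inside the inner delimiter line.
        System.out.println(naiveIndexOf(body, outer, 0));
    }
}
```

The search reports offset 0, which is the inner delimiter line, so a parser relying on this match alone would end the outer part too early.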

      [ex eml]--------------------------------------------------------------------------------------------------------------------------------
      Content-Type: multipart/mixed;
      boundary="NextPart__5980.1307607747"

      This is a multi-part message in MIME format.

      --NextPart__5980.1307607747
      Content-Type: multipart/alternative;
      boundary="NextPart__5980.13076077471"

      --NextPart__5980.13076077471
      Content-Type: text/plain;
      charset="ks_c_5601-1987"
      Content-Transfer-Encoding: base64

      --NextPart__5980.13076077471
      Content-Type: text/html;
      charset="ks_c_5601-1987"
      Content-Transfer-Encoding: base64
      ...

      --NextPart__5980.13076077471--

      --NextPart__5980.1307607747
      Content-Type: application/octet-stream;
      name="1"
      Content-Transfer-Encoding: base64
      Content-Disposition: attachment;
      filename="1"
      ...
      --------------------------------------------------------------------------------------------------------------------------------

      1. ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg
        94 kB
        Yong-Seong Kim
      2. boundary_Test_mail.eml
        138 kB
        Yong-Seong Kim

        Activity

        Oleg Kalnichevski added a comment -

        Boundary scanning code in MimeBoundaryInputStream has been improved and an extra check has been added to ensure that multipart boundary is properly terminated with a white space or '-' character.

        Please re-test your application against the latest SVN snapshot.

        Oleg
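
For readers following along, the kind of terminator check described in this comment might look roughly like the sketch below. It is not the actual change made to MimeBoundaryInputStream: `BoundaryTerminatorCheck` and `isDelimiterAt` are hypothetical names, and accepting end of input as a valid terminator is an assumption made here.

```java
import java.nio.charset.StandardCharsets;

public class BoundaryTerminatorCheck {

    // Hypothetical helper: true if delimiter occurs at pos in buf and is
    // followed by white space (space, tab, CR, LF), a '-' character,
    // or (assumption) the end of the buffer.
    public static boolean isDelimiterAt(byte[] buf, int pos, byte[] delimiter) {
        if (pos < 0 || pos + delimiter.length > buf.length) {
            return false;
        }
        for (int i = 0; i < delimiter.length; i++) {
            if (buf[pos + i] != delimiter[i]) {
                return false;
            }
        }
        int next = pos + delimiter.length;
        if (next == buf.length) {
            return true; // assumption: end of input terminates the delimiter
        }
        byte b = buf[next];
        return b == ' ' || b == '\t' || b == '\r' || b == '\n' || b == '-';
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] outer = ascii("--NextPart__5980.1307607747");

        // Inner delimiter line: followed by '1', so not an outer delimiter.
        System.out.println(isDelimiterAt(ascii("--NextPart__5980.13076077471\r\n"), 0, outer)); // false
        // Outer delimiter line: followed by CR.
        System.out.println(isDelimiterAt(ascii("--NextPart__5980.1307607747\r\n"), 0, outer)); // true
        // Outer close delimiter: followed by '-'.
        System.out.println(isDelimiterAt(ascii("--NextPart__5980.1307607747--\r\n"), 0, outer)); // true
    }
}
```

With this check the longer inner delimiter from the report is no longer mistaken for the outer one, while the regular and closing forms of the outer delimiter still match.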

        Oleg Kalnichevski added a comment -

        I agree with Stefano the message is malformed and the easiest way would be to treat such messages as invalid. I'll see, though, if support for parsing such messages in lenient mode could be added without breaking existing test cases.

        Oleg

        Yong-Seong Kim added a comment -

        I made a test mail that follows the pattern of the incoming mail.

        Stefano Bagnara added a comment -

        Forgot to say that the fact that we accept a line starting with the boundary but not ending with CRLF as a valid boundary is already a lenient behaviour.

        The strict behaviour should be to fail the parsing of that message as it is invalid (BTW I'm not sure this is really handled by the strict flag, but this is not important WRT this issue).

        The lenient behaviour is something we have to carefully choose. IIRC we have a testcase that checks for the current lenient behaviour so if we change it to support this use case, the older test will probably fail.

        Stefano Bagnara added a comment -

        The message you "propose" is not a valid MIME message.

        Here is the part of the spec that it violates:

        body-part = <"message" as defined in RFC 822,
        with all header fields optional, and with the
        specified delimiter not occurring anywhere in
        the message body, either on a line by itself
        or as a substring anywhere. Note that the
        semantics of a part differ from the semantics
        of a message, as described in the text.>

        The "specified delimiter" (boundary) cannot occur in the content of a multipart message, even as a substring. The boundary sequence "NextPart__5980.1307607747" appears multiple times as a substring in the content of that message.

        So, given an invalid message, we have to decide how to deal with it. In this case we probably chose to use the fastest algorithm and to consider the first occurrence as a malformed boundary. I reread the RFC and I think our current behaviour is correct.

        The fix you propose is invalid (IMO) because "indexOf" is a generic function and changing its behaviour to "indexOfLineEndingWith" is not an option.

        The RFC also says:

        The encapsulation boundary MUST NOT appear inside any of the encapsulated parts

        boundary is defined as

        boundary := 0*69<bchars> bcharsnospace

        so the CRLF before and after the boundary are not part of the boundary.
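
To make the two RFC rules quoted above concrete, here is a small sketch that checks a boundary parameter against the `boundary := 0*69<bchars> bcharsnospace` grammar and tests whether one delimiter occurs inside another. `BoundarySyntax`, `isValidBoundary` and `occursInside` are hypothetical names used only for illustration; they are not part of mime4j.

```java
import java.util.regex.Pattern;

public class BoundarySyntax {

    // bcharsnospace per RFC 2046: DIGIT / ALPHA / ' ( ) + _ , - . / : = ?
    private static final String BCHARS_NOSPACE = "[0-9A-Za-z'()+_,\\-./:=?]";

    // boundary := 0*69<bchars> bcharsnospace
    // (bchars is bcharsnospace or a space; the last character must not be a space)
    private static final Pattern BOUNDARY =
            Pattern.compile("(?:" + BCHARS_NOSPACE + "| ){0,69}" + BCHARS_NOSPACE);

    // True if s is a syntactically valid boundary parameter value (1 to 70 characters).
    public static boolean isValidBoundary(String s) {
        return BOUNDARY.matcher(s).matches();
    }

    // True if the delimiter built from outerBoundary occurs as a substring
    // of the delimiter built from innerBoundary.
    public static boolean occursInside(String outerBoundary, String innerBoundary) {
        return ("--" + innerBoundary).contains("--" + outerBoundary);
    }

    public static void main(String[] args) {
        String outer = "NextPart__5980.1307607747";
        String inner = "NextPart__5980.13076077471";

        // Both names are syntactically valid on their own.
        System.out.println(isValidBoundary(outer)); // true
        System.out.println(isValidBoundary(inner)); // true

        // The outer delimiter is a substring of the inner delimiter line,
        // so it appears inside the outer multipart's body.
        System.out.println(occursInside(outer, inner)); // true
    }
}
```

Both boundary values pass the grammar check individually; the message is invalid only because of how the two names relate to each other.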

        Can you say what MUA produces this badly formatted email?

        Norman Maurer added a comment -

        Can you supply a test case? I think it's already fixed in the upcoming 0.7.

        Yong-Seong Kim added a comment -

        The class to modify is org.apache.james.mime4j.io.BufferedLineReaderInputStream.


          People

          • Assignee: Unassigned
          • Reporter: Yong-Seong Kim
          • Votes: 0
          • Watchers: 0


              Time Tracking

              Estimated: 4h
              Remaining: 4h
              Logged: Not Specified
