James Mime4j / MIME4J-6

Loading bodies on demand instead of using temporary files

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None

      Description

      Make the DOM-like parser capable of loading bodies (at least large attachments)
      on demand instead of using temporary files.

        Activity

        Jochen Wiedmann added a comment -

        I can't imagine how this should work. Suggestions?

        Niklas Therning added a comment -

        If the mail is read from disk, Mime4j could scan it first to find the boundaries of the body parts. It would then store the start and end file positions of each part. When a body part is requested, it would read that particular portion of the message from the file using the start and end positions. The header of each part could probably be kept in memory.

        One drawback compared to the temporary file solution is that with temporary files the decoded part data can be stored directly, so the decoding is only done once. One could also use a hybrid solution: decoded parts are stored in temp files the first time they are decoded.

        Temporary files would also still have to be used I guess if the message isn't read from disk (no underlying FileChannel/RandomAccessFile).
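
The offset-based idea above can be sketched roughly as follows. This is a minimal illustration, not Mime4j code: the class `PartRange` and its `load` method are hypothetical names, and the sketch assumes the start and end offsets of a part were already recorded during an earlier scan of the file. It returns the raw, still transfer-encoded bytes, so decoding would have to happen on each access unless the decoded result is cached.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical holder for the byte range of one body part inside the message file. */
final class PartRange {
    final long start; // offset of the first body byte (inclusive)
    final long end;   // offset just past the last body byte (exclusive)

    PartRange(long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        this.start = start;
        this.end = end;
    }

    /** Reads the raw (not yet decoded) bytes of this part only when asked for. */
    byte[] load(RandomAccessFile file) throws IOException {
        long length = end - start;
        if (length > Integer.MAX_VALUE - 8) {
            throw new IOException("part too large to hold in a single byte[]");
        }
        byte[] raw = new byte[(int) length];
        file.seek(start);     // jump straight to the recorded start position
        file.readFully(raw);  // read exactly end - start bytes
        return raw;
    }
}
```

For very large attachments a real implementation would more likely hand out a bounded stream over the range than a byte array; the array keeps the sketch short.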

        Jochen Wiedmann added a comment -

        There's no specific Mime4J support required for your proposal. It could simply be implemented by running Mime4J in streaming mode on the disk file while ignoring the attachments.

        Niklas Therning added a comment -

        The idea is to make the Message class and friends able to do this so the user won't have to. Message uses the streaming parser internally to build the message tree. IIRC the streaming parser does some look-ahead when searching for boundaries etc. This means that when the callback in your ContentHandler is called, you cannot assume that the current file position in the underlying stream corresponds to the start of the body part. Mime4j doesn't currently provide a means to determine the actual start of a body part in the underlying stream. At the very least, that has to be supported by Mime4j. Once that has been done, I don't think it would be very hard to extend Message and friends to load parts as they are requested.
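
To illustrate why look-ahead hides the true position, here is a minimal sketch of a buffering reader that reports both how far it has fetched from the underlying stream and how far the consumer has actually read. `OffsetTrackingBuffer` and its methods are hypothetical names for illustration and do not reflect how Mime4j buffers input internally.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical buffered reader that tracks the logical offset of consumed bytes. */
final class OffsetTrackingBuffer {
    private final InputStream in;
    private final byte[] buf;
    private int pos;        // index of the next unconsumed byte in buf
    private int limit;      // number of valid bytes currently in buf
    private long bufOffset; // offset in the underlying stream of buf[0]

    OffsetTrackingBuffer(InputStream in, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        this.in = in;
        this.buf = new byte[size];
    }

    /** Returns the next byte (0..255), or -1 at end of stream. */
    int read() throws IOException {
        if (pos == limit) {
            bufOffset += limit; // everything in the old buffer has been consumed
            pos = 0;
            int n = in.read(buf, 0, buf.length);
            limit = Math.max(n, 0);
            if (n <= 0) return -1;
        }
        return buf[pos++] & 0xFF;
    }

    /** Offset of the next byte the consumer will see; this is what a part start needs. */
    long consumedOffset() { return bufOffset + pos; }

    /** Offset the underlying stream has reached, which runs ahead because of buffering. */
    long fetchedOffset() { return bufOffset + limit; }
}
```

A parser built on such a buffer could record `consumedOffset()` when it reaches the first byte of a body, instead of relying on the position of the underlying file.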

        Robert Burrell Donkin added a comment -

        I think that would be possible but would require some deep changes. Really need to add seek to architecture. Leave till 0.5.

        Stefano Bagnara added a comment -

        In the JavaMail/Activation world they use the SharedInputStream to address a similar issue.

        When the InputStream passed to the parser is a SharedInputStream, it is possible to create child streams at any time by using a reference to the original stream and a pos/len parameter. This way it should be possible to keep reading from the original source instead of creating temporary files.

        When a SharedInputStream is passed, JavaMail does not need to clone the content somewhere else when you need DOM access; otherwise JavaMail simply copies the input stream to an internal byte array (and uses it via the SharedInputStream interface).

        Unfortunately we have nested decodings, so most of the time lazily loading some nested content will require re-decoding most of the file.
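
The child-stream idea can be sketched with the standard library alone. The class below is a hypothetical, simplified analogue backed by a byte array; it is not the JavaMail interface itself (which, in `javax.mail.internet.SharedInputStream`, exposes `newStream(long start, long end)`), and `SharedBytes` is an invented name for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

/** Hypothetical shared source: hands out independent child streams over one byte array. */
final class SharedBytes {
    private final byte[] data;

    SharedBytes(byte[] data) {
        this.data = data;
    }

    /**
     * Returns a stream over data[start, end). The array is shared, not copied,
     * and each child stream keeps its own read position.
     */
    InputStream newStream(long start, long end) {
        if (start < 0 || end < start || end > data.length) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        return new ByteArrayInputStream(data, (int) start, (int) (end - start));
    }
}
```

Each body part would then only need to remember its start and end offsets and ask the shared source for a fresh stream when its content is requested. As noted above, this yields the encoded bytes, so nested encodings still have to be decoded again on each access.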

        Robert Burrell Donkin added a comment -

        Need to review StorageProvider WRT semi-DOM use cases, but this should wait until after the 0.6 release.


          People

          • Assignee: Unassigned
          • Reporter: Norman Maurer
          • Votes: 0
          • Watchers: 1