Details
Bug
Status: Resolved
Minor
Resolution: Fixed
0.6
None
None
Description
The odd things are:
- We have two separate code paths for handling headless messages in the two types of parser (MimeStreamParser does not rely on the MimeTokenStream solution).
- MimeTokenStream headless parsing starts the parsing and the state events from "MimeTokenStream.T_END_HEADER". I think this is unexpected, and it should instead do one of the following:
a) "simulate" all of the events (starting from T_START_MESSAGE and simulating a full event stream for a header containing only the supplied content-type);
b) return only the events from START_MULTIPART to END_MULTIPART (or simply T_BODY if the content-type is not a multipart), without returning a T_END_MESSAGE, since it never returned a T_START_MESSAGE;
c) return all of the events as if parsing had been interrupted after the header, so starting from START_MULTIPART / T_BODY and continuing through every event, including T_END_MESSAGE / T_END_OF_STREAM, until the whole stream has been consumed.
I have a slight preference for a) and c), because b) does not seem feasible: currently the parser does not stop at the last boundary but includes all of the content after it in the epilogue, so you can't really use headless mode for partial parsing unless you have a limited stream. The current behaviour (starting from T_END_HEADER) clearly seems the worst option.
- MimeStreamParser instead simulates headless parsing by simply prepending a fake header containing an artificial content-type based on the supplied contentType.
- In both cases the contract does not make clear what kind of encoding/wrapping is done or expected on the "contenttype" parameter.
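To make the "prepend a fake header" approach concrete, here is a minimal sketch using only the JDK. It is not the actual MimeStreamParser code: the class and method names (HeadlessHeaderSketch, prependContentType, readAll) are hypothetical, and it assumes the supplied contentType is already a valid, unfolded ASCII header value, which is exactly the part of the contract that is unclear.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of mime4j.
final class HeadlessHeaderSketch {

    // Returns a stream that yields an artificial header followed by the body.
    static InputStream prependContentType(String contentType, InputStream body) {
        // Assumption: contentType needs no encoding or folding before use.
        byte[] header = ("Content-Type: " + contentType + "\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        return new SequenceInputStream(new ByteArrayInputStream(header), body);
    }

    // Drains a stream into a string so the result can be inspected.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        InputStream body = new ByteArrayInputStream(
                "hello\r\n".getBytes(StandardCharsets.US_ASCII));
        // The parser would then see an ordinary message with a one-field header.
        System.out.print(readAll(prependContentType("text/plain", body)));
    }
}
```

With this approach a regular (non-headless) parser would presumably emit the full event stream, which is close to option a) above, but any special characters or line folding in contentType end up in the stream unchecked.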
Related: http://issues.apache.org/jira/browse/MIME4J-128
I have a question: what are the use cases for the two current headless parsing modes? I'd like to understand them better so that I can choose the best "fix" for this issue.