Hi POI team, My enhancement request concerns ContentType support in the openxml4j part of the POI library. In the current 3.9 version, a ContentType containing parameters throws a "malformed content type" exception when the OPC document is parsed. Such a ContentType could be of the form "application/xml;key1=value1;key2=value2". There is already code to support this format in the ContentType class, but it is commented out! Would it be possible to activate this ContentType format in a future version? Thank you, Sebastien.
Do you have a sample file that has parameters in it? And if so, could you please upload it, ideally along with a short unit test that shows you trying to load + read them?
Created attachment 30341: OPC file with Content_Types.xml containing parameters. I attach a very simple OPC file whose "Content_Types.xml" contains parameters: ContentType="application/x-resqml+xml;version=2.0;type=obj_global2dCrs". The only line of code needed to reproduce the problem is the OPCPackage.open call: OPCPackage p = OPCPackage.open("opc_contenttype_test_wparams.opc", PackageAccess.READ); This call throws the following exception: org.apache.poi.openxml4j.exceptions.InvalidFormatException: The specified content type 'application/x-resqml+xml;version=2.0;type=obj_global1dCrs' is not compliant with RFC 2616: malformed content type. I think this is because the code in /[Apache-SVN]/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/opc/internal/ContentType.java doesn't support this ContentType string format. Thank you, cheers, Sebastien.
In r1487657 I have added your unit test, and stubbed out the unit tests we'll need. The next step is to review the ooxml spec, then write the unit tests for valid parameters based on the stubbed out bits. Finally, we can then try to enable the parameter logic. If you have some time to spend on part #2, that'd be great!
(In reply to Nick Burch from comment #3) Unfortunately I won't have time to work on that now. I hope to be able to help a little bit next month...
(In reply to Nick Burch from comment #3) I just reviewed the ooxml spec in the document ISO_IEC_29500-2_2012.pdf. The ContentType format is specified in section 9.1.2 by reference to RFC 2616, section 3.7. The format of the media-type defined by ContentType is as follows: media-type = type "/" subtype *( ";" parameter ), where parameter is expressed as attribute "=" value. The remaining work is to complete the unit tests and enable the corresponding parsing code in ContentType.java. Sebastien.
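To make the grammar above concrete, here is a minimal sketch of how a value of that shape breaks down into type, subtype, and parameters. It is not POI code: the class name MediaTypeSketch and its methods are hypothetical, and it assumes every parameter value is a plain RFC 2616 token (quoted-string values are not handled).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of POI: splits a media-type as in RFC 2616, section 3.7. */
final class MediaTypeSketch {
    final String type;
    final String subtype;
    // Insertion-ordered so parameters come back in the order they were written.
    final Map<String, String> parameters = new LinkedHashMap<>();

    private MediaTypeSketch(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    /** media-type = type "/" subtype *( ";" parameter ), parameter = attribute "=" value */
    static MediaTypeSketch parse(String contentType) {
        String[] pieces = contentType.split(";", -1);
        String[] names = pieces[0].split("/", -1);
        if (names.length != 2 || !isToken(names[0]) || !isToken(names[1])) {
            throw new IllegalArgumentException("malformed type/subtype: " + contentType);
        }
        MediaTypeSketch result = new MediaTypeSketch(names[0], names[1]);
        for (int i = 1; i < pieces.length; i++) {
            int eq = pieces[i].indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("parameter without '=': " + pieces[i]);
            }
            String attribute = pieces[i].substring(0, eq);
            String value = pieces[i].substring(eq + 1);
            // Assumption: values are tokens only; quoted strings are rejected here.
            if (!isToken(attribute) || !isToken(value)) {
                throw new IllegalArgumentException("malformed parameter: " + pieces[i]);
            }
            result.parameters.put(attribute, value);
        }
        return result;
    }

    /** RFC 2616 token: one or more US-ASCII chars, excluding controls, space, and separators. */
    static boolean isToken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (char c : s.toCharArray()) {
            if (c <= 32 || c >= 127 || "()<>@,;:\\\"/[]?={}".indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MediaTypeSketch m =
            parse("application/x-resqml+xml;version=2.0;type=obj_global2dCrs");
        System.out.println(m.type + " / " + m.subtype); // application / x-resqml+xml
        System.out.println(m.parameters); // {version=2.0, type=obj_global2dCrs}
    }
}
```

Splitting on ";" first works only because a token can never contain ";" or "="; a version that accepted quoted-string values would need a real tokenizer instead.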
Created attachment 30782: Patch for both files, ContentType.java and TestContentType.java. Here is a proposed fix for this bug. I completed the unit test with hard-coded parameterized content types, but I did not implement the file-based unit test. I had an issue with the Java pattern matcher, which does not handle multiple occurrences of a group within a single match, so I had to add a second matcher dedicated to processing the parameters. It works well on my files. Thank you in advance for integrating it, and feel free to modify it as you see fit.
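For anyone reading along, the matcher limitation mentioned above can be reproduced with a short sketch. This is not the attached patch: the class ParameterMatcherSketch, its method names, and the exact regular expressions are assumptions made for illustration. It shows that a capturing group inside a repeated group only retains its last iteration, and that a second pattern driven by find() recovers every parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration, not the attached patch. */
final class ParameterMatcherSketch {
    // RFC 2616 token: printable US-ASCII minus the separator characters.
    private static final String TOKEN =
        "[\\x21-\\x7E&&[^()<>@,;:\\\\\"/\\[\\]?={}]]+";

    // Groups: 1 = type, 2 = subtype, 3 = whole parameter tail,
    // 4 and 5 = attribute and value of the LAST parameter only.
    private static final Pattern WHOLE = Pattern.compile(
        "^(" + TOKEN + ")/(" + TOKEN + ")((?:;(" + TOKEN + ")=(" + TOKEN + "))*)$");

    // Second pattern, applied repeatedly to the tail captured in group 3.
    private static final Pattern PARAMETER =
        Pattern.compile(";(" + TOKEN + ")=(" + TOKEN + ")");

    private static Matcher matchWhole(String contentType) {
        Matcher m = WHOLE.matcher(contentType);
        if (!m.matches()) {
            throw new IllegalArgumentException("malformed content type: " + contentType);
        }
        return m;
    }

    /** What the single-pattern approach sees: only the final attribute, or null if none. */
    static String lastAttributeFromRepeatedGroup(String contentType) {
        return matchWhole(contentType).group(4);
    }

    /** The two-matcher approach: validate with WHOLE, then walk the tail with find(). */
    static Map<String, String> parameters(String contentType) {
        String tail = matchWhole(contentType).group(3);
        Map<String, String> result = new LinkedHashMap<>();
        Matcher p = PARAMETER.matcher(tail);
        while (p.find()) {
            result.put(p.group(1), p.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String ct = "application/x-resqml+xml;version=2.0;type=obj_global2dCrs";
        System.out.println(lastAttributeFromRepeatedGroup(ct)); // type
        System.out.println(parameters(ct)); // {version=2.0, type=obj_global2dCrs}
    }
}
```

Validating the whole string first means the find() loop never has to worry about stray text between parameters, since the tail it walks has already been checked to consist of well-formed ";attribute=value" pairs.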
Thanks for this patch, and sorry it got forgotten. I've done some work on this myself, and then incorporated much of your logic and tests too. As of r1569976 we're now able to process these content types without error, and we have a lot more testing around it all. Thanks for your help!