Bug 55026 - Parse the parameter part in the ContentType definition
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall
Version: 3.9-FINAL
Hardware: All All
Importance: P1 enhancement with 1 vote
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-29 14:20 UTC by Sebastien Schneider
Modified: 2014-02-19 23:36 UTC



Attachments
OPC file with Content_Types.xml containing parameters (3.15 KB, application/octet-stream)
2013-05-29 20:43 UTC, Sebastien Schneider
Patch for both files ContentType.java and TestContentType.java (4.48 KB, patch)
2013-08-29 16:01 UTC, Sebastien Schneider

Description Sebastien Schneider 2013-05-29 14:20:49 UTC
Hi POI team,

My enhancement is related to ContentType support in the openxml4j part of the POI library.
In the current 3.9 version, a ContentType containing parameters causes a "malformed content type" exception to be thrown when the OPC document is parsed.
Such a ContentType could be of the form "application/xml;key1=value1;key2=value2".

There's already code to support this format in the ContentType class, but it's commented out!

Is it possible to activate this ContentType format in a future version?

Thank you,
Sebastien.
Comment 1 Nick Burch 2013-05-29 14:35:28 UTC
Do you have a sample file that has parameters in it? And if so, could you please upload it, ideally along with a short unit test that shows you trying to load + read them?
Comment 2 Sebastien Schneider 2013-05-29 20:43:27 UTC
Created attachment 30341
OPC file with Content_Types.xml containing parameters

I attach a very simple OPC file with a "Content_Types.xml" which contains parameters:
ContentType="application/x-resqml+xml;version=2.0;type=obj_global2dCrs"

The only line of code needed to reproduce the problem is a call to the OPCPackage.open method, like this:
OPCPackage p = OPCPackage.open("opc_contenttype_test_wparams.opc", PackageAccess.READ);

This call throws the following exception:
org.apache.poi.openxml4j.exceptions.InvalidFormatException: The specified content type 'application/x-resqml+xml;version=2.0;type=obj_global1dCrs' is not compliant with RFC 2616: malformed content type.

I think it's because the code in /[Apache-SVN]/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/opc/internal/ContentType.java doesn't support this ContentType string format.

Thank you,
cheers,
Sebastien.
Comment 3 Nick Burch 2013-05-29 22:24:54 UTC
In r1487657 I have added your unit test, and stubbed out the unit tests we'll need

The next step is to review the ooxml spec, then write the unit tests for valid parameters based on the stubbed out bits

Finally, we can then try to enable the parameter logic

If you have some time to spend on part #2, that'd be great!
Comment 4 Sebastien Schneider 2013-05-30 07:29:07 UTC
Unfortunately I won't have time to work on that now. I hope to be able to help a little bit next month ...


Comment 5 Sebastien Schneider 2013-05-30 09:39:02 UTC
I just reviewed the OOXML spec in the document ISO_IEC_29500-2_2012.pdf. The ContentType format is specified in section 9.1.2 by reference to RFC 2616, section 3.7. The format of the media-type defined by ContentType is as follows:
media-type = type "/" subtype *( ";" parameter )
where parameter is expressed as
attribute "=" value
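
The grammar above can be sketched in Java roughly as follows. This is only an illustration and not the code in ContentType.java: the class name MediaTypeSketch and its methods are hypothetical, and it assumes parameter values are plain tokens (RFC 2616 also allows quoted-string values, which this sketch rejects).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of POI: parses
//   media-type = type "/" subtype *( ";" parameter )
//   parameter  = attribute "=" value
// assuming every value is a plain token (no quoted-string support).
public final class MediaTypeSketch {
    // Separator characters that RFC 2616 excludes from a token.
    private static final String SEPARATORS = "()<>@,;:\\\"/[]?={}";

    final String type;
    final String subtype;
    final Map<String, String> parameters = new LinkedHashMap<>();

    private MediaTypeSketch(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    // A token is one or more visible ASCII characters that are not separators.
    static boolean isToken(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (char c : s.toCharArray()) {
            if (c < 0x21 || c > 0x7E || SEPARATORS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    static MediaTypeSketch parse(String text) {
        // Limit -1 keeps trailing empty pieces, so "a/b;" is rejected below.
        String[] pieces = text.split(";", -1);
        String[] main = pieces[0].split("/", -1);
        if (main.length != 2 || !isToken(main[0]) || !isToken(main[1])) {
            throw new IllegalArgumentException("malformed type/subtype: " + text);
        }
        MediaTypeSketch result = new MediaTypeSketch(main[0], main[1]);
        for (int i = 1; i < pieces.length; i++) {
            int eq = pieces[i].indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("malformed parameter: " + pieces[i]);
            }
            String attribute = pieces[i].substring(0, eq);
            String value = pieces[i].substring(eq + 1);
            if (!isToken(attribute) || !isToken(value)) {
                throw new IllegalArgumentException("malformed parameter: " + pieces[i]);
            }
            result.parameters.put(attribute, value);
        }
        return result;
    }
}
```

With the content type from the attached file, MediaTypeSketch.parse would yield type "application", subtype "x-resqml+xml", and the two parameters version and type.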

The next step is to complete the unit tests and enable the corresponding parsing code in ContentType.java.

Sebastien.


Comment 6 Sebastien Schneider 2013-08-29 16:01:10 UTC
Created attachment 30782
Patch for both files ContentType.java and TestContentType.java

Here is a proposed fix for this bug. I completed the unit test with hard-coded parameterized content types, but I did not implement the file-based unit test.

I had an issue with the Java pattern matcher, which does not handle multiple occurrences of a group in a single match, so I had to add a second matcher specialized to process parameters.
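
To illustrate the matcher behaviour described above, here is a small standalone sketch. It is not the code from the patch: the class RepeatedGroupSketch and both patterns are hypothetical and use deliberately simplified character classes. In java.util.regex, a capturing group inside a repeated construct keeps only the text of its last iteration, so a separate pattern driven by find() is one way to collect every parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demonstration class, not part of POI.
public final class RepeatedGroupSketch {
    // Whole-string pattern: groups 3 and 4 sit inside a repeated (...)*,
    // so after matches() they hold only the LAST parameter.
    static final Pattern WHOLE =
            Pattern.compile("^([^/;]+)/([^/;]+)(?:;([^;=]+)=([^;=]+))*$");

    // Second pattern matching a single ";attribute=value" pair.
    static final Pattern PARAM = Pattern.compile(";([^;=]+)=([^;=]+)");

    // Returns the only attribute name visible through the whole-string matcher.
    static String attributeSeenByWholeMatcher(String contentType) {
        Matcher m = WHOLE.matcher(contentType);
        return m.matches() ? m.group(3) : null;
    }

    // Walks the string with find() so that every parameter is collected in order.
    static Map<String, String> allParameters(String contentType) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(contentType);
        while (m.find()) {
            params.put(m.group(1), m.group(2));
        }
        return params;
    }
}
```

For "application/xml;key1=value1;key2=value2", the whole-string matcher exposes only key2, while the find() loop returns both key1 and key2.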

It works well on my files.

Thank you in advance for integrating it, and feel free to modify it as you see fit.
Comment 7 Nick Burch 2014-02-19 23:36:07 UTC
Thanks for this patch, and sorry it got forgotten.

I've done some work on this myself, and then incorporated much of your logic and tests too. As of r1569976 we're now able to process these content types without error, and we have a lot more testing around it all.

Thanks for your help!