Bug 50154 - POI corrupts file when hyperlink relation is not valid java.util.URI
Summary: POI corrupts file when hyperlink relation is not valid java.util.URI
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POIFS
Version: 3.7-dev
Hardware: PC All
Importance: P2 normal with 4 votes
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-25 17:42 UTC by Trey Hyde
Modified: 2010-11-17 16:16 UTC
CC List: 0 users



Attachments
Will corrupt if you open with POI and save a new copy. (29.58 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2010-11-05 18:13 UTC, Trey Hyde

Description Trey Hyde 2010-10-25 17:42:50 UTC
I sent this to the POI-users mailing list a few weeks ago and was advised to open a bug here.


I've been injecting custom document properties with POI into both OOXML and OLE files for about 4 months now, covering a very large number of documents. However, one customer complains that when they open the metadata-injected version of their file, Excel reports a missing drawing part.

The recovery log is:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <logFileName>error027960_01.xml</logFileName>
  <summary>Errors were detected in file 'Z:\Downloads\bad.xlsx'</summary>
  <removedParts summary="Following is a list of removed parts:">
    <removedPart>Removed Part: /xl/drawings/drawing1.xml part. (Drawing shape)</removedPart>
  </removedParts>
</recoveryLog>



I extracted both the original file and the one I modified (only the custom document properties were changed), diffed them, and attached the diff.



Most of the differences are just formatting changes: POI pretty-prints the XML.

POI drops the standalone="yes" attribute from the XML declaration, but that shouldn't be a big deal.

The thing that jumps out at me is that in xl/drawings/_rels/drawing1.xml.rels, relationship rId1 (<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="#'Instructions (Text)'!B21"/>) is missing from the version that POI saves to disk.
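For anyone who wants to check their own files for this symptom, here is a minimal sketch that compares the relationship IDs in an original and a re-saved .rels part using only the JDK's DOM parser. The class and method names (RelsDiff, relationships, dropped) are hypothetical, and the "saved" part in main is an abridged stand-in, not the actual output of POI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RelsDiff {
    // Collects Id -> Target for every <Relationship> element in a .rels part.
    static Map<String, String> relationships(String relsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList nodes = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(relsXml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagNameNS("*", "Relationship");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element rel = (Element) nodes.item(i);
                result.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse .rels part", e);
        }
    }

    // Relationships present in the original part but missing from the re-saved one.
    static Map<String, String> dropped(String originalXml, String savedXml) {
        Map<String, String> missing = new LinkedHashMap<>(relationships(originalXml));
        missing.keySet().removeAll(relationships(savedXml).keySet());
        return missing;
    }

    public static void main(String[] args) {
        String ns = "http://schemas.openxmlformats.org/package/2006/relationships";
        String type = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";
        String original = "<Relationships xmlns=\"" + ns + "\">"
                + "<Relationship Id=\"rId1\" Type=\"" + type + "\" Target=\"#'Instructions (Text)'!B21\"/>"
                + "</Relationships>";
        // Abridged, hypothetical stand-in for the part POI writes back.
        String saved = "<Relationships xmlns=\"" + ns + "\"/>";
        System.out.println(dropped(original, saved)); // prints {rId1=#'Instructions (Text)'!B21}
    }
}
```

In practice the two strings would come from the same .rels entry in the original and re-saved .xlsx (both are ZIP archives).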

Does anyone have any suggestions, or has anyone seen anything like this before? I started this project at the beginning of the year with an early 3.7 snapshot, then migrated to beta2 and now to beta3; all of them exhibit the same issue.






On Wed, 13 Oct 2010, Trey Hyde wrote:
> The drawings all seem intact; the only tangible difference I see is the missing hyperlink rel in drawing1. I just modified the code to not make any changes at all, and I see the same thing.

OK, then this looks like a more general bug, not related to property modifications. Could you create a bug in Bugzilla and upload a file that demonstrates the problem, along with some code that shows it?

Thanks
Nick
Comment 1 Trey Hyde 2010-10-25 17:44:16 UTC
I ran it again after turning on logging in POI. Here are the very relevant warnings:



[org.apache.poi.openxml4j.opc.PackageRelationshipCollection] Cannot convert #'Instructions (Text)'!B21 in a valid relationship URI-> ignored
java.net.URISyntaxException: Illegal character in fragment at index 14: #'Instructions (Text)'!B21
  at java.net.URI$Parser.fail(URI.java:2809)
  at java.net.URI$Parser.checkChars(URI.java:2982)
  at java.net.URI$Parser.parse(URI.java:3028)
  at java.net.URI.<init>(URI.java:578)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:363)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:156)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:124)
  at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:527)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
  at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
  at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:187)
  at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:592)
  at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:201)
  at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
  at org.apache.poi.openxml4j.opc.OPCPackage.openOrCreate(OPCPackage.java:240)
  at com.centraldesktop.office.metadata.OpenXMLMDReader.isOutdated(OpenXMLMDReader.java:35)
  at com.centraldesktop.office.metadata.Main.main(Main.java:128)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at com.simontuffs.onejar.Boot.run(Boot.java:340)
  at com.simontuffs.onejar.Boot.main(Boot.java:166)
[org.apache.poi.openxml4j.opc.PackageRelationshipCollection] Cannot convert #'Instructions (Text)'!B21 in a valid relationship URI-> ignored
java.net.URISyntaxException: Illegal character in fragment at index 14: #'Instructions (Text)'!B21
  at java.net.URI$Parser.fail(URI.java:2809)
  at java.net.URI$Parser.checkChars(URI.java:2982)
  at java.net.URI$Parser.parse(URI.java:3028)
  at java.net.URI.<init>(URI.java:578)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:363)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:156)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:124)
  at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:527)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
  at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
  at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:187)
  at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:592)
  at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:201)
  at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
  at org.apache.poi.openxml4j.opc.OPCPackage.openOrCreate(OPCPackage.java:240)
  at com.centraldesktop.office.metadata.OpenXMLMDWriter.save(OpenXMLMDWriter.java:33)
  at com.centraldesktop.office.metadata.Main.main(Main.java:130)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at com.simontuffs.onejar.Boot.run(Boot.java:340)
  at com.simontuffs.onejar.Boot.main(Boot.java:166)
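The warning can be reproduced with java.net.URI alone, without POI. A minimal sketch follows; the class name HyperlinkTargetCheck and its check helper are hypothetical. The message it prints matches the URISyntaxException in the log above: the space at index 14 is not a legal character in a URI fragment.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HyperlinkTargetCheck {
    // Returns "OK" if java.net.URI accepts the target, otherwise the parser's error message.
    static String check(String target) {
        try {
            new URI(target);
            return "OK";
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("#'Instructions (Text)'!B21"));
        // prints: Illegal character in fragment at index 14: #'Instructions (Text)'!B21
    }
}
```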
Comment 2 Trey Hyde 2010-11-04 21:05:48 UTC
I just found another document with the same issue: the hyperlink relationship target is not in a format that java.net.URI accepts.


[org.apache.poi.openxml4j.opc.PackageRelationshipCollection] Cannot convert Macintosh%20HD:Users:marienl:Desktop:##20Lisa%20work:CC_PowerPoint:CC_S.jpg in a valid relationship URI-> ignored
java.net.URISyntaxException: Illegal character in scheme name at index 9: Macintosh%20HD:Users:marienl:Desktop:##20Lisa%20work:CC_PowerPoint:CC_S.jpg
  at java.net.URI$Parser.fail(URI.java:2809)
  at java.net.URI$Parser.checkChars(URI.java:2982)
  at java.net.URI$Parser.parse(URI.java:3009)
  at java.net.URI.<init>(URI.java:578)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:363)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:156)
  at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:124)
  at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:527)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
  at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
  at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
  at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:187)
  at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:592)
  at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:201)
...
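This second failure differs from the first: the target is a classic Mac OS style path with ':' as the separator, so java.net.URI reads everything before the first colon ("Macintosh%20HD") as a scheme name, and '%' is not allowed there. A minimal JDK-only sketch that isolates this; MacPathTargetCheck and parseError are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

class MacPathTargetCheck {
    // Returns the parse error for a target, or null if java.net.URI accepts it.
    static URISyntaxException parseError(String target) {
        try {
            new URI(target);
            return null;
        } catch (URISyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        String target = "Macintosh%20HD:Users:marienl:Desktop:##20Lisa%20work:CC_PowerPoint:CC_S.jpg";
        // A scheme may only contain letters, digits, '+', '-' and '.', so '%' at index 9 is rejected.
        URISyntaxException e = parseError(target);
        System.out.println(e.getReason() + " at index " + e.getIndex());
        // prints: Illegal character in scheme name at index 9
    }
}
```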
Comment 3 Trey Hyde 2010-11-05 18:13:54 UTC
Created attachment 26263
Will corrupt if you open with POI and save a new copy.
Comment 4 Trey Hyde 2010-11-05 18:16:12 UTC
The attached file triggers this. To recreate it:
1. Open a new sheet and add an image.
2. Create a second sheet.
3. Right-click the image on the first sheet and add a hyperlink.
4. Make it a link to "Document" -> "Anchor" and select the other sheet.

[org.apache.poi.openxml4j.opc.PackageRelationshipCollection] Cannot convert #'Another Sheet'!A1 in a valid relationship URI-> ignored
java.net.URISyntaxException: Illegal character in fragment at index 9: #'Another Sheet'!A1
        at java.net.URI$Parser.fail(URI.java:2809)
        at java.net.URI$Parser.checkChars(URI.java:2982)
        at java.net.URI$Parser.parse(URI.java:3028)
        at java.net.URI.<init>(URI.java:578)
        at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:363)
        at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:156)
        at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:124)
        at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:527)
        at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:112)
        at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:83)
        at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:128)
        at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
        at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:187)
        at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:592)
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:201)
        at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:178)
        at org.apache.poi.openxml4j.opc.OPCPackage.openOrCreate(OPCPackage.java:240)
        at com.centraldesktop.office.metadata.OpenXMLMDWriter.save(OpenXMLMDWriter.java:33)
        at com.centraldesktop.office.metadata.Main.main(Main.java:130)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.simontuffs.onejar.Boot.run(Boot.java:340)
        at com.simontuffs.onejar.Boot.main(Boot.java:166)
Comment 5 Yegor Kozlov 2010-11-12 07:51:33 UTC
The problem is in the OpenXML4J module: it doesn't support whitespace in target URIs.

Compare two targets:

  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="#ThirdSheet!A1"/>
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink" Target="#'Another Sheet'!A1"/>

The first one is handled correctly and survives a read-write round trip.
The second one triggers a warning and causes trouble.

The fix seems easy: we should percent-encode whitespace (and perhaps any other non-URI characters) when reading OPC packages, and decode it again when saving.
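A minimal sketch of that encode-on-read, decode-on-save idea, assuming only the space character needs handling (other non-URI characters are left alone). RelationshipTargetCodec, toUri and toTarget are hypothetical names; this illustrates the approach and is not the code committed for this bug.

```java
import java.net.URI;

class RelationshipTargetCodec {
    // On read: percent-encode spaces so java.net.URI accepts the target.
    // URI.create throws IllegalArgumentException for anything it still rejects.
    static URI toUri(String rawTarget) {
        return URI.create(rawTarget.replace(" ", "%20"));
    }

    // On save: turn %20 back into a space so the target is written as Excel produced it.
    static String toTarget(URI uri) {
        return uri.toString().replace("%20", " ");
    }

    public static void main(String[] args) {
        URI uri = toUri("#'Another Sheet'!A1");
        System.out.println(uri.getRawFragment()); // 'Another%20Sheet'!A1
        System.out.println(toTarget(uri));        // #'Another Sheet'!A1
    }
}
```

One caveat of this simple version: a target that already contained a literal %20 would also come back as a space on save, so a real implementation would need to remember which characters it encoded.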

The fix is coming soon.

Yegor
Comment 6 Yegor Kozlov 2010-11-17 16:11:34 UTC
Fixed in r1036215. 

Yegor
Comment 7 Trey Hyde 2010-11-17 16:16:05 UTC
Fantastic! I'll give trunk a good run-through in the next few days, and I eagerly await the fix landing in a stable build in the Maven repository.