Bug 54233 - When attached testing code is executed against the attached document, it generates the exception here under.
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HPSF
Version: unspecified
Hardware: All All
Importance: P2 critical
Target Milestone: ---
Assignee: POI Developers List
Reported: 2012-12-03 09:49 UTC by Philippe Dubois
Modified: 2013-06-25 23:56 UTC

Attachments
Java test to generate exception (2.09 KB, text/plain)
2012-12-03 09:49 UTC, Philippe Dubois
Document used to generate the exception (11.50 KB, application/msword)
2012-12-03 09:51 UTC, Philippe Dubois
Proposed patch (1.90 KB, patch)
2012-12-03 09:58 UTC, Philippe Dubois
Description Philippe Dubois 2012-12-03 09:49:10 UTC
Created attachment 29666
Java test to generate exception

When the attached test code is executed against the attached document, it generates the following exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.poi.util.LittleEndian.getByteArray(LittleEndian.java:72)
	at org.apache.poi.hpsf.UnicodeString.<init>(UnicodeString.java:44)
	at org.apache.poi.hpsf.TypedPropertyValue.readValue(TypedPropertyValue.java:162)
	at org.apache.poi.hpsf.Vector.read(Vector.java:74)
	at org.apache.poi.hpsf.TypedPropertyValue.readValue(TypedPropertyValue.java:218)
	at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:163)
	at org.apache.poi.hpsf.Property.<init>(Property.java:164)
	at org.apache.poi.hpsf.Section.<init>(Section.java:277)
	at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:451)
	at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:246)
	at org.alfresco.sample.TestPoi.main(TestPoi.java:46)

Information:
The test document was generated using Aspose: http://www.aspose.com/
An analysis of the document content and format can be found here: https://issues.alfresco.com/jira/browse/ALF-16896

The question is:
"It appears that the length is little endian but in this file it always starts on a 4 byte boundary. I don't know if that is what should happen or if this is an error in the file. However as a result I have been able to work out a patch (UnicodeString.java.patch attached) which when applied to our POI works for both this file and existing files."
Comment 1 Philippe Dubois 2012-12-03 09:51:57 UTC
Created attachment 29667
Document used to generate the exception

Document used to generate the exception. Generated using Aspose.
Comment 2 Philippe Dubois 2012-12-03 09:58:06 UTC
Created attachment 29668
Proposed patch
Comment 3 Alan Davis 2012-12-03 10:15:15 UTC
The attached UnicodeString.java.patch allows POI to recover from the type of error found in the file generated by http://www.aspose.com. The file specifies an offset to a UnicodeString parameter that is off by 2 bytes. The real offset starts on a 4-byte boundary.

The patch works by checking the offsets provided to make sure the UnicodeString appears valid. The original code checked that the UnicodeString ends in a NULL character only AFTER it had copied the string into a new byte[]. The patch does this check BEFORE the copy, which avoids creating a very large byte[] followed by an ArrayIndexOutOfBoundsException. As a result, it can also check whether moving the offset to a 4-byte boundary would solve the problem.
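
A minimal, self-contained sketch of the check-before-copy idea described above. It is neither the attached patch nor POI's actual code: the class and method names (UnicodeStringOffsetSketch, looksValid, resolveOffset) are hypothetical, and it assumes the string is stored as a 4-byte little-endian character count followed by UTF-16LE data whose last character is NULL.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of POI and not the attached patch.
final class UnicodeStringOffsetSketch {

    /** Reads an unsigned 4-byte little-endian value; caller ensures offset + 4 <= data.length. */
    static long readUInt(byte[] data, int offset) {
        return (data[offset] & 0xFFL)
                | (data[offset + 1] & 0xFFL) << 8
                | (data[offset + 2] & 0xFFL) << 16
                | (data[offset + 3] & 0xFFL) << 24;
    }

    /** Validates the string in place, without allocating or copying anything. */
    static boolean looksValid(byte[] data, int offset) {
        if (offset < 0 || offset + 4 > data.length) {
            return false;
        }
        long chars = readUInt(data, offset);
        if (chars == 0) {
            return true; // assumption: an empty string is acceptable
        }
        long end = offset + 4 + chars * 2; // long arithmetic, so a bogus length cannot overflow
        if (end > data.length) {
            return false; // a misread length typically points far past the buffer
        }
        // The last UTF-16 character must be NULL.
        return data[(int) end - 2] == 0 && data[(int) end - 1] == 0;
    }

    /** Returns the declared offset if valid, else the next 4-byte boundary if that is valid. */
    static int resolveOffset(byte[] data, int offset) {
        if (looksValid(data, offset)) {
            return offset;
        }
        int aligned = (offset + 3) & ~3; // round up to a multiple of 4
        if (aligned != offset && looksValid(data, aligned)) {
            return aligned;
        }
        throw new IllegalArgumentException(
                "UnicodeString at offset " + offset + " is not NULL-terminated");
    }

    /** Decodes the string, dropping the trailing NULL character. */
    static String read(byte[] data, int offset) {
        int start = resolveOffset(data, offset);
        int chars = (int) readUInt(data, start);
        if (chars == 0) {
            return "";
        }
        return new String(data, start + 4, (chars - 1) * 2, StandardCharsets.UTF_16LE);
    }
}
```

In this sketch, a declared offset that is 2 bytes too early makes the length field read as a huge value, so the bounds check rejects it before any array is allocated, and the 4-byte-aligned offset is tried next.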
Comment 4 Yegor Kozlov 2012-12-03 13:46:19 UTC
It needs some work. At least one unit test started to fail after I applied your patch:


org.apache.poi.hpsf.IllegalPropertySetDataException: UnicodeString started at offset #68 is not NULL-terminated
	at org.apache.poi.hpsf.UnicodeString.<init>(UnicodeString.java:48)
	at org.apache.poi.hpsf.TypedPropertyValue.readValue(TypedPropertyValue.java:162)
	at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:166)
	at org.apache.poi.hpsf.Property.<init>(Property.java:164)
	at org.apache.poi.hpsf.Section.<init>(Section.java:277)
	at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:451)
	at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:246)
	at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:59)
	at org.apache.poi.POIDocument.getPropertySet(POIDocument.java:165)
	at org.apache.poi.POIDocument.readProperties(POIDocument.java:126)
	at org.apache.poi.POIDocument.getSummaryInformation(POIDocument.java:93)
	at org.apache.poi.TestPOIDocumentMain.testCreateNewPropertiesOnExistingFile(TestPOIDocumentMain.java:161)

Please run the "test" ant target and ensure it completes OK.

Yegor
Comment 5 Nick Burch 2012-12-27 02:30:17 UTC
Alan and/or Philippe - any luck on a version of the patch that doesn't break the unit tests?
Comment 6 Nick Burch 2013-06-25 23:56:21 UTC
I've had a go at fixing this in r1496675. All the POI tests pass with my fix, and the code is hopefully a little easier to follow than in the original patch. I've added a unit test based on the sample file supplied, which shows we can now read the metadata without error.

This has just missed out on being in POI 3.10 beta 1 though, so I guess we're stuck with a patched copy of POI in Alfresco for a little bit longer :/ If it helps, I can raise a new Alfresco support ticket, and/or buy a round in the Bear...!