44857 – Problem parsing Escher records, OutOfMemoryError from UnknownEscherRecord.fillFields

Status:	RESOLVED FIXED

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HSSF
Version:	3.0-FINAL
Hardware:	PC Windows Vista

Importance:	P2 major
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-04-23 00:19 UTC by Trejkaz (pen name)
Modified:	2008-04-27 10:59 UTC
CC List:	0 users

Attachments
container.dat (32.84 KB, application/octet-stream) 2008-04-23 00:19 UTC, Trejkaz (pen name)
proposed fix, but probably dodgy (784 bytes, patch) 2008-04-23 00:27 UTC, Trejkaz (pen name)

Description Trejkaz (pen name) 2008-04-23 00:19:04 UTC

We have unit tests for a particular Excel file, and that file fails to parse in POI 3.0.1 and 3.0.2. It does work in our custom 3.0.1 branch, although I can't figure out why.

The file itself is complicated, but I have managed to reduce the problem to a simple test:

    public void testEscher() throws Exception
    {
        // container.dat is the attached container record on its own.
        byte[] data = FileUtils.readFileToByteArray(new File("D:\\temp\\container.dat"));
        EscherContainerRecord record = new EscherContainerRecord();
        // This call is where the OutOfMemoryError is thrown.
        record.fillFields(data, 0, new DefaultEscherRecordFactory());
    }

This throws:

java.lang.OutOfMemoryError: Java heap space
	at org.apache.poi.ddf.UnknownEscherRecord.fillFields(UnknownEscherRecord.java:76)
	at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:56)
	at org.apache.poi.ddf.EscherContainerRecord.fillFields(EscherContainerRecord.java:56)


I tracked it down to an EscherMetafileBlip underneath EscherBSERecord.  EscherBSERecord assumes that the blip's getRecordSize() will be consistent with its own bytesRemaining value, and this is not the case: there are supposed to be 1125 bytes after the header, but field_5_cbSave is only 163.
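
The effect of such a mismatch can be illustrated without POI. The sketch below is a simplified stand-in, not POI's actual parsing code: the class and method names are hypothetical, and the only format detail it relies on is the 8-byte Escher record header (2 bytes options, 2 bytes record id, 4-byte little-endian length). It shows how a parent that advances by the size the child reports, rather than by the length declared in the header, can land in the middle of the payload and read arbitrary bytes as the next record's length.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration of the size mismatch; not POI's actual code. */
final class SizeMismatchSketch {
    /** 2 bytes options, 2 bytes record id, 4 bytes length. */
    static final int HEADER_SIZE = 8;

    /** Reads the little-endian length field of the record header at offset. */
    static int declaredLength(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset + 4);
    }

    /** Where the parent looks next if it trusts what the child says it consumed. */
    static int nextOffsetTrustingChild(int offset, int bytesConsumedByChild) {
        return offset + HEADER_SIZE + bytesConsumedByChild;
    }

    /** Where the parent looks next if it trusts the length declared in the header. */
    static int nextOffsetTrustingHeader(byte[] data, int offset) {
        return offset + HEADER_SIZE + declaredLength(data, offset);
    }

    /** Defensive check: a declared length cannot exceed the bytes actually left. */
    static boolean lengthIsPlausible(byte[] data, int offset) {
        if (offset < 0 || offset + HEADER_SIZE > data.length) {
            return false;
        }
        int length = declaredLength(data, offset);
        return length >= 0 && length <= data.length - offset - HEADER_SIZE;
    }
}
```

With a header that declares 1125 bytes and a child that reports only 163, `nextOffsetTrustingChild` points into the payload, and whatever four bytes sit there become the length of the "next" record. A check along the lines of `lengthIsPlausible` before allocating would turn that into a parse error instead of a huge allocation.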

But from there I can't say whether it's a trivial fix or not.  The code in our real unit test asserts an MD5 for the uncompressed metafile -- if I rewrite EscherMetafileBlip to read the whole thing then it avoids the exception but the MD5 still fails.  Problem is, I don't know whether the MD5 was wrong the whole time, due to some other obscure bug.
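
For reference, the kind of MD5 assertion mentioned above can be sketched with the JDK alone. The helper name `md5Hex` is hypothetical and is not part of POI or of our real test; it only shows how a digest of the uncompressed bytes would be turned into a comparable hex string.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for comparing uncompressed data against a known MD5. */
final class Md5Hex {
    /** Returns the MD5 digest of data as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to avoid sign extension, pad to two digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because the digest covers every byte, reading a different number of bytes from the record will change it, whether or not the earlier expected value was itself correct.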

Someone who knows more about EscherMetafileBlip would probably be able to say whether the simple and obvious fix is applicable here.

Comment 1 Trejkaz (pen name) 2008-04-23 00:19:53 UTC

Created attachment 21846
container.dat

Here's the container record by itself, which should be better for testing, as the real file contains quite a bit more rubbish...

Comment 2 Trejkaz (pen name) 2008-04-23 00:27:17 UTC

Created attachment 21847
proposed fix, but probably dodgy

Attaching proposed fix.  Results in consistency, but like I said my MD5 from before is different. :-/

It could be that my previous test was wrong.

But why would it declare the stored size as only 10% of the available space?  It's almost as if the thing isn't actually compressed and yet the size field is still recording the compressed size, but that would be ludicrous.  Maybe all the extra space is simply padding?
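
One way to probe the padding hypothesis, sketched with only `java.util.zip`: inflate just the first cbSave bytes and see whether the stream ends cleanly there. The class and method names are hypothetical, and the sketch assumes the stored bytes are a zlib stream, which I have not verified for this file. If the stream does finish within the stored size, whatever follows it is not part of the compressed data.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical probe; assumes the stored bytes are a zlib stream. */
final class PaddingProbe {
    /**
     * Inflates data[offset, offset + storedSize) and returns the uncompressed
     * bytes, or null if that range is not a complete, valid zlib stream.
     */
    static byte[] inflatePrefix(byte[] data, int offset, int storedSize) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data, offset, storedSize);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                out.write(chunk, 0, n);
                if (n == 0 && !inflater.finished()) {
                    // No progress: input ran out early or a dictionary is needed.
                    return null;
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            return null; // not valid zlib data
        } finally {
            inflater.end();
        }
    }
}
```

A non-null result for the stored size, together with a null result for any shorter prefix, would suggest the size field is right and the remaining bytes are trailing data rather than part of the metafile.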

And the real mystery: how could our local branch of 3.0.1 work when 3.0.1 itself does not, even though this file has not changed?  Did the record reading code previously use the bytesRemaining return value instead of getRecordSize()?

Comment 3 Nick Burch 2008-04-27 10:59:24 UTC

Thanks for the patch, file and testcase. Patch applied to trunk.

In terms of what has changed, "svn log" and "svn blame" are probably your friends here :)