52385 – [REGRESSION] HPSF corrupts output when starting file has unsupported variant props

Status:	RESOLVED FIXED

Alias:	None

Product:	POI
Classification:	Unclassified
Component:	HPSF
Version:	3.13-dev
Hardware:	All All

Importance:	P2 critical
Target Milestone:	---
Assignee:	POI Developers List

URL:
Keywords:

Depends on:
Blocks:	52337 52538

Reported:	2011-12-25 20:11 UTC by Yegor Kozlov
Modified:	2015-10-18 10:44 UTC
CC List:	0 users

Attachments
Diagram of the HeadingPair/DocParts TypedProperty structures (38.11 KB, image/png) 2012-01-11 01:39 UTC, Niklas Rehfeld

Description Yegor Kozlov 2011-12-25 20:11:36 UTC

It looks like we have a regression caused by recent changes in HPSF: an OLE2 file becomes unreadable after a write if it contains a variant property of an unsupported type. In my testing the problematic variant types were 4126 and 4108. The log warnings are below:

HPSF does not yet support the variant type 4126 (unknown variant type, 000000000000101E).  
HPSF does not yet support the variant type 4108 (unknown variant type, 000000000000100C). 
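
For readers decoding the two numbers in these warnings: a variant type is a base type in the low 12 bits, optionally combined with the VT_VECTOR flag (0x1000). The sketch below is only an illustration of that arithmetic; the class name `VariantTypeDecoder` and its `describe` method are hypothetical and are not part of HPSF, and the table of base types is deliberately limited to the two seen here.

```java
/** Splits an OLE property variant type into its VT_VECTOR flag and base type. */
class VariantTypeDecoder {
    static final int VT_VECTOR = 0x1000;   // flag: the value is a vector of the base type
    static final int VT_TYPEMASK = 0x0FFF; // the low 12 bits hold the base type

    // Only the base types that show up in this report; not a complete table.
    static final int VT_VARIANT = 0x000C;
    static final int VT_LPSTR = 0x001E;

    /** Returns a readable name such as "VT_VECTOR | VT_LPSTR" (hypothetical helper). */
    static String describe(int variantType) {
        int base = variantType & VT_TYPEMASK;
        String name;
        switch (base) {
            case VT_VARIANT: name = "VT_VARIANT"; break;
            case VT_LPSTR:   name = "VT_LPSTR";   break;
            default:         name = String.format("0x%04X", base);
        }
        return (variantType & VT_VECTOR) != 0 ? "VT_VECTOR | " + name : name;
    }

    public static void main(String[] args) {
        System.out.println("4126 = " + describe(4126));
        System.out.println("4108 = " + describe(4108));
    }
}
```

Under these constants, 4126 (0x101E) decodes as a vector of VT_LPSTR and 4108 (0x100C) as a vector of VT_VARIANT, so both unsupported types are vector properties.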

I was working on some improvements in HSSF and noticed that Excel couldn't open the output file. At first I thought it was my changes, but it turned out that even a simple read-write round trip results in unreadable output:


  HSSFWorkbook wb = new HSSFWorkbook(new FileInputStream(inputFile));

  FileOutputStream os = new FileOutputStream(outputFile);
  wb.write(os);
  os.close();

Run the code above against the following files from our collection of test files and the output will be corrupted.
  

12843-1.xls        34775.xls              45365.xls    ContinueRecordProblem.xls         OddStyleRecord.xls
13224.xls          37684-2.xls            45365-2.xls  ex42570-20305.xls                 RangePtg.xls
14460.xls          41139.xls              46137.xls    ex44921-21902.xls                 testNames.xls
24207.xls          42464-ExpPtg-bad.xls   47034.xls    ex45978-extraLinkTableSheets.xls  XRefCalc.xls
27852.xls          42464-ExpPtg-ok.xls    47847.xls    ex46548-23133.xls                 XRefCalcData.xls
29982.xls          42844.xls              48026.xls    IndexFunctionTestCaseData.xls
30978-deleted.xls  44010-SingleChart.xls  49185.xls    IrrNpvTestCaseData.xls
32822.xls          44010-TwoCharts.xls    50939.xls    MRExtraLines.xls

Excel 2010 shows a warning when opening such files.  

The problem seems to be related to OLE properties and HPSF. If I comment out line 1218 in HSSFWorkbook, then all is fine and Excel is happy to open the output files:

        // Write out our HPFS properties, if we have them
        writeProperties(fs, excepts);

This is a must for 3.8-final. 

Yegor

Comment 1 Niklas Rehfeld 2012-01-03 21:47:30 UTC

I think this is related to (or rather, causes) bug #52337, as the returned structure should be of type VT_VECTOR | VT_VARIANT (0x100C). 

So it seems to me that the problem is in the code that reads the property sets, rather than the writing. 

Nik

Comment 2 Niklas Rehfeld 2012-01-05 02:55:53 UTC

I had a look around the code, and the bug seems to be in

TypedPropertyValue.read(byte[], int)

The problem is that it automatically pads the result, i.e. it returns a 'padded' offset. This is bad when reading the Heading Pairs vector (and possibly others) in the DocumentSummaryInformation stream, as they use *unpadded* strings of the type UnalignedLpstr (http://msdn.microsoft.com/en-us/library/dd950621%28v=office.12%29.aspx).
I hope that this is the same bug, and not completely unrelated. 
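
To make the padded-versus-unpadded difference concrete, here is a minimal, self-contained sketch. It is not the HPSF code: the class `UnalignedLpstrDemo` and its methods are hypothetical, and it assumes each string is stored as a 4-byte little-endian character count (including the terminating null) followed immediately by the characters, with no padding between elements.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Shows how rounding an offset up to 4 bytes derails a reader of unpadded strings. */
class UnalignedLpstrDemo {

    /** Two back-to-back unpadded strings: "Sheet1\0" (7 chars) and "Abc\0" (4 chars). */
    static byte[] sample() {
        ByteBuffer buf = ByteBuffer.allocate(19).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(7).put("Sheet1\0".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(4).put("Abc\0".getBytes(StandardCharsets.US_ASCII));
        return buf.array();
    }

    /** Reads the 4-byte little-endian length prefix at the given offset. */
    static int readLength(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    /** Returns the offset just past the string at 'offset', optionally padded to 4 bytes. */
    static int nextOffset(byte[] data, int offset, boolean pad) {
        int end = offset + 4 + readLength(data, offset);
        return pad ? (end + 3) & ~3 : end;
    }

    public static void main(String[] args) {
        byte[] data = sample();
        int unpadded = nextOffset(data, 0, false);
        int padded = nextOffset(data, 0, true);
        System.out.printf("unpadded: next offset %d, next length %d%n",
                unpadded, readLength(data, unpadded));
        System.out.printf("padded:   next offset %d, next length 0x%08X%n",
                padded, readLength(data, padded));
    }
}
```

With this sample, the unpadded reader resumes at offset 11 and finds the second length (4), while the padded reader resumes at offset 12, one byte into the second length field, and decodes a nonsensical length. Everything read after that point is misaligned.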

Nik

Comment 3 Niklas Rehfeld 2012-01-11 01:39:30 UTC

Created attachment 28134
Diagram of the HeadingPair/DocParts TypedProperty structures

Just thought this might be useful for this bug: it shows some of the structure of the DocParts and HeadingPair properties, which, as far as I have been able to find, are the only ones that use unaligned strings in property sets.

All the info comes straight from MS-OSHARED (and maybe a little bit from MS-OLEPS).

Ignore the green stuff on the left; that was from a project that I'm working on.

Nik

Comment 4 Yegor Kozlov 2012-02-15 07:53:08 UTC

Your hypothesis seems to be correct. I changed TypedPropertyValue.read(byte[], int) to return the unpadded offset and it fixed the problem. 

The fix has been committed in 1244388

Regards,
Yegor

(In reply to comment #2)
> I had a look around the code, the bug seems to be in 
> 
> TypedPropertyValue.read(byte[], int)
> 
> in the fact that it automatically pads the result, i.e. returns a 'padded'
> offset. This is bad when reading the Heading Pairs vector (and possibly others)
> in the DocumentSummaryInformation stream, as they use *unpadded* strings of the
> type UnalignedLpstr
> (http://msdn.microsoft.com/en-us/library/dd950621%28v=office.12%29.aspx).
> I hope that this is the same bug, and not completely unrelated. 
> 
> Nik