[REGRESSION] HPSF

It looks like we have a regression caused by recent changes in HPSF: an OLE2 file becomes unreadable after it is written if it contains a variant property of an unsupported type. In my research the problematic variant types were 4126 and 4108. The log warnings are below:

    HPSF does not yet support the variant type 4126 (unknown variant type, 000000000000101E).
    HPSF does not yet support the variant type 4108 (unknown variant type, 000000000000100C).

I was working on some improvements in HSSF and noticed that Excel couldn't open the output file. At first I thought it was my changes, but it turned out that even a simple read-write results in unreadable output:

    HSSFWorkbook wb = new HSSFWorkbook(new FileInputStream(inputFile));
    FileOutputStream os = new FileOutputStream(outputFile);
    wb.write(os);
    os.close();

Try the code above against the following files from our collection of test files and the output will be corrupted:

    12843-1.xls        34775.xls              45365.xls    ContinueRecordProblem.xls           OddStyleRecord.xls
    13224.xls          37684-2.xls            45365-2.xls  ex42570-20305.xls                   RangePtg.xls
    14460.xls          41139.xls              46137.xls    ex44921-21902.xls                   testNames.xls
    24207.xls          42464-ExpPtg-bad.xls   47034.xls    ex45978-extraLinkTableSheets.xls    XRefCalc.xls
    27852.xls          42464-ExpPtg-ok.xls    47847.xls    ex46548-23133.xls                   XRefCalcData.xls
    29982.xls          42844.xls              48026.xls    IndexFunctionTestCaseData.xls
    30978-deleted.xls  44010-SingleChart.xls  49185.xls    IrrNpvTestCaseData.xls
    32822.xls          44010-TwoCharts.xls    50939.xls    MRExtraLines.xls

Excel 2010 shows a warning when opening such files. The problem seems to be related to OLE properties and HPSF. If I comment out line 1218 in HSSFWorkbook then all is fine and Excel is happy to open the output files:

    // Write out our HPFS properties, if we have them
    writeProperties(fs, excepts);

This is a must for 3.8-final.

Yegor
I think this is related to (or rather, causes) bug #52337, as the returned structure should be of type VT_VECTOR | VT_VARIANT (0x100C). So it seems to me that the problem is in the code that reads the property sets, rather than the writing. Nik
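For reference, the two type codes from the log warnings can be decoded by splitting off the vector flag. The sketch below is only an illustration and is not HPSF code: the class name `VariantTypeDemo` and the method `describe` are made up for this comment, while the constant values (VT_VARIANT = 0x000C, VT_LPSTR = 0x001E, VT_VECTOR = 0x1000) are the standard variant type values.

```java
public class VariantTypeDemo {
    // Standard variant type constants; only the ones relevant to this bug.
    static final int VT_VARIANT = 0x000C;
    static final int VT_LPSTR = 0x001E;
    static final int VT_VECTOR = 0x1000;
    static final int VT_TYPEMASK = 0x0FFF;

    // Hypothetical helper: renders a type code as "VT_VECTOR | <base type>".
    static String describe(int type) {
        String base;
        switch (type & VT_TYPEMASK) {
            case VT_VARIANT:
                base = "VT_VARIANT";
                break;
            case VT_LPSTR:
                base = "VT_LPSTR";
                break;
            default:
                // Base types not listed above are shown as raw hex.
                base = String.format("0x%04X", type & VT_TYPEMASK);
        }
        return (type & VT_VECTOR) != 0 ? "VT_VECTOR | " + base : base;
    }

    public static void main(String[] args) {
        System.out.println("4108 = " + describe(4108));
        System.out.println("4126 = " + describe(4126));
    }
}
```

So 4108 (0x100C) is a vector of variants and 4126 (0x101E) is a vector of LPSTR strings, which matches the structures discussed in this bug.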
I had a look around the code. The bug seems to be in

    TypedPropertyValue.read(byte[], int)

which automatically pads the result, i.e. returns a 'padded' offset. This is bad when reading the Heading Pairs vector (and possibly others) in the DocumentSummaryInformation stream, as they use *unpadded* strings of the type UnalignedLpstr (http://msdn.microsoft.com/en-us/library/dd950621%28v=office.12%29.aspx).

I hope that this is the same bug, and not completely unrelated.

Nik
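To show why a padded offset breaks a vector of unaligned strings, here is a minimal standalone sketch. It does not use HPSF at all: the class `UnalignedLpstrDemo` and its `encode`/`decode` methods are hypothetical, and the layout assumed is a 4-byte little-endian length (counting the null terminator) followed directly by the string bytes, with no padding between entries.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UnalignedLpstrDemo {
    // Writes each value as: int32 length (including the null terminator), bytes, 0x00.
    // Entries are packed back to back, with no alignment padding.
    static byte[] encode(String... values) {
        int size = 0;
        for (String v : values) {
            size += 4 + v.getBytes(StandardCharsets.US_ASCII).length + 1;
        }
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        for (String v : values) {
            byte[] b = v.getBytes(StandardCharsets.US_ASCII);
            buf.putInt(b.length + 1);
            buf.put(b);
            buf.put((byte) 0);
        }
        return buf.array();
    }

    // Reads 'count' strings. With padTo4 = true the offset is rounded up to a
    // multiple of 4 after every entry, which is the behaviour suspected here.
    static List<String> decode(byte[] data, int count, boolean padTo4) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<String> out = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < count; i++) {
            if (offset + 4 > data.length) {
                throw new IllegalStateException("no length field at offset " + offset);
            }
            int cch = buf.getInt(offset);
            if (cch < 1 || cch > data.length - offset - 4) {
                throw new IllegalStateException("bad length " + cch + " at offset " + offset);
            }
            // cch - 1 drops the trailing null terminator.
            out.add(new String(data, offset + 4, cch - 1, StandardCharsets.US_ASCII));
            offset += 4 + cch;
            if (padTo4) {
                offset = (offset + 3) & ~3;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = encode("Title", "Worksheets");
        System.out.println("unpadded: " + decode(data, 2, false));
        try {
            System.out.println("padded:   " + decode(data, 2, true));
        } catch (IllegalStateException e) {
            System.out.println("padded:   failed, " + e.getMessage());
        }
    }
}
```

The first entry takes 10 bytes, so the second length field starts at offset 10. A reader that rounds up to 12 picks up two bytes of the length and two bytes of the string, and reads a nonsense length.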
Created attachment 28134
Diagram of the HeadingPair/DocParts TypedProperty structures

Just thought this might be useful for this bug. It shows some of the structure of the DocParts and HeadingPair properties, which, as far as I have been able to find, are the only ones that use unaligned strings in property sets. All the info comes straight from MS-OSHARED (and maybe a little bit from MS-OLEPS).

Ignore the green stuff on the left; that was from a project that I'm working on.

Nik
Your hypothesis seems to be correct. I changed TypedPropertyValue.read(byte[], int) to return the unpadded offset and it fixed the problem.

The fix has been committed in 1244388.

Regards,
Yegor

(In reply to comment #2)
> I had a look around the code, the bug seems to be in
>
> TypedPropertyValue.read(byte[], int)
>
> in the fact that it automatically pads the result, i.e. returns a 'padded'
> offset. This is bad when reading the Heading Pairs vector (and possibly others)
> in the DocumentSummaryInformation stream, as they use *unpadded* strings of the
> type UnalignedLpstr
> (http://msdn.microsoft.com/en-us/library/dd950621%28v=office.12%29.aspx).
> I hope that this is the same bug, and not completely unrelated.
>
> Nik