I keep getting this exception in an application that parses Excel files sent to us. It happens because the file has one or more ColumnInfoRecords right after a row block, like this:

    Offset=0x00008772(34674) recno=1873 sid=0x00FD size=0x000A(10)
    [LABELSST]
        .row      = 0x00D2
        .col      = 0x0006
        .xfindex  = 0x002A
        .sstIndex = 0x001D
    [/LABELSST]
    Offset=0x00008780(34688) recno=1874 sid=0x0201 size=0x0006(6)
    [BLANK]
        row = 0x00D2
        col = 0x0007
        xf  = 0x002A
    [/BLANK]
    Offset=0x0000878A(34698) recno=1875 sid=0x0201 size=0x0006(6)
    [BLANK]
        ... several empty cells in row 0xD3
    [/BLANK]
    Offset=0x000087D0(34768) recno=1882 sid=0x007D size=0x000B(11)
    [COLINFO]
        colfirst  = 0
        collast   = 0
        colwidth  = 10240
        xfindex   = 0
        options   = 0x0000
        hidden    = false
        olevel    = 0
        collapsed = false
    [/COLINFO]

Several versions of Excel and LibreOffice 3.5.4.2 open these files without problems. I don't know how the files are generated, but I wonder whether the record order described in the OOO excelfileformat.pdf is strict, or whether a ColumnInfoRecord is allowed to appear after a row block.

I would also like to know whether the solution is as simple as adding ColumnInfoRecord to RecordOrderer.isEndOfRowBlock, which I guess was the solution to a very similar bug (bug 50426).

Thank you very much.
Can you attach the file?

Yegor
Well, in fact, I can't right now, since I cannot share the contents of the files, and modifying and saving the files fixes the issue. But I have attached the relevant part of the BiffViewer output for one of the files, and after debugging the execution I can explain exactly what happens.

Having a COLINFO record right after the row block means that the ColumnInfoRecords are included in the row block. When they are processed, the RowRecordsAggregate(RecordStream rs, SharedValueManager svm) constructor throws an exception when it checks the record type:

    if (!(rec instanceof CellValueRecordInterface)) { // TRUE for ColumnInfoRecord
        throw new RuntimeException("Unexpected record type (" + rec.getClass().getName() + ")");
    }

Oops, sorry, I just realized I didn't attach the exception stack. I was going to paste it after the bug title, but forgot to do it. Here it is:

    java.lang.RuntimeException: Unexpected record type (org.apache.poi.hssf.record.ColumnInfoRecord)
        at org.apache.poi.hssf.record.aggregates.RowRecordsAggregate.<init>(RowRecordsAggregate.java:107) ~[poi-3.8.jar:3.8]
        at org.apache.poi.hssf.model.InternalSheet.<init>(InternalSheet.java:208) ~[poi-3.8.jar:3.8]
        at org.apache.poi.hssf.model.InternalSheet.createSheet(InternalSheet.java:163) ~[poi-3.8.jar:3.8]
        at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:296) ~[poi-3.8.jar:3.8]
        at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:49) ~[poi-ooxml-3.8.jar:3.8]

I have fixed this by modifying the RecordOrderer.isEndOfRowBlock method, but I don't know if that's the right thing to do:

    public static boolean isEndOfRowBlock(int sid) {
        switch(sid) {
            case ViewDefinitionRecord.sid:
                // should have been prefixed with DrawingRecord (0x00EC), but bug 46280 seems to allow this
            case DrawingRecord.sid:
            case DrawingSelectionRecord.sid:
            case ObjRecord.sid:
            case TextObjectRecord.sid:
            case GutsRecord.sid:       // see Bug 50426
            case ColumnInfoRecord.sid: // see Bug 53984
            case WindowOneRecord.sid:
                // should really be part of workbook stream, but some apps seem to put this before WINDOW2
            case WindowTwoRecord.sid:
                return true;
            case DVALRecord.sid:
                return true;
            case EOFRecord.sid:
                // WINDOW2 should always be present, so shouldn't have got this far
                throw new RuntimeException("Found EOFRecord before WindowTwoRecord was encountered");
        }
        return PageSettingsBlock.isComponentRecord(sid);
    }

Please let me know whether this info is enough. In the meantime I will try to get one of those files without any sensitive information so I can attach it here.

Thanks.
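To illustrate why the workaround above changes the outcome, here is a minimal, self-contained sketch. It does not use POI; the class RowBlockSketch and the method rowBlockLength are hypothetical names made up for this example, and the real RecordOrderer logic is more involved. The sid values for LABELSST, BLANK and COLINFO are taken from the BiffViewer dump, and 0x023E is assumed for WINDOW2. The sketch only shows that a record which is not treated as a terminator ends up counted inside the row block.

```java
import java.util.Set;

public class RowBlockSketch {
    // sids as they appear in the BiffViewer dump; WINDOW2 (0x023E) is an assumption
    static final int LABELSST = 0x00FD;
    static final int BLANK = 0x0201;
    static final int COLINFO = 0x007D;
    static final int WINDOW2 = 0x023E;

    /**
     * Hypothetical stand-in for the row block scan: counts leading records
     * until the first sid that is treated as an end-of-row-block marker.
     */
    static int rowBlockLength(int[] sids, Set<Integer> terminators) {
        int n = 0;
        while (n < sids.length && !terminators.contains(sids[n])) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] stream = {LABELSST, BLANK, BLANK, COLINFO, WINDOW2};

        // COLINFO is not a terminator: it is swallowed into the row block (4 records)
        System.out.println(rowBlockLength(stream, Set.of(WINDOW2)));

        // COLINFO is a terminator: the row block stops before it (3 records)
        System.out.println(rowBlockLength(stream, Set.of(WINDOW2, COLINFO)));
    }
}
```

In the first call the COLINFO record is part of the block, which corresponds to the situation where the RowRecordsAggregate constructor meets a record that is not a CellValueRecordInterface.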
The fix looks sane, but I'd rather not commit it without a unit test. A BiffViewer dump is not enough; we need a file.

Yegor
Changing status to NEEDINFO until a test file is provided.
Created attachment 29880
Test file that throws the reported exception when opened
After all this time, I finally have a file that I can share with you. The problem is that, even though you said the proposed fix looked sane, it's not perfect: I found another file that used to open fine and stopped working once I added the suggested workaround, so we have to find another solution. Let me know if I can be of any help with it.
Fixed in r1614884. I'm fairly sure that the file in question wasn't generated by Excel, as it does some very odd things. We now handle ColumnInfo records that come at the end of the sheet rather than the start, and we also warn about and skip over sheets whose BOFRecord type isn't one we support (this file has a totally invalid one at the end).
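For readers following along, the idea of tolerating late ColumnInfo records can be sketched as below. This is only an illustration under assumptions and not the code committed in r1614884: the class LateColumnInfoSketch and the method extractColumnInfos are hypothetical, and records are represented by their sids alone. Instead of rejecting a COLINFO record found among the cell records, the sketch sets it aside so it can be handled with the other column definitions.

```java
import java.util.ArrayList;
import java.util.List;

public class LateColumnInfoSketch {
    static final int COLINFO = 0x007D; // sid as shown in the BiffViewer dump

    /**
     * Hypothetical helper: moves COLINFO sids out of a row block into
     * columnInfos and returns the remaining cell-record sids in order.
     */
    static List<Integer> extractColumnInfos(List<Integer> rowBlock, List<Integer> columnInfos) {
        List<Integer> cells = new ArrayList<>();
        for (int sid : rowBlock) {
            if (sid == COLINFO) {
                columnInfos.add(sid); // set aside instead of throwing
            } else {
                cells.add(sid);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        List<Integer> columnInfos = new ArrayList<>();
        // LABELSST, BLANK, COLINFO: the late COLINFO is diverted, the cells are kept
        List<Integer> cells = extractColumnInfos(List.of(0x00FD, 0x0201, COLINFO), columnInfos);
        System.out.println(cells.size() + " cell records, " + columnInfos.size() + " column info records");
    }
}
```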