It is impossible to create an HSSFWorkbook from an Excel file. There is a StringIndexOutOfBoundsException in POIDocument.readProperties(). It worked with POI 3.0.1. Here is the full stack trace:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 541934449
    at java.lang.String.checkBounds(String.java:372)
    at java.lang.String.<init>(String.java:404)
    at org.apache.poi.hpsf.Property.readDictionary(Property.java:257)
    at org.apache.poi.hpsf.Property.<init>(Property.java:153)
    at org.apache.poi.hpsf.Section.<init>(Section.java:291)
    at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:454)
    at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:249)
    at org.apache.poi.hpsf.PropertySetFactory.create(PropertySetFactory.java:61)
    at org.apache.poi.POIDocument.getPropertySet(POIDocument.java:97)
    at org.apache.poi.POIDocument.readProperties(POIDocument.java:77)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:171)
    at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:148)
    at Test.<init>(Test.java:18)
    at Test.main(Test.java:38)
Created attachment 21493
xls file not readable with POI HSSF 3.0.2 (ok with 3.0.1)
To reproduce, simply try:
    POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("C:/temp/test.xls"));
    new HSSFWorkbook(fs); // this line throws a StringIndexOutOfBoundsException
Hmm, no changes have been made to org.apache.poi.hpsf.Property since 2006, so the cause isn't anything obvious there.
I don't know whether your document has a corrupt SummaryInformation stream, or whether there's a bug in the SummaryInformation stream parsing. I've added a disabled failing testcase for it to svn trunk, which can be a starting point for someone to look into why the SummaryInformation isn't working. (3.0.1 didn't read document metadata by default, but 3.0.2 does, which would explain why the file opened under 3.0.1.)
It seems that the method org.apache.poi.hpsf.Property.readDictionary(byte[], long, int, int) is not exercised by any of the existing JUnit tests. When comparing the execution flow of this bug with the successful test cases, the divergence can be seen at line 151 of the constructor org.apache.poi.hpsf.Property.Property(long, byte[], long, int, int). For the sample spreadsheet, the Property constructor is invoked successfully 19 times before this.id==0 and readDictionary() gets invoked.
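To illustrate how a length-prefixed read can end up in String.checkBounds the way the stack trace shows, here is a minimal sketch. It is not the HPSF code: the class BogusLengthDemo, its methods, and the layout (a 4-byte little-endian length followed by the text bytes) are assumptions made for the example. One observation that may or may not be significant: 541934449 is 0x204D4371, which is what the ASCII bytes "qCM " give when read as a little-endian integer, so the reader could be picking up text where it expects a length field.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of POI.
public class BogusLengthDemo {

    /** Reads a 4-byte little-endian unsigned integer at the given offset. */
    static long lengthField(byte[] src, int offset) {
        return ByteBuffer.wrap(src, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt() & 0xFFFFFFFFL;
    }

    /**
     * Reads a length-prefixed string without checking the length against the
     * size of the array, so a bogus length is handed straight to String.
     */
    static String readLengthPrefixedString(byte[] src, int offset) {
        int length = (int) lengthField(src, offset);
        return new String(src, offset + 4, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Text bytes sitting where a length field is expected.
        byte[] bogus = "qCM text".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(lengthField(bogus, 0)); // prints 541934449
        try {
            readLengthPrefixedString(bogus, 0);
        } catch (IndexOutOfBoundsException e) {
            // The String constructor rejects the out-of-range length.
            System.out.println("rejected: " + e.getClass().getName());
        }
    }
}
```

A well-formed input such as the bytes 03 00 00 00 followed by "abc" is read back as "abc" by the same method, so the failure depends only on the content of the length field.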
The properties are broken. Neither the Windows XP Explorer nor Excel is able to show them, but at least they don't fail. I am going to implement the same behaviour in HPSF.
Fixed with revision 619765. HPSF now copes with a broken dictionary in Document Summary Information streams. RuntimeExceptions that occurred when trying to read bogus data are now caught. Dictionary entries up to but not including the bogus one are preserved; the rest are ignored.
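For readers who want to see the described behaviour in isolation, the following is a rough sketch of the idea, not the code from revision 619765. The class TolerantDictionaryReader and the simplified layout (a 4-byte entry count, then for each entry a 4-byte id, a 4-byte length, and the text bytes, all little-endian) are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not part of POI.
public class TolerantDictionaryReader {

    /**
     * Reads dictionary entries until the data runs out or turns out to be
     * bogus. Entries read before the failure are kept; the rest are ignored.
     */
    static Map<Long, String> readDictionary(byte[] src) {
        Map<Long, String> entries = new LinkedHashMap<>();
        try {
            ByteBuffer buf = ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN);
            long count = buf.getInt() & 0xFFFFFFFFL;
            for (long i = 0; i < count; i++) {
                long id = buf.getInt() & 0xFFFFFFFFL;
                int length = buf.getInt();
                // Throws a RuntimeException if length is negative or runs past the array.
                String value = new String(src, buf.position(), length,
                        StandardCharsets.ISO_8859_1);
                buf.position(buf.position() + length);
                entries.put(id, value);
            }
        } catch (RuntimeException e) {
            // Bogus or truncated data: stop here and keep what was read so far.
        }
        return entries;
    }

    public static void main(String[] args) {
        // Three entries announced; the second carries a bogus length.
        ByteBuffer b = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(3);
        b.putInt(1).putInt(3).put("abc".getBytes(StandardCharsets.ISO_8859_1));
        b.putInt(2).putInt(541934449).put((byte) 'x');
        System.out.println(readDictionary(b.array())); // prints {1=abc}
    }
}
```

Catching RuntimeException broadly mirrors the wording of the fix above. A stricter variant could validate each length against the remaining bytes before constructing the String, at the cost of enumerating every way the data can be wrong.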