Description
Hi,
I am getting an ArrayIndexOutOfBoundsException from POI when running tika-app on a Word (.doc) file.
Possibly related to TIKA-577.
$ java -jar tika-app-1.0-SNAPSHOT.jar http://www.arb.ca.gov/msprog/smogcheck/july00/iiif.doc
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@44aea710
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:302)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:91)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 610125
    at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:45)
    at org.apache.poi.ddf.EscherRecord$EscherRecordHeader.readHeader(EscherRecord.java:250)
    at org.apache.poi.ddf.DefaultEscherRecordFactory.createRecord(DefaultEscherRecordFactory.java:56)
    at org.apache.poi.hwpf.model.PicturesTable.searchForPictures(PicturesTable.java:169)
    at org.apache.poi.hwpf.model.PicturesTable.searchForPictures(PicturesTable.java:180)
    at org.apache.poi.hwpf.model.PicturesTable.searchForPictures(PicturesTable.java:180)
    at org.apache.poi.hwpf.model.PicturesTable.getAllPictures(PicturesTable.java:207)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:430)
    at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:420)
    at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:75)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:182)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    ... 5 more
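For what it's worth, the innermost frames in the trace above look like a two-byte little-endian read at an offset (610125) that lies past the end of the buffer. The sketch below is only an illustration of that failure mode and of a bounds check that would avoid it; it is not POI's code. The class name, both method names, and the 8-byte header size passed in the example are my own assumptions.

```java
// Hypothetical illustration only: none of these names come from POI or Tika.
final class LittleEndianSketch {

    // Reads a little-endian 16-bit value at the given offset.
    // With no bounds check, an offset past the end of the array
    // throws ArrayIndexOutOfBoundsException, as in the trace above.
    static short getShort(byte[] data, int offset) {
        int b0 = data[offset] & 0xFF;      // low byte
        int b1 = data[offset + 1] & 0xFF;  // high byte
        return (short) ((b1 << 8) | b0);
    }

    // Returns true only if headerSize bytes starting at offset fit in data.
    // Written with a subtraction so that large offsets cannot overflow int.
    static boolean canReadHeader(byte[] data, int offset, int headerSize) {
        return offset >= 0 && headerSize >= 0 && offset <= data.length - headerSize;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        int offset = 610125;  // the index reported in the exception
        int headerSize = 8;   // assumed record header size, for illustration

        if (canReadHeader(data, offset, headerSize)) {
            System.out.println(getShort(data, offset));
        } else {
            System.out.println("offset " + offset + " is outside the buffer, skipping record");
        }
    }
}
```

A guard of this kind before the header read would let the caller skip a bad record offset instead of failing the whole parse, though whether that is the right fix for this document is for the POI developers to judge.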
Thank you!