Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.7, 1.17
None
None
Description
Hi Team,
We are using Tika to parse different kinds of files, but for some XLS files we get the exception below. We currently use tika-app-1.7.jar and have also tried tika-app-1.17 and 1.20, but we still get the same exception. Please help us with this.
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@15aab8c6
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at TextExtraction.extract(TextExtraction.java:50)
    at TextExtraction.main(TextExtraction.java:68)
Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 1 bytes remaining still to be read.
    at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:177)
    at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:239)
    at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:156)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 4 more
Code we have written:
iStream = new FileInputStream(new File(fname));
mData = new Metadata();
cHandler = new BodyContentHandler(-1); // -1 disables the write limit
adp = new AutoDetectParser(); // AutoDetectParser() OldExcelParser
System.out.println();
ParseContext parseContext = new ParseContext();
parseContext.set(Parser.class, adp);
System.out.println(iStream + " " + cHandler + " " + mData + " " + parseContext);
System.out.println("Extracting ......\nPls wait..............\n");
adp.parse(iStream, cHandler, mData, parseContext);
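As a stopgap in a batch run, one option is to catch the failure per file and log only the innermost cause, which in the trace above is the POI LeftoverDataException rather than the TikaException wrapper. The following is a minimal sketch of that idea using only the JDK. The class name ParseFailureReport and the helpers rootCause and describeFailure are hypothetical, and the exceptions in main are plain JDK stand-ins, not the real Tika or POI classes. It does not fix the parse error itself.

```java
public class ParseFailureReport {

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Builds a one-line summary: file name, root exception class, root message. */
    static String describeFailure(String fileName, Throwable failure) {
        Throwable root = rootCause(failure);
        return fileName + ": " + root.getClass().getName() + ": " + root.getMessage();
    }

    public static void main(String[] args) {
        // Stand-ins for the wrapper and the underlying cause (assumption:
        // the real code would catch the exception thrown by adp.parse(...)).
        Throwable inner = new IllegalStateException(
                "Initialisation of record 0x85(BoundSheetRecord) left 1 bytes remaining still to be read.");
        Throwable outer = new Exception("Unexpected RuntimeException from OfficeParser", inner);
        System.out.println(describeFailure("example.xls", outer));
    }
}
```

In the real extraction loop, the call to adp.parse(...) would sit inside a try block, and the catch block would pass the caught exception to describeFailure before moving on to the next file.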