TIKA-3086

not able to parse XLS file


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7, 1.17
    • Fix Version/s: None
    • Component/s: tika-batch
    • Labels: None

    Description

      Hi Team,

      We are using Tika to parse different kinds of files, but for some XLS files we get the exception below. We currently use tika-app-1.7.jar; we have also tried tika-app-1.17 and 1.20 and still get the same exception. Please help us with this.

      org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@15aab8c6
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
          at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
          at TextExtraction.extract(TextExtraction.java:50)
          at TextExtraction.main(TextExtraction.java:68)
      Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 1 bytes remaining still to be read.
          at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:177)
          at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:239)
          at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
          at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:156)
          at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
          at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
          at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
          ... 4 more
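For readers unfamiliar with the "left 1 bytes remaining" message: BIFF (binary Excel) records start with a 4-byte header, a 2-byte record id followed by a 2-byte payload length, both little-endian. As far as I understand it, POI raises LeftoverDataException when a record parser finishes without consuming every byte the header declared. The sketch below only illustrates that condition using the JDK alone; the class name BiffRecordCheck, its helper methods, and the sample bytes are made up for illustration and are not part of Tika or POI, and it does not diagnose the failing file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BiffRecordCheck {

    /** Reads the 4-byte record header at offset; returns {record id, payload length}. */
    static int[] readHeader(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, 4).order(ByteOrder.LITTLE_ENDIAN);
        int sid = buf.getShort() & 0xFFFF;     // unsigned 16-bit record id
        int length = buf.getShort() & 0xFFFF;  // unsigned 16-bit payload length
        return new int[] {sid, length};
    }

    /** Bytes a record parser left unread; a non-zero value is the reported condition. */
    static int leftover(int declaredLength, int bytesConsumed) {
        return declaredLength - bytesConsumed;
    }

    public static void main(String[] args) {
        // Made-up record: id 0x0085, declared payload length 8, then 8 filler bytes.
        byte[] record = {(byte) 0x85, 0x00, 0x08, 0x00, 0, 0, 0, 0, 0, 0, 0, 0};
        int[] header = readHeader(record, 0);
        // Pretend a parser consumed only 7 of the 8 payload bytes.
        int remaining = leftover(header[1], 7);
        System.out.println(String.format("record 0x%x length=%d leftover=%d",
                header[0], header[1], remaining));
    }
}
```

A non-zero leftover for record 0x85 would mean the record layout in the file differs from what the parser expected, which is why attaching a sample file that triggers the error usually helps.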

       

      Code we have written:

      iStream = new FileInputStream(new File(fname));
      mData = new Metadata();
      cHandler = new BodyContentHandler(-1); // -1 disables the write limit
      adp = new AutoDetectParser(); //AutoDetectParser()OldExcelParser;
      System.out.println();
      ParseContext parseContext = new ParseContext();
      parseContext.set(Parser.class, adp);
      System.out.println(iStream + "  " + cHandler + " " + mData + "" + parseContext);
      System.out.println("Extracting ......\nPls wait..............\n");
      adp.parse(iStream, cHandler, mData, parseContext);

       


          People

            Assignee: Unassigned
            Reporter: saikuladeep kuladeep
            Votes: 0
            Watchers: 2
