Tika / TIKA-2207

ArrayIndexOutOfBoundsException on a valid Excel file

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15, 2.0.0
    • Component/s: parser
    • Labels: None
    • Environment: Windows 7 x64, JVM 1.8.0_101

    Description

      The attached file opens in Excel, but Tika fails to parse it and throws the following exception:

      java.lang.ArrayIndexOutOfBoundsException: 32
      at org.apache.commons.compress.compressors.lzw.LZWInputStream.initializeTables:126
      at org.apache.commons.compress.compressors.z.ZCompressorInputStream.<init>:54
      at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream:237
      at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat:109
      at org.apache.tika.parser.pkg.ZipContainerDetector.detect:95
      at org.apache.tika.detect.CompositeDetector.detect:77
      at org.apache.tika.parser.AutoDetectParser.parse:112
      at org.apache.tika.parser.DelegatingParser.parse:72
      at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded:102
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedOLE:245
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts:197
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML:115
      at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.getXHTML:105
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse:112
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse:87
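
The trace above fails during type detection of an embedded OLE part, not while reading the spreadsheet cells: ZipContainerDetector passes the embedded stream to CompressorStreamFactory, and the .Z (LZW) stream constructor throws while sizing its tables. One plausible reading, which this report does not confirm, is that the embedded bytes happen to begin with the two-byte .Z magic number (0x1f 0x9d) followed by a header byte that declares a code size too small to hold the 256 literal entries. The sketch below is a hypothetical, standalone pre-check for such a header. The class name, method names, and the accepted 9 to 16 bit range are assumptions made for illustration; this is not the fix that was applied to Tika or Commons Compress.

```java
// Hypothetical helper, not part of Tika or Commons Compress.
// It inspects the first three bytes of a stream prefix and decides
// whether they look like a plausible Unix compress (.Z) header.
final class ZHeaderCheck {
    private static final int MAGIC_1 = 0x1f;            // first magic byte of .Z
    private static final int MAGIC_2 = 0x9d;            // second magic byte of .Z
    private static final int MAX_CODE_SIZE_MASK = 0x1f; // low 5 bits of the third byte
    private static final int MIN_CODE_SIZE = 9;         // assumed lower bound
    private static final int MAX_CODE_SIZE = 16;        // assumed upper bound

    private ZHeaderCheck() {
    }

    // True if the prefix holds at least three bytes and starts with the .Z magic number.
    static boolean hasZMagic(byte[] prefix) {
        return prefix != null
                && prefix.length >= 3
                && (prefix[0] & 0xff) == MAGIC_1
                && (prefix[1] & 0xff) == MAGIC_2;
    }

    // Maximum code size, in bits, declared by the third header byte.
    // Callers should check hasZMagic first.
    static int declaredMaxCodeSize(byte[] prefix) {
        return prefix[2] & MAX_CODE_SIZE_MASK;
    }

    // True only if the magic matches and the declared code size is in the assumed range.
    static boolean looksLikeValidZHeader(byte[] prefix) {
        if (!hasZMagic(prefix)) {
            return false;
        }
        int bits = declaredMaxCodeSize(prefix);
        return bits >= MIN_CODE_SIZE && bits <= MAX_CODE_SIZE;
    }
}
```

Under these assumptions, a prefix of 0x1f 0x9d 0x05 is rejected: it declares a 5-bit code size, and a table of 1 << 5 = 32 entries would be consistent with the index 32 in the exception. A prefix of 0x1f 0x9d 0x90 (16 bits, block mode) is accepted. A detector could run a check like this, or catch the runtime exception, before treating the stream as compressed.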

      Attachments

        1. Merck 9333 MPS 9-22-16.xlsx
          2.97 MB
          Seva Alekseyev

          People

            Assignee: Unassigned
            Reporter: Seva Alekseyev (sevaa)
            Votes: 0
            Watchers: 2
