Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-795

[PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1
    • Fix Version/s: None
    • Component/s: parser
    • Labels:

      Description

      POI-3.8-beta5-daily exposed bug after poi.revision 1198658. (POI bugzilla bug #52262 already opened for root cause).

      Bug was discovered using Daily builds of both TIKA and POI. Root cause of issue lies within POI due to an accidental change of the return type provided by XSLFSlide.getMasterSheet(). However, TIKA is affected by this bug by making use of this call with an unused variable.

      I've included a patch file which removes the instance of the unused variable. An example multi-embedded word document example used with a Tika based RecursiveMetadataParser is also included.

      java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
      at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
      at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
      at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
      at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
      at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)

        Attachments

        1. Patch_795_XSLF.patch
          0.8 kB
          Jeremy Anderson
        2. testWORD_embeded.docx
          154 kB
          Jeremy Anderson

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                rpialum Jeremy Anderson
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: