Tika / TIKA-795

[PATCH] NoSuchMethod - XSLFPowerPointExtractorDecorator.buildXHTML POI - XSLFSlide.getMasterSheet()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1
    • Fix Version/s: None
    • Component/s: parser
    • Labels:

      Description

      The POI 3.8-beta5 daily build exposed this bug after poi.revision 1198658. (POI Bugzilla bug #52262 is already open for the root cause.)

      The bug was discovered using daily builds of both Tika and POI. The root cause lies within POI: an accidental change to the return type of XSLFSlide.getMasterSheet(). Tika is affected because it calls this method to assign a variable that is never used.

      I've included a patch file that removes the unused variable. A sample Word document containing multiple embedded documents, used with a Tika-based RecursiveMetadataParser to expose the issue, is also included.

      java.lang.NoSuchMethodError: org.apache.poi.xslf.usermodel.XSLFSlide.getMasterSheet()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;
      at org.apache.tika.parser.microsoft.ooxml.XSLFPowerPointExtractorDecorator.buildXHTML(XSLFPowerPointExtractorDecorator.java:81)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:110)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
      at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
      at org.apache.tika.parser.DelegatingParser.parse(DelegatingParser.java:72)
      at org.apache.tika.extractor.ParsingEmbeddedDocumentExtractor.parseEmbedded(ParsingEmbeddedDocumentExtractor.java:109)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedFile(AbstractOOXMLExtractor.java:228)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.handleEmbeddedParts(AbstractOOXMLExtractor.java:148)
      at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:113)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:97)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
      at com.eastportanalytics.services.textextract.TikaTextExtractionService$RecursiveMetadataParser.parse(TikaTextExtractionService.java:364)
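
Editorial note: the trailing `()Lorg/apache/poi/xslf/usermodel/XSLFSlideMaster;` in the error above is a JVM method descriptor, and a descriptor includes the return type. The JVM links a call site by method name plus descriptor, so code compiled against the old return type fails to link once the return type changes, even when the returned value is never used. The following is a minimal sketch of that mismatch. `SlideMaster` and `SlideLayout` are hypothetical stand-ins, not the real POI classes, and `noArgDescriptor` is a helper introduced only for illustration.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Hypothetical stand-ins for the old and new return types; not POI classes.
    static class SlideMaster {}
    static class SlideLayout {}

    /** Returns the JVM descriptor of a no-argument method with the given return type. */
    static String noArgDescriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Descriptor recorded in the caller's class file at compile time.
        String compiledAgainst = noArgDescriptor(SlideMaster.class);
        // Descriptor of the method that actually exists in the newer library.
        String presentAtRuntime = noArgDescriptor(SlideLayout.class);

        System.out.println("caller expects:  getMasterSheet" + compiledAgainst);
        System.out.println("library offers:  getMasterSheet" + presentAtRuntime);
        // Same name, different descriptor: the lookup finds no match,
        // which the JVM reports as NoSuchMethodError.
        System.out.println("descriptors match: " + compiledAgainst.equals(presentAtRuntime));
    }
}
```

Recompiling the caller against the newer library updates the recorded descriptor, which is why source that still compiles can nevertheless fail at run time when only the library JAR is swapped.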

      1. Patch_795_XSLF.patch
        0.8 kB
        Jeremy Anderson
      2. testWORD_embeded.docx
        154 kB
        Jeremy Anderson

        Issue Links

          Activity

          Transition: Open -> Resolved
          Time In Source Status: 5d 7h 22m
          Execution Times: 1
          Last Executer: Nick Burch
          Last Execution Date: 05/Dec/11 00:39
          Nick Burch made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          Nick Burch added a comment -

          Resolving as a duplicate of TIKA-700, as the change was deliberate and the patch on TIKA-700 covers this

          Jeremy Anderson made changes -
          Link This issue blocks TIKA-700 [ TIKA-700 ]
          Jeremy Anderson added a comment -

          Thanks, Nick. Yegor followed up on this issue on POI's side and confirmed that the changes there were made on purpose and that using the other method, as in the TIKA-700 patch, is correct.

          Can you close this issue, noting that TIKA-700 resolves it?

          Thanks again.

          Jeremy Anderson made changes -
          Comment [ Thanks Nick.

          Do you know whether Yegor's modification in POI, changing the return type of getMasterSheet() from XSLFSlideMaster to XSLFSlideLayout, was done on purpose, or is it truly a bug? (I did miss that getSlideMaster() also exposes the XSLFSlideMaster, thanks.)

          If you think that this change in POI was correct, then I'll go ahead and close my POI Bugzilla 52262 bug.

          Noting the return type for the classes overriding the XSLFSheet.getMasterSheet() method:

          CLASS            RETURN TYPE
          XSLFNotes        XSLFSheet
          XSLFNotesMaster  XSLFSheet
          XSLFSlide        XSLFSlideLayout
          XSLFSlideLayout  XSLFSlideMaster
          XSLFSlideMaster  XSLFSheet

          You can also probably go ahead and close this issue, noting that it is fixed by the TIKA-700 patch. ]
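
Editorial note: the table above describes covariant return types, where each subclass narrows the return type of the inherited getMasterSheet(). A minimal sketch of that hierarchy follows, assuming simplified stand-in classes (`Sheet`, `Slide`, `SlideLayout`, `SlideMaster`) that only mirror the slide rows of the table. They are hypothetical and are not POI's real implementation, and the body of `getSlideMaster()` here is an assumption made for illustration.

```java
public class CovariantReturnDemo {
    // Hypothetical base type standing in for XSLFSheet.
    static abstract class Sheet {
        abstract Sheet getMasterSheet();
    }

    // Stand-in for XSLFSlideMaster: keeps the base return type.
    static class SlideMaster extends Sheet {
        @Override
        Sheet getMasterSheet() { return null; } // top of the chain in this sketch
    }

    // Stand-in for XSLFSlideLayout: narrows the return type to SlideMaster.
    static class SlideLayout extends Sheet {
        private final SlideMaster master;
        SlideLayout(SlideMaster master) { this.master = master; }

        @Override
        SlideMaster getMasterSheet() { return master; }
    }

    // Stand-in for XSLFSlide: its "master sheet" is the layout, not the master.
    static class Slide extends Sheet {
        private final SlideLayout layout;
        Slide(SlideLayout layout) { this.layout = layout; }

        @Override
        SlideLayout getMasterSheet() { return layout; }

        // Separate accessor that walks one level further up to the master.
        SlideMaster getSlideMaster() { return layout.getMasterSheet(); }
    }

    public static void main(String[] args) {
        SlideMaster master = new SlideMaster();
        SlideLayout layout = new SlideLayout(master);
        Slide slide = new Slide(layout);

        // Two hops through getMasterSheet() reach the same object as getSlideMaster().
        System.out.println(slide.getMasterSheet().getMasterSheet() == slide.getSlideMaster());
    }
}
```

A caller that wants the slide master therefore has two routes in this sketch: chain getMasterSheet() twice, or call the dedicated accessor.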
          Jeremy Anderson made changes -
          Description: poi.revision in the first sentence changed from 1190347 to 1198658; the rest of the description and the stack trace are unchanged.

          Nick Burch added a comment -

          We are going to want this variable though, as it's needed for TIKA-712 (once we're able to re-enable that)

          I'd suggest you use the patch I uploaded to TIKA-700

          Jeremy Anderson made changes -
          Field Original Value New Value
          Attachment Patch_795_XSLF.patch [ 12505501 ]
          Attachment testWORD_embeded.docx [ 12505502 ]
          Jeremy Anderson added a comment -

          Patch to remove unused variable. Example multi-embedded word document used by RecursiveMetadataParser to expose issue.

          Jeremy Anderson created issue -

            People

            • Assignee: Unassigned
            • Reporter: Jeremy Anderson
            • Votes: 0
            • Watchers: 0

              Dates

              • Created:
                Updated:
                Resolved:
