Tika / TIKA-792

NoSuchMethodException "CTMarkupImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)" processing a OOXML document

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.2
    • Component/s: parser
    • Labels:
      None
    • Environment:

      Linux, JDK 1.6, Jetty 8.x, Tomcat 6.x

      Description

      Parsing some OOXML documents, this stacktrace is logged many times:

      java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTMarkupImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)
      at java.lang.Class.getConstructor0(Class.java:2723)
      at java.lang.Class.getDeclaredConstructor(Class.java:2002)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1749)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1886)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1875)
      at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1021)
      at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:893)
      at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1657)
      at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2654)
      at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2647)
      at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
      at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
      at org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(XWPFParagraph.java:83)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:145)
      at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
      at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:115)
      at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
      at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:63)
      at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:69)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)

      Looking at the POI code, Java is right here: there is no constructor taking a SchemaType and a boolean, only one taking a SchemaType.
      My guess is that this one was missed during the upgrade to POI beta4, but that is only a guess. Either way, it needs a fix.
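
The failing lookup in the trace is a reflective call to Class.getDeclaredConstructor. As a minimal sketch of that mechanism, the following uses hypothetical stand-in classes (FakeSchemaType and FakeMarkupImpl are illustrative names, not the real XMLBeans or POI types) to show how asking for a (type, boolean) constructor fails when only a one-argument constructor is declared:

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a schema type; not the real org.apache.xmlbeans.SchemaType.
class FakeSchemaType {}

// Hypothetical stand-in for a generated impl class that declares
// only a one-argument constructor, as described in the report.
class FakeMarkupImpl {
    FakeMarkupImpl(FakeSchemaType type) {}
}

class ConstructorLookupDemo {
    // Returns true if cls declares a constructor with exactly these parameter types.
    static boolean hasConstructor(Class<?> cls, Class<?>... paramTypes) {
        try {
            Constructor<?> c = cls.getDeclaredConstructor(paramTypes);
            return c != null;
        } catch (NoSuchMethodException e) {
            // This is the exception type that appears in the logged stack trace.
            return false;
        }
    }

    public static void main(String[] args) {
        // One-argument constructor: declared, so the lookup succeeds.
        System.out.println(hasConstructor(FakeMarkupImpl.class, FakeSchemaType.class));
        // Two-argument (type, boolean) constructor: not declared, so the lookup fails.
        System.out.println(hasConstructor(FakeMarkupImpl.class, FakeSchemaType.class, boolean.class));
    }
}
```

Note that the parameter list must match exactly: boolean.class and Boolean.class are different types for this lookup.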

      1. test10.docx (17 kB, attached by Tim Allison)

        Activity

        Torsten Krah created issue -
        Nick Burch added a comment -

        Your quick fix is to replace the poi-ooxml-schemas jar with the full ooxml-schemas-1.1 jar - the former is an excerpt of just the "common" parts

        For a full fix, we need to add a unit test to POI that uses the same method. To decide what's common, the POI build script looks at what is used in the unit tests, so adding a test that uses a method will cause the appropriate parts to be included in the next poi-ooxml-schemas file.
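
Since the two schemas jars differ in which classes they contain, it can help to check whether a given class is visible on the classpath and which jar supplied it. The following is a rough sketch using only standard JDK calls; SchemaClassProbe and locate are hypothetical names, and the class name in main is taken from the stack trace above:

```java
import java.security.CodeSource;

class SchemaClassProbe {
    // Returns the location a class was loaded from, or null if the class
    // cannot be found or linked on the current classpath.
    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers.
            Class<?> cls = Class.forName(className, false, SchemaClassProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes typically report no code source.
            return src == null ? "(bootstrap or unknown location)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the reported stack trace.
        String name = "org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTMarkupImpl";
        String where = locate(name);
        System.out.println(name + " -> " + (where == null ? "not on classpath" : where));
    }
}
```

Run with the same classpath as the application, the printed location shows whether the class came from the cut-down jar or the full one.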

        Nick Burch added a comment -

        Are you able to share one of the files that triggers this? The easiest way to add the unit test to POI that would have the classes included is with a file that triggers the problem

        Torsten Krah added a comment -

        I'll try to find some document(s), may take a few days.

        Marek Slama added a comment -

        We have the same problem, but in our case it occurs in a Jackrabbit background indexing task, so I cannot easily say which file causes it. We put our files into a Jackrabbit repository. I will try replacing the schemas files as suggested, but I will also have to do it in the Jackrabbit files.

        Marek Slama added a comment -

        I do not see this problem now as we upgraded to Jackrabbit 2.4.0 which uses POI 3.8-beta4.

        Nick Burch added a comment -

        Thanks for the feedback Marek. As of r1309005 we're now on POI 3.8 Final, so I'll mark this as fixed

        Nick Burch made changes -
        Field            Original Value    New Value
        Status           Open [ 1 ]        Resolved [ 5 ]
        Fix Version/s                      1.2 [ 12320169 ]
        Resolution                         Fixed [ 1 ]
        Eric Pascal added a comment -

        The problem is still there for me with POI 3.8 Final.

        Zhuravskiy Vitaliy added a comment -

        Bug still present with poi-ooxml-schemas-3.9.jar and tika-parsers-1.3.jar.
        Stack trace:
        java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTMarkupRangeImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)
        at java.lang.Class.getConstructor0(Class.java:2730)
        at java.lang.Class.getDeclaredConstructor(Class.java:2004)
        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1749)
        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1886)
        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1875)
        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1021)
        at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:893)
        at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1657)
        at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2654)
        at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2647)
        at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
        at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
        at org.apache.poi.xwpf.usermodel.XWPFParagraph.buildRunsInOrderFromXml(XWPFParagraph.java:124)
        at org.apache.poi.xwpf.usermodel.XWPFParagraph.<init>(XWPFParagraph.java:79)
        at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:146)
        at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
        at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:116)
        at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:53)
        at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:180)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:87)
        at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
        at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:91)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)

        Cyrille Levandowski added a comment -

        Got the same issue with Tika 1.3:

        java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTMarkupRangeImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)
        at java.lang.Class.getConstructor0(Class.java:2730)
        at java.lang.Class.getDeclaredConstructor(Class.java:2004)
        at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1749)

        etc...

        Nick Burch added a comment -

        Do you have a small file that shows this problem? All of the test files in our current unit test suite pass without this issue...

        Tim Allison added a comment -

        Example document that triggers NoSuchMethodExceptions for
        CTMarkupRangeImpl, CTMarkupImpl and CTBookmarkRangeImpl.

        Tim Allison made changes -
        Attachment test10.docx [ 12596081 ]
        Nick Burch added a comment -

        Tim - I'd suggest you add this test document to POI, then write a unit test in POI that triggers the same code that Tika exercises. The unit test should pass in POI, but it will then trigger the inclusion of these extra classes in the cut-down poi-ooxml-schemas jar, which should then fix the issue in Tika.

        Tim Allison added a comment -

        Opened https://issues.apache.org/bugzilla/show_bug.cgi?id=55361 and uploaded patch for feedback. If there are no objections, I'll commit 55361 tomorrow.

        Tim Allison added a comment -

        Committed in POI. Once POI3.9beta2 is released, I'll increment POI's version in Tika's build file and confirm that this is taken care of. There may be other sources of this than the one that my test document triggered.

        Tim Allison added a comment -

        This is now fixed by TIKA-1173.

        Can anyone recommend a more direct test of the solution than kicking off a process to extract text from the document and capturing stderr? It would be nice to have something that we can generalize to other documents that trigger this issue because of a different set of missing beans.
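
For reference, capturing stderr does not necessarily require a separate process. The following is a minimal in-process sketch, not the test that was committed: StderrCapture and its methods are hypothetical names, and the Runnable stands in for whatever parse call is being checked. It assumes the stack traces are printed through System.err, as Throwable.printStackTrace() does by default.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class StderrCapture {
    // Runs the action with System.err redirected to a buffer and returns what was written.
    static String capture(Runnable action) {
        PrintStream original = System.err;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream replacement = new PrintStream(buffer, true);
        try {
            System.setErr(replacement);
            action.run();
        } finally {
            // Always restore the real stream, even if the action throws.
            replacement.flush();
            System.setErr(original);
        }
        return buffer.toString();
    }

    // True if the action logged a NoSuchMethodException to stderr.
    static boolean logsMissingConstructor(Runnable action) {
        return capture(action).contains("NoSuchMethodException");
    }

    public static void main(String[] args) {
        // Stand-in for a parse that logs the exception without rethrowing it.
        Runnable noisy = () -> new NoSuchMethodException("Example.<init>(boolean)").printStackTrace();
        Runnable quiet = () -> { };
        System.out.println(logsMissingConstructor(noisy));
        System.out.println(logsMissingConstructor(quiet));
    }
}
```

Because the check only looks for the exception name, it would apply unchanged to other documents that hit a different set of missing beans. Redirecting System.err is process-wide, so a test like this should not run concurrently with other tests that write to stderr.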

        Tim Allison added a comment -

        Added a test that captures stderr (r1526570).
        Reopening just to record this.

        Tim Allison made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Tim Allison made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Tim Allison made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Unassigned
          • Reporter: Torsten Krah
          • Votes: 1
          • Watchers: 8
