Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: cli, server
    • Labels:
      None

      Description

      Hey Sergey, do you have any idea why CXF 3.0.3's rt-client would work fine in tika-server but fail in tika-app? I'm seeing that with the GROBID parser. See:

      https://issues.apache.org/jira/browse/CXF-6545

      Try calling the GROBID parser from Tika app:

      java -classpath $HOME/git/grobidparser-resources/:target/tika-app-1.11-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=$HOME/git/grobidparser-resources/tika-config.xml -J $HOME/git/grobid/papers/ICSE06.pdf

      After following this guide:

      https://wiki.apache.org/tika/GrobidJournalParser

      Works fine in Tika-Server - dies in Tika-app with:

      java.lang.NullPointerException
      	at org.apache.cxf.jaxrs.client.AbstractClient.setupOutInterceptorChain(AbstractClient.java:849)
      	at org.apache.cxf.jaxrs.client.AbstractClient.createMessage(AbstractClient.java:923)
      	at org.apache.cxf.jaxrs.client.WebClient.finalizeMessage(WebClient.java:1125)
      	at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1098)
      	at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:894)
      	at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:865)
      	at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:331)
      	at org.apache.cxf.jaxrs.client.WebClient.post(WebClient.java:340)
      	at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:82)
      	at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:67)
      	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:177)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      	at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
      	at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:504)
      	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:484)
      	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:139)
      java.lang.NullPointerException
      	at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:89)
      	at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:67)
      	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:177)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
      	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
      	at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
      	at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:504)
      	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:484)
      	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:139)
      

          Activity

          sergey_beryozkin Sergey Beryozkin added a comment -

          Hi Chris, all, sorry, I missed this issue completely; I might have thought it was related to the actual parsing only.

          Can you please do me a favour and attach a sample GROBID resource, plus whatever is needed in a grobidparser-resources folder, so that I can start the server and reproduce the issue?

          Thanks

          sergey_beryozkin Sergey Beryozkin added a comment -

          Never mind, steps are described

          sergey_beryozkin Sergey Beryozkin added a comment -

          Hi Chris

          The problem is that META-INF/cxf/bus-extensions.txt in tika-app.jar is incomplete: it only contains the entries from cxf-rt-transports-http and none of the entries from cxf-core's bus-extensions.txt. After updating the file in tika-app manually, I got the GROBID example working.

          The solution is to have the content of all META-INF/cxf/bus-extensions.txt files available in tika-app, in a single file.
          Not sure how this can be achieved, though.
          Sergey
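
          To make the diagnosis above concrete, here is a minimal Java sketch that lists every META-INF/cxf/bus-extensions.txt visible on the classpath and merges their entries into one list. The class name BusExtensionsCheck and its helper methods are hypothetical and are not part of Tika or CXF. Unlike a plain append, this sketch also drops exact duplicate lines, which is a simplification chosen for readability.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Tika or CXF.
public class BusExtensionsCheck {
    static final String RESOURCE = "META-INF/cxf/bus-extensions.txt";

    // Appends the non-blank lines of every file in order, skipping exact duplicates.
    static List<String> merge(List<List<String>> files) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> file : files) {
            for (String line : file) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        return new ArrayList<>(merged);
    }

    // Reads one copy of the extensions file into a list of lines.
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader cl = BusExtensionsCheck.class.getClassLoader();
        List<List<String>> files = new ArrayList<>();
        // With separate CXF jars this finds one copy per jar; in a flattened jar only one copy survives.
        for (URL url : Collections.list(cl.getResources(RESOURCE))) {
            System.out.println("found " + url);
            files.add(readLines(url));
        }
        List<String> merged = merge(files);
        boolean hasPhaseManager = merged.stream()
                .anyMatch(l -> l.contains("org.apache.cxf.phase.PhaseManager"));
        System.out.println(merged.size() + " entries, PhaseManager registered: " + hasPhaseManager);
    }
}
```

          Running it with tika-app.jar on the classpath should show whether the PhaseManager entry from cxf-core made it into the single surviving file.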

          sergey_beryozkin Sergey Beryozkin added a comment - edited

          Add the following to bus-extensions.txt in your local tika-app.jar to verify that it fixes the problem:

          org.apache.cxf.bus.managers.PhaseManagerImpl:org.apache.cxf.phase.PhaseManager:true
          org.apache.cxf.bus.managers.WorkQueueManagerImpl:org.apache.cxf.workqueue.WorkQueueManager:true
          org.apache.cxf.bus.managers.CXFBusLifeCycleManager:org.apache.cxf.buslifecycle.BusLifeCycleManager:true
          org.apache.cxf.bus.managers.ServerRegistryImpl:org.apache.cxf.endpoint.ServerRegistry:true
          org.apache.cxf.bus.managers.EndpointResolverRegistryImpl:org.apache.cxf.endpoint.EndpointResolverRegistry:true
          org.apache.cxf.bus.managers.HeaderManagerImpl:org.apache.cxf.headers.HeaderManager:true
          org.apache.cxf.service.factory.FactoryBeanListenerManager::true
          org.apache.cxf.bus.managers.ServerLifeCycleManagerImpl:org.apache.cxf.endpoint.ServerLifeCycleManager:true
          org.apache.cxf.bus.managers.ClientLifeCycleManagerImpl:org.apache.cxf.endpoint.ClientLifeCycleManager:true
          org.apache.cxf.bus.resource.ResourceManagerImpl:org.apache.cxf.resource.ResourceManager:true
          org.apache.cxf.catalog.OASISCatalogManager:org.apache.cxf.catalog.OASISCatalogManager:true

          The 1st line is probably enough...

          sergey_beryozkin Sergey Beryozkin added a comment -

          Hi Chris,
          Dan Kulp reminded me that back in CXF 2.7.x the maven-shade-plugin was used:

          https://fisheye6.atlassian.com/browse/cxf/osgi/bundle/all/pom.xml?r=12acd46e3dbe98fa1321374b09174d5876271f08#to448

          and it can help with collapsing files into a single one.
          I guess the tika-app assembly needs to be updated accordingly; I've never worked with the shade plugin myself.

          thanks

          chrismattmann Chris A. Mattmann added a comment -

          Sergey Beryozkin and Daniel Kulp thanks MUCHO! That tipped me off. It was the way the shade plugin assembled tika-app as you suggested. I looked at:

          https://fisheye6.atlassian.com/browse/cxf/osgi/bundle/all/pom.xml?r=12acd46e3dbe98fa1321374b09174d5876271f08#to448

          And

          https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#AppendingTransformer

          and now I have a fix! I will commit shortly. Thank you two!
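
          For readers following along, an AppendingTransformer entry for this file would look roughly like the fragment below. It is a sketch based on the shade plugin documentation linked above, not necessarily the exact change committed for this issue; the surrounding plugin and execution settings are assumptions.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenate every copy of the CXF bus extension list instead of keeping only one -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/cxf/bus-extensions.txt</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

          With a transformer like this, the shaded jar should end up with one bus-extensions.txt containing the entries from both cxf-core and cxf-rt-transports-http.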

          chrismattmann Chris A. Mattmann added a comment -
          Fixed in r1696311. Thanks guys!
          sergey_beryozkin Sergey Beryozkin added a comment -

          Great, thanks

          hudson Hudson added a comment -

          FAILURE: Integrated in tika-trunk-jdk1.7 #832 (See https://builds.apache.org/job/tika-trunk-jdk1.7/832/)

          Update changes for TIKA-1712 (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696312)
          • /tika/trunk/CHANGES.txt

          fix for TIKA-1712: GROBID parser fails in tika-app thanks to Sergey Beryozkin and Daniel Kulp for the idea for the fix. (mattmann: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1696311)
          • /tika/trunk/tika-app/pom.xml

            People

            • Assignee:
              chrismattmann Chris A. Mattmann
              Reporter:
              chrismattmann Chris A. Mattmann
            • Votes:
              0
              Watchers:
              3
