TIKA-2017

Tika Server cannot handle large files; add option for metadata only

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem

    Description

      Tika-Python uses the Tika REST Server to parse both content and metadata. In this case the input was a 600 MB CSV file, and the Tika REST Server ran out of heap space because it also tried to parse the content. There should be an option to make a REST API call to Tika Server that parses and returns only the metadata.

      Jun 22, 2016 6:38:40 PM org.slf4j.impl.JCLLoggerAdapter warn
      WARNING: /rmeta/text
      java.lang.RuntimeException: org.apache.cxf.interceptor.Fault: Java heap space
              at org.apache.cxf.interceptor.AbstractFaultChainInitiatorObserver.onMessage(AbstractFaultChainInitiatorObserver.java:116)
              at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:371)
              at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
              at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
              at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
              at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
              at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
              at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
              at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
              at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
              at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
              at org.eclipse.jetty.server.Server.handle(Server.java:370)
              at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
              at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
              at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
              at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
              at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
              at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
              at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
              at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
              at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
              at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.cxf.interceptor.Fault: Java heap space
              at org.apache.cxf.service.invoker.AbstractInvoker.createFault(AbstractInvoker.java:163)
              at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:129)
              at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:200)
              at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
              at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
              at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
              at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
              ... 21 more
      Caused by: java.lang.OutOfMemoryError: Java heap space
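
The metadata-only call requested above could look roughly like the following. This is a minimal sketch, not a confirmed fix: it assumes a Tika Server listening on the default address `http://localhost:9998` and uses the server's `/meta` endpoint with an `Accept: application/json` header. The helper names `build_meta_request` and `fetch_metadata` are hypothetical, and this report does not establish whether `/meta` avoids the heap exhaustion for a 600 MB file.

```python
import json
import urllib.request

# Assumption: Tika Server runs locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_meta_request(body, server=TIKA_SERVER):
    """Build a PUT request for the /meta endpoint, asking for JSON output.

    `body` may be bytes or an open binary file object.
    """
    return urllib.request.Request(
        url=server.rstrip("/") + "/meta",
        data=body,
        method="PUT",
        headers={
            "Accept": "application/json",
            # Generic type, so the server is not handed a misleading hint.
            "Content-Type": "application/octet-stream",
        },
    )


def fetch_metadata(path, server=TIKA_SERVER):
    """Send the file at `path` to the server and return its metadata as a dict."""
    with open(path, "rb") as fh:
        # Passing the file object lets urllib stream it instead of
        # loading the whole file into client memory first.
        request = build_meta_request(fh, server)
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
```

With a server running, `fetch_metadata("large.csv")` (the file name is a placeholder) would return the metadata fields as a dictionary. Building the request separately from sending it keeps the endpoint and headers easy to inspect without a live server.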
      

          People

            Assignee: Unassigned
            Reporter: Harshavardhan Manjunatha (hmanjuna)
            Votes: 0
            Watchers: 4