Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Not A Bug
- Affects Version/s: 1.20, 1.24.1
- None
- None
Tika is running on Windows 10 for my test machine, and Windows 2016 for the production machine. Reproducible on both. The Linux command line I used is just SLES on WSL, so it has no bearing here.
(I'm having a problem attaching the file: Jira is giving me a 'missing token' error, so I'll try again after the issue has been created.)
Description
I've tried to parse the attached file (please first extract choke.txt from choke.zip to reproduce) using both 1.20 and 1.24.1. The file appears valid when I view it in my text editor and seems to simply be a tab-delimited table with a mix of Hebrew and Latin characters. In 1.20 I see an exception thrown, and in 1.24.1 I get JSON metadata back with no content.
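Since a text editor may silently guess an encoding, it can help to look at the raw bytes as well. The following is only a minimal sketch of such a check, not a diagnosis of the attached file: the function name `inspect_bytes` and the 1024-byte sample size are made up for illustration, and the sample bytes in the demo are invented rather than taken from choke.txt.

```python
import codecs

# Longer BOMs are listed first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def inspect_bytes(data, sample_size=1024):
    """Report a few byte-level facts about a file's contents (hypothetical helper)."""
    sample = data[:sample_size]

    # Name of the first matching byte order mark, or None if there is none.
    bom = next((name for mark, name in BOMS if sample.startswith(mark)), None)

    # Count control bytes in the sample, ignoring tab, line feed and carriage return.
    control_bytes = sum(
        1 for b in sample if b < 0x20 and b not in (0x09, 0x0A, 0x0D)
    )

    # Check whether the whole input decodes as UTF-8.
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {"bom": bom, "control_bytes": control_bytes, "valid_utf8": valid_utf8}


# Demo on invented data: one tab-delimited row mixing Hebrew and Latin text.
print(inspect_bytes("שלום\tabc\n".encode("utf-8")))
```

To run it against a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in.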
My command line:
curl -X PUT --upload-file /tmp/choke.txt http://localhost:9998/rmeta/text
1.24.1 Result:
[{"Content-Type":"application/octet-stream","X-Parsed-By":"org.apache.tika.parser.EmptyParser","X-TIKA:embedded_depth":"0","X-TIKA:parse_time_millis":"10"}]
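The 1.24.1 response can be read mechanically: the file was detected as application/octet-stream, it was handled by EmptyParser, and no text came back. Below is a rough sketch of that check, assuming that extracted text, when present, arrives under the `X-TIKA:content` key. The helper name `summarize_rmeta` is made up for illustration, and the response string is the one quoted above.

```python
import json


def summarize_rmeta(response_text):
    """Summarize each metadata object in an rmeta JSON response (hypothetical helper)."""
    summary = []
    for record in json.loads(response_text):
        parsed_by = record.get("X-Parsed-By")
        # The value may be a single string or a list of parser names.
        if isinstance(parsed_by, str):
            parsed_by = [parsed_by]

        content = record.get("X-TIKA:content")
        summary.append({
            "content_type": record.get("Content-Type"),
            "parsed_by": parsed_by or [],
            # True only if some non-whitespace text was returned.
            "has_content": bool(content and content.strip()),
        })
    return summary


# The 1.24.1 response quoted in this report.
RESPONSE = (
    '[{"Content-Type":"application/octet-stream",'
    '"X-Parsed-By":"org.apache.tika.parser.EmptyParser",'
    '"X-TIKA:embedded_depth":"0","X-TIKA:parse_time_millis":"10"}]'
)

for item in summarize_rmeta(RESPONSE):
    print(item)
```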
1.20 Result:
INFO Starting Apache Tika 1.20 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @1704ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.z-SNAPSHOT; built: 2018-08-30T13:59:14.071Z; git: 27208684755d94a92186989f695db2d7b21ebc51; jvm 8.0.6.10 - pwa6480sr6fp10-20200408_01(SR6 FP10)
INFO Started ServerConnector@7b09f799
INFO Started @2085ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@-405fdc63
INFO Started Apache Tika server at http://localhost:9998/
INFO rmeta/text (autodetecting type)
WARN rmeta/text: Text extraction failed (null)
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.server.resource.TikaResource$1@74f007b
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
    at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:224)
    at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:401)
    at org.apache.tika.server.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:144)
    at org.apache.tika.server.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:121)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:508)
    at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)
    at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
    at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:193)
    at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:103)
    at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
    at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:503)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.lang.Thread.run(Thread.java:820)
Caused by: javax.ws.rs.WebApplicationException: HTTP 415 Unsupported Media Type
    at org.apache.tika.server.resource.TikaResource$1.parse(TikaResource.java:127)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    ... 37 more