Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
There appears to be a Jetty HttpClient bug that makes it possible for a request thread to hang indefinitely while waiting to parse the response from remote Jetty servers. The thread hangs because it calls .wait() on a monitor lock that should be notified by another (internal Jetty client) thread when a chunk of data is available from the wire, but in some cases this evidently may not happen.
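To illustrate the hazard described above, here is a minimal sketch of a reader that waits on a monitor for a chunk from a producer thread, but with a deadline instead of an unbounded wait(). This is not Jetty's code: the class BoundedWaitBuffer and its methods offer() and take() are hypothetical names invented for this illustration, and the timeout policy is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a listener that hands chunks from an I/O thread to a reader thread. */
final class BoundedWaitBuffer {
    private final Object lock = new Object();
    private final Queue<byte[]> chunks = new ArrayDeque<>();

    /** Called by the producer (I/O) thread when a chunk arrives from the wire. */
    void offer(byte[] chunk) {
        synchronized (lock) {
            chunks.add(chunk);
            lock.notifyAll();
        }
    }

    /** Waits at most timeoutMillis for a chunk, so a missed notification cannot block forever. */
    byte[] take(long timeoutMillis) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            // Loop on the condition so spurious wakeups are handled correctly.
            while (chunks.isEmpty()) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new TimeoutException("no chunk within " + timeoutMillis + " ms");
                }
                // Add 1 ms so we never call wait(0), which would block indefinitely.
                lock.wait(remainingNanos / 1_000_000L + 1);
            }
            return chunks.remove();
        }
    }
}
```

With an unbounded wait(), a producer that never calls notifyAll() leaves the reader parked forever; with the deadline, the same situation surfaces as a TimeoutException that the caller can handle.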
In the case of distrib=true requests processed by Solr (aggregating multiple per-shard responses from other nodes) this can manifest with stack traces that look like the following (taken from Solr 8.8.2)...
"thread",{ "id":14253, "name":"httpShardExecutor-7-thread-13819-...", "state":"WAITING", "lock":"org.eclipse.jetty.client.util.InputStreamResponseListener@12b59075", "lock-waiting":{ "name":"org.eclipse.jetty.client.util.InputStreamResponseListener@12b59075", "owner":null}, "synchronizers-locked":["java.util.concurrent.ThreadPoolExecutor$Worker@1ec1aed0"], "cpuTime":"65.4882ms", "userTime":"60.0000ms", "stackTrace":["java.base@11.0.14/java.lang.Object.wait(Native Method)", "java.base@11.0.14/java.lang.Object.wait(Unknown Source)", "org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:318)", "org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:90)", "org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:99)", "org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:217)", "org.apache.solr.common.util.JavaBinCodec._init(JavaBinCodec.java:211)", "org.apache.solr.common.util.JavaBinCodec.initRead(JavaBinCodec.java:202)", "org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:195)", "org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:51)", "org.apache.solr.client.solrj.impl.Http2SolrClient.processErrorsAndResponse(Http2SolrClient.java:696)", "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:412)", "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:761)", "org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)", "org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)", "org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:371)", "org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:132)", "org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:41)", 
"java.base@11.0.14/java.util.concurrent.FutureTask.run(Unknown Source)", "java.base@11.0.14/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)", "java.base@11.0.14/java.util.concurrent.FutureTask.run(Unknown Source)", "com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)", "org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)", "org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$175/0x0000000840243c40.run(Unknown Source)", "java.base@11.0.14/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)", "java.base@11.0.14/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)", "java.base@11.0.14/java.lang.Thread.run(Unknown Source)"]},
...these httpShardExecutor threads can stay hung indefinitely, tying up system resources (unless they get a spurious notify() from the JVM). In one case, this seems to have caused a request to hang for 10.3 hours.
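Threads in this state can be spotted from inside the JVM as well as from a thread dump. The following is a rough sketch, using only the standard java.lang.management API, that lists threads in the WAITING state with a stack frame in a given class. The class StuckReaderFinder and its method findWaitingIn() are hypothetical names for this illustration, not part of Solr or Jetty.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds WAITING threads whose stack passes through a given class. */
final class StuckReaderFinder {
    /** Returns the names of WAITING threads with a frame in a class whose name starts with classPrefix. */
    static List<String> findWaitingIn(String classPrefix) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        // Monitor and synchronizer details are not needed here, so both flags are false.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.WAITING) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    names.add(info.getThreadName());
                    break;
                }
            }
        }
        return names;
    }
}
```

For the trace shown above, the prefix would be "org.eclipse.jetty.client.util.InputStreamResponseListener". A single snapshot cannot distinguish a hung thread from one that is briefly waiting for data, so in practice one would compare snapshots taken some time apart and look for thread names that persist.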
Anecdotally:
- There is some evidence that this problem did NOT affect Solr 8.6.3, but does affect later versions
  - suggesting the bug didn't exist in Jetty until after 9.4.27.v20200227
- Forcing the Jetty HttpClient to use the HTTP/1.1 transport seems to prevent this problem from happening
  - In Solr this can be done by setting the "solr.http1" system property
  - Or by using the Http2SolrClient.Builder.useHttp1_1() method in client application code
Issue Links
- incorporates
  - SOLR-15840 Performance degradation with Http2 client (Open)
- is related to
  - SOLR-15955 Update Jetty dependency to 10 (Closed)
- relates to
  - SOLR-16129 Solr specific InputStreamResponseListener to prevent client threads from hanging forever (Open)
- links to