Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-16099

HTTP Client threads can hang in Jetty's InputStreamResponseListener when using HTTP2 - impacts intra-node communication

    XMLWordPrintableJSON

Details

    Description

      There apearrs to be a Jetty HttpClient bug that makes it possible for a request thread to hang indefinitely while waiting to parse the response from remote jetty servers. The cause of the hung thread is because it calls .wait() on monitor lock that should be notified by another (internal jetty client) thread when a chunk of data is available from the wire – but in some cases this evidently may not happen.

      In the case of distrib=true requests processed by Solr (aggregating multiple per-shard responses from other nodes) this can manifest with stack traces that look like the following (taken from Solr 8.8.2)...

       "thread",{
         "id":14253,
         "name":"httpShardExecutor-7-thread-13819-...",
         "state":"WAITING",
         "lock":"org.eclipse.jetty.client.util.InputStreamResponseListener@12b59075",
         "lock-waiting":{
           "name":"org.eclipse.jetty.client.util.InputStreamResponseListener@12b59075",
           "owner":null},
         "synchronizers-locked":["java.util.concurrent.ThreadPoolExecutor$Worker@1ec1aed0"],
         "cpuTime":"65.4882ms",
         "userTime":"60.0000ms",
         "stackTrace":["java.base@11.0.14/java.lang.Object.wait(Native Method)",
           "java.base@11.0.14/java.lang.Object.wait(Unknown Source)",
           "org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:318)",
           "org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:90)",
           "org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:99)",
           "org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:217)",
           "org.apache.solr.common.util.JavaBinCodec._init(JavaBinCodec.java:211)",
           "org.apache.solr.common.util.JavaBinCodec.initRead(JavaBinCodec.java:202)",
           "org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:195)",
           "org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:51)",
           "org.apache.solr.client.solrj.impl.Http2SolrClient.processErrorsAndResponse(Http2SolrClient.java:696)",
           "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:412)",
           "org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:761)",
           "org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)",
           "org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)",
           "org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:371)",
           "org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:132)",
           "org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:41)",
           "java.base@11.0.14/java.util.concurrent.FutureTask.run(Unknown Source)",
           "java.base@11.0.14/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)",
           "java.base@11.0.14/java.util.concurrent.FutureTask.run(Unknown Source)",
           "com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)",
           "org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)",
           "org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$175/0x0000000840243c40.run(Unknown Source)",
           "java.base@11.0.14/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)",
           "java.base@11.0.14/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)",
           "java.base@11.0.14/java.lang.Thread.run(Unknown Source)"]},
      

      ...these httpShardExecutor threads can stay hung, tying up system resources, indefinitely (unless they get a spuriuos notify() from the JVM). (In once case, it seems to have caused a request to hang for 10.3 hours)

      Anecdotally:

      • There is some evidence that this problem did NOT affect Solr 8.6.3, but does affect later versions
        • suggesting the bug didn't exist in Jetty until after 9.4.27.v20200227
      • Forcing the Jetty HttpClient to use HTTP1.1 transport seems to prevent this problem from happening
        • In Solr this can be done by setting the "solr.http1" system property
        • Or using the Http2SolrClient.Builder.useHttp1_1() method in client application code

      Attachments

        Issue Links

          Activity

            People

              krisden Kevin Risden
              hossman Chris M. Hostetter
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 0.5h
                  0.5h