Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-13953

Prometheus exporter in SolrCloud mode limited to 100 nodes

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.4
    • Component/s: None
    • Labels:
      None

      Description

      When using the Prometheus Exporter in SolrCloud mode against a cluster with more than 100 nodes, only 100 nodes' metrics are collected. For the other nodes, we see "Connection pool shut down" errors show up in logs, and the metrics from those nodes aren't reported.

      This seems to be tied to the cache implementation in hostClientCache in SolrCloudScraper. That cache currently has a fixed maximum size of 100. When it approaches that limit begins to evict HttpSolrClients, it closes those clients.

      We use the cache to build up a map of base URL to HttpSolrClient. For a >100 node cluster, the cache will successfully return clients for all nodes, sequentially. But once we add the 101st node, the first HttpSolrClient, which the cache still holds a reference to, gets closed. When we then try to get the metrics using all of the HttpSolrClients returned from the cache, the ones that have been closed throw IllegalStateExceptions with message "Connection pool shut down".

       

      Original email thread here: http://mail-archives.apache.org/mod_mbox/lucene-dev/201911.mbox/%3CCAOz296DSV-tt7rWBirBZ%2BP4%3DvT5g29FZrR_2zHrHF084Xq%2Bgyw%40mail.gmail.com%3E

      Github PR here: https://github.com/apache/lucene-solr/pull/1022

       

      Example stacktrace:
       

      WARN  - 2019-11-15 21:21:19.584; org.apache.solr.prometheus.scraper.Async; Error occurred during metrics collection => java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Connection pool shut down
              at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
              at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) [?:?]
              at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654) [?:?]
              at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) [?:?]
              at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) [?:?]
              at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) [?:?]
              at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) [?:?]
              at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) [?:?]
              at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
              at org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43) [solr-prometheus-exporter-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
              at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) [?:?]
              at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) [?:?]
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) [?:?]
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705) [?:?]
              at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) [solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
              at java.lang.Thread.run(Thread.java:834) [?:?]
      Caused by: java.lang.IllegalStateException: Connection pool shut down
              at org.apache.http.util.Asserts.check(Asserts.java:34) ~[httpcore-4.4.10.jar:4.4.10]
              at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:191) ~[httpcore-4.4.10.jar:4.4.10]
              at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:267) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
              at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542) ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
              at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
              at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
              at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
              at org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:102) ~[solr-prometheus-exporter-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
              at org.apache.solr.prometheus.scraper.SolrCloudScraper.lambda$metricsForAllHosts$6(SolrCloudScraper.java:119) ~[solr-prometheus-exporter-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
              at org.apache.solr.prometheus.scraper.SolrScraper.lambda$null$0(SolrScraper.java:81) ~[solr-prometheus-exporter-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:41]
              at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
              ... 4 more
      

       

        Attachments

        1. SOLR-13953.patch
          2 kB
          Erick Erickson

          Issue Links

            Activity

              People

              • Assignee:
                erickerickson Erick Erickson
                Reporter:
                ajablonski Alex Jablonski
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m