Solr / SOLR-16273

Prometheus Metric Exporter is very slow when collecting large amounts of sample data


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 8.6.3, 9.0
    • Fix Version/s: 9.3
    • None

    Description

      I have a Solr cluster with 300 collections and use the Prometheus Metric Exporter to collect cluster metrics, but fetching the data takes 2 minutes every time. The `jstack` output is as follows:

      
      "solr-exporter-collectors-1-thread-2" #21 prio=5 os_prio=0 tid=0x00007fcef8009000 nid=0x45208 runnable [0x00007fcf16470000]
         java.lang.Thread.State: RUNNABLE
          at io.prometheus.client.Collector$MetricFamilySamples$Sample.equals(Collector.java:95)
          at java.util.ArrayList.indexOf(ArrayList.java:323)
          at java.util.ArrayList.contains(ArrayList.java:306)
          at org.apache.solr.prometheus.collector.MetricSamples.addSampleIfMetricExists(MetricSamples.java:50)
          at org.apache.solr.prometheus.collector.MetricSamples.addAll(MetricSamples.java:60)
          at org.apache.solr.prometheus.collector.MetricsCollector.lambda$collect$0(MetricsCollector.java:38)
          at org.apache.solr.prometheus.collector.MetricsCollector$$Lambda$127/68757342.accept(Unknown Source)
          at java.util.HashMap.forEach(HashMap.java:1291)
          at org.apache.solr.prometheus.collector.MetricsCollector.collect(MetricsCollector.java:38)
          at org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:91)
          at org.apache.solr.prometheus.collector.SchedulerMetricsCollector$$Lambda$75/817493591.get(Unknown Source)
          at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
          at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
          at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$39/351002168.run(Unknown Source)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:750)
      
      

       

      The `contains` method accounts for 90% of the execution time.

       

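      Looking at the MetricSamples.java code, each sample is de-duplicated before it is added to `sampleFamily.samples`. Because that list is an `ArrayList`, `sampleFamily.samples.contains` scans every existing element and calls `Sample.equals` on each one (see the `ArrayList.indexOf` frames above), so the total de-duplication cost grows quadratically with the number of samples. Once `sampleFamily.samples` reaches 20,000 entries, this check becomes very inefficient.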
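      To make the cost concrete: de-duplicating n distinct samples with `ArrayList.contains` performs n(n-1)/2 `equals` calls in total, which is 199,990,000 calls at 20,000 samples. The sketch below is a minimal, self-contained illustration, not the actual MetricSamples code or the committed fix. `SampleDedup`, its simplified `Sample` class (a hypothetical stand-in for `Collector.MetricFamilySamples.Sample`), and the `equalsCalls` counter are all assumptions made for illustration. It contrasts the list-based check seen in the stack trace with a `LinkedHashSet`, which assumes the sample type has a `hashCode` consistent with `equals` and preserves insertion order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: list-based vs. hash-based de-duplication of metric samples.
public class SampleDedup {

  // Counts equals() invocations; a plain static field, so single-threaded use only.
  public static long equalsCalls = 0;

  // Simplified, hypothetical stand-in for a Prometheus sample.
  public static final class Sample {
    final String name;
    final List<String> labelNames;
    final List<String> labelValues;
    final double value;

    public Sample(String name, List<String> labelNames, List<String> labelValues, double value) {
      this.name = name;
      this.labelNames = labelNames;
      this.labelValues = labelValues;
      this.value = value;
    }

    @Override
    public boolean equals(Object o) {
      equalsCalls++;
      if (this == o) return true;
      if (!(o instanceof Sample)) return false;
      Sample other = (Sample) o;
      // Compare the cheap double first so that distinct samples usually fail fast.
      return Double.compare(value, other.value) == 0
          && name.equals(other.name)
          && labelNames.equals(other.labelNames)
          && labelValues.equals(other.labelValues);
    }

    @Override
    public int hashCode() {
      // Consistent with equals(): equal samples produce equal hash codes.
      return Objects.hash(name, labelNames, labelValues, value);
    }
  }

  // Pattern seen in the stack trace: ArrayList.contains() walks the whole list,
  // so adding n distinct samples costs n(n-1)/2 equals() calls in total.
  public static List<Sample> dedupLinear(List<Sample> input) {
    List<Sample> out = new ArrayList<>();
    for (Sample s : input) {
      if (!out.contains(s)) {
        out.add(s);
      }
    }
    return out;
  }

  // Hash-based alternative: expected O(1) membership check per sample;
  // LinkedHashSet keeps the first occurrence of each sample in insertion order.
  public static List<Sample> dedupHashed(List<Sample> input) {
    Set<Sample> seen = new LinkedHashSet<>(input);
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    int n = 20_000; // the list size mentioned in the report
    List<Sample> samples = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      samples.add(new Sample("hypothetical_metric", Arrays.asList("core"),
          Arrays.asList("core_" + i), i));
    }

    equalsCalls = 0;
    long t0 = System.nanoTime();
    int linearSize = dedupLinear(samples).size();
    long linearMs = (System.nanoTime() - t0) / 1_000_000;
    long linearCalls = equalsCalls; // n(n-1)/2 = 199,990,000 for distinct samples

    equalsCalls = 0;
    t0 = System.nanoTime();
    int hashedSize = dedupHashed(samples).size();
    long hashedMs = (System.nanoTime() - t0) / 1_000_000;

    System.out.printf("linear: %d samples, %d equals() calls, %d ms%n", linearSize, linearCalls, linearMs);
    System.out.printf("hashed: %d samples, %d equals() calls, %d ms%n", hashedSize, equalsCalls, hashedMs);
  }
}
```

      Running `main` prints the result size, the number of `equals` calls, and the elapsed time for both approaches on 20,000 distinct samples; timings depend on the machine. The trade-off is some extra memory for the hash table in exchange for expected constant-time membership checks instead of a full scan per sample.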


            People

              cpoerschke Christine Poerschke
              q6364325 Fa Ming
              Votes: 1
              Watchers: 7


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 40m