[SOLR-16273] Prometheus Metric Exporter is very slow when collecting large amounts of sample data - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 8.6.3, 9.0
Fix Version/s: 9.3
Component/s: contrib - prometheus-exporter
Labels:
None

Description

I have a Solr cluster with 300 collections and use the Prometheus Metric Exporter to gather cluster metrics, but each fetch takes 2 minutes. The `jstack` output is as follows:


"solr-exporter-collectors-1-thread-2" #21 prio=5 os_prio=0 tid=0x00007fcef8009000 nid=0x45208 runnable [0x00007fcf16470000]
   java.lang.Thread.State: RUNNABLE
    at io.prometheus.client.Collector$MetricFamilySamples$Sample.equals(Collector.java:95)
    at java.util.ArrayList.indexOf(ArrayList.java:323)
    at java.util.ArrayList.contains(ArrayList.java:306)
    at org.apache.solr.prometheus.collector.MetricSamples.addSampleIfMetricExists(MetricSamples.java:50)
    at org.apache.solr.prometheus.collector.MetricSamples.addAll(MetricSamples.java:60)
    at org.apache.solr.prometheus.collector.MetricsCollector.lambda$collect$0(MetricsCollector.java:38)
    at org.apache.solr.prometheus.collector.MetricsCollector$$Lambda$127/68757342.accept(Unknown Source)
    at java.util.HashMap.forEach(HashMap.java:1291)
    at org.apache.solr.prometheus.collector.MetricsCollector.collect(MetricsCollector.java:38)
    at org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:91)
    at org.apache.solr.prometheus.collector.SchedulerMetricsCollector$$Lambda$75/817493591.get(Unknown Source)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$39/351002168.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

The "contains" method takes 90% of the execution time.

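Looking at the MetricSamples.java code, each "sample" is checked for duplicates before it is added to "sampleFamily.samples". Once "sampleFamily.samples" reaches 20,000 entries, "sampleFamily.samples.contains" becomes very inefficient, because ArrayList.contains (via ArrayList.indexOf, as the stack trace shows) scans the list linearly for every sample added.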
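
To illustrate the cost described above, here is a minimal, self-contained sketch that contrasts list-based deduplication with a hash-based alternative. It is not the exporter's code and not necessarily what the linked pull request does: `DedupSketch`, its nested `Sample` class, and both method names are hypothetical stand-ins, and the sketch assumes samples define `equals` and `hashCode` consistently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class DedupSketch {

    /** Hypothetical stand-in for a metric sample; NOT the Prometheus client class. */
    public static final class Sample {
        final String name;
        final List<String> labelValues;
        final double value;

        public Sample(String name, List<String> labelValues, double value) {
            this.name = name;
            this.labelValues = labelValues;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Sample)) return false;
            Sample other = (Sample) o;
            return name.equals(other.name)
                    && labelValues.equals(other.labelValues)
                    && Double.compare(value, other.value) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, labelValues, value);
        }
    }

    /** Pattern seen in the stack trace: every add scans the whole list first. */
    public static List<Sample> dedupWithList(List<Sample> incoming) {
        List<Sample> samples = new ArrayList<>();
        for (Sample s : incoming) {
            if (!samples.contains(s)) { // linear scan, so quadratic overall
                samples.add(s);
            }
        }
        return samples;
    }

    /** Assumed alternative: a LinkedHashSet drops duplicates and keeps insertion order. */
    public static List<Sample> dedupWithSet(List<Sample> incoming) {
        Set<Sample> unique = new LinkedHashSet<>(incoming);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // 20,000 distinct samples, each submitted twice.
        List<Sample> incoming = new ArrayList<>();
        for (int round = 0; round < 2; round++) {
            for (int i = 0; i < 20_000; i++) {
                incoming.add(new Sample("metric", Arrays.asList("core_" + i), 1.0));
            }
        }
        long t0 = System.nanoTime();
        int viaList = dedupWithList(incoming).size();
        long t1 = System.nanoTime();
        int viaSet = dedupWithSet(incoming).size();
        long t2 = System.nanoTime();
        System.out.printf("list: %d samples in %d ms%n", viaList, (t1 - t0) / 1_000_000);
        System.out.printf("set:  %d samples in %d ms%n", viaSet, (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same deduplicated samples in the same order; only the lookup cost differs. Actual timings depend on the JVM and hardware, so none are quoted here.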

Issue Links

links to

GitHub Pull Request #1627

People

Assignee: Christine Poerschke

Reporter: Fa Ming

Votes: 1

Watchers: 7

Dates

Created: 28/Jun/22 13:56

Updated: 21/Jul/23 20:52

Resolved: 12/Jun/23 05:06

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1h 40m