Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 6.3
- Labels: None
Description
There are two executor services in UpdateShardHandler: the updateExecutor, whose size is unbounded for reasons explained in the code comments, and the recoveryExecutor, which was added later and is the one that executes the RecoveryStrategy code to actually fetch index files and write them to disk, eventually calling an fsync thread to ensure the data is persisted.
We found that with a fast network such as 10GbE it is very easy to overload the local disk storage when restarting Solr instances after some downtime, if they have many cores to load. Typically each of our physical servers contains 6 SSDs and 6 Solr instances, so each Solr instance has its home directory on a dedicated SSD. With 100+ cores (shard replicas) per instance, startup can really hammer the SSD, since it is writing in parallel from as many cores as Solr is recovering. This made recovery slow enough that replicas were down for a long time, and whole shards were even marked as down when none of their replicas had recovered (usually after many machines had been restarted). The very slow IO times (tens of seconds or worse) also caused the JVM to pause, which led to disconnects from ZooKeeper and hurt recovery further.
This patch allowed us to throttle how much parallelism there is when writing to a disk. In practice we are using a pool size of 4 threads to prevent the SSD from getting overloaded, and that worked well enough to recover all cores in a reasonable time.
Given the comment explaining why the other thread pool is unbounded, though, I'd like some feedback on whether it is OK to bound the recoveryExecutor this way.
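To make the shape of the change concrete, below is a minimal plain-JDK sketch of the two pools. It is an illustration only, not the actual Solr implementation: the class name, field names, and the fallback behaviour are hypothetical.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RecoveryThrottleSketch {

  // Unbounded, like updateExecutor: a cached-style pool creates threads on
  // demand so update requests are never queued behind one another.
  private final ExecutorService updateExecutor =
      new ThreadPoolExecutor(0, Integer.MAX_VALUE,
          60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());

  // Bounded: at most maxRecoveryThreads replicas copy and fsync index files
  // at the same time, so a single SSD is not swamped during mass recovery.
  private final ExecutorService recoveryExecutor;

  public RecoveryThrottleSketch(int maxRecoveryThreads) {
    this.recoveryExecutor = maxRecoveryThreads > 0
        ? Executors.newFixedThreadPool(maxRecoveryThreads)
        : Executors.newCachedThreadPool(); // hypothetical unbounded fallback
  }
}

With a fixed pool, excess recovery tasks simply wait in the pool's internal queue instead of all hitting the disk at once, which is the throttling effect described above.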
It's configured in solr.xml, e.g.:

<updateshardhandler>
  <int name="maxRecoveryThreads">${solr.recovery.threads:4}</int>
</updateshardhandler>
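Here ${solr.recovery.threads:4} uses Solr's standard property substitution syntax for solr.xml: the pool size is read from the solr.recovery.threads system property and defaults to 4 when that property is not set, so a deployment can override it at startup (for example by launching the JVM with -Dsolr.recovery.threads=2) without editing solr.xml.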
Attachments
Issue Links
- is related to: SOLR-8205 Make UpdateShardHandler's thread pool configurable (Resolved)