[SOLR-16689] Inefficiencies in replication process - ASF JIRA

There are a couple of inefficiencies in replication that can cause unnecessary CPU usage when replicas are added:

The RecoveryStrategy.replicate() method makes a call to commit on the leader. This happens whenever a replica is reloaded. For PULL replicas in particular this isn't necessary, since we can just pull down whatever the latest data is and rely on other mechanisms to commit the leader consistently. (As an aside, it seems like forcing a commit on the leader might never be necessary, but for this issue I've limited the scope to PULL replicas.)
When the leader has no data yet (index version is 0), a non-leader replica will repeatedly delete and recreate its core due to this case in IndexFetcher: https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L549. This can cause unnecessary CPU usage until the leader has data indexed to it.
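
To make the two points above concrete, here is a rough sketch of the kind of guards they suggest. This is not the actual Solr code: the class ReplicationGuards, its local ReplicaType enum, and both method names are hypothetical, and the "only reset when the local index has data" condition is an assumption about one possible fix rather than what IndexFetcher does today.

```java
// Hypothetical sketch only: none of these names are Solr APIs.
final class ReplicationGuards {

    // Local stand-in for the replica types discussed in this issue.
    enum ReplicaType { NRT, TLOG, PULL }

    /**
     * Point 1: skip the forced commit on the leader for PULL replicas,
     * which can simply pull whatever the leader has already committed.
     */
    static boolean shouldCommitOnLeader(ReplicaType type) {
        return type != ReplicaType.PULL;
    }

    /**
     * Point 2 (assumed fix): when the leader reports index version 0,
     * only clear the local index if it actually contains data, so an
     * already-empty replica is not deleted and recreated on every poll.
     * For a non-zero leader version this guard does not apply and the
     * normal fetch path would be taken instead.
     */
    static boolean shouldClearForEmptyLeader(long leaderIndexVersion,
                                             boolean localIndexHasData) {
        return leaderIndexVersion == 0L && localIndexHasData;
    }
}
```

With guards like these, a PULL replica recovering against an empty leader would neither trigger a commit on the leader nor rebuild its own empty core; it would just wait for the next poll.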