Solr / SOLR-4956

make maxBufferedAddsPerServer configurable



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 4.3, 6.0


      Anecdotal evidence from the users list indicates that in high-throughput situations, the default of 10 docs/batch for inter-shard batching can generate significant CPU load. See the thread titled "Sharding and Replication" from June 19th; the gist is below.

      I haven't poked around, but it's a little surprising on the surface that Asif is seeing this kind of difference. So I'm wondering if this change indicates some other underlying issue. Regardless, this seems like it would be good to investigate.

      Here's the gist of Asif's experience from the thread:

      It's a completely practical problem - we are exploring Solr to build a real
      time analytics/data solution for a system handling about 1000 qps. We have
      various metrics that are stored as different collections on the cloud,
      which means very high amount of writes. The cloud also needs to support
      about 300-400 qps.

      We initially tested with a single Solr node on a 16 core / 24 GB box for a
      single metric. We saw that writes were not an issue at all - Solr was
      handling it extremely well. We were also able to achieve about 200 qps from
      a single node.

      When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
      usage on the replicas. Up to 10 cores were getting used for writes on the
      replicas. Hence my concern with respect to batch updates for the replicas.

      BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is
      very similar to single node installation.
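For context, maxBufferedAddsPerServer controls how many add commands are buffered per destination server before the update distributor flushes them as one batched request; a larger value means fewer, larger inter-shard requests. A minimal sketch of that batching pattern (class and field names here are illustrative, not Solr's actual SolrCmdDistributor code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of per-server add buffering, not Solr's actual code.
// Adds accumulate until the buffer reaches maxBufferedAddsPerServer, then
// they are flushed together as a single batched update request.
class AddBuffer {
    private final int maxBufferedAddsPerServer;
    private final List<String> buffer = new ArrayList<>();
    int flushCount = 0; // number of batched requests "sent"

    AddBuffer(int maxBufferedAddsPerServer) {
        this.maxBufferedAddsPerServer = maxBufferedAddsPerServer;
    }

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedAddsPerServer) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // In Solr this would forward one update request containing
        // all buffered adds to the replica.
        flushCount++;
        buffer.clear();
    }
}
```

Under this sketch, indexing 1000 documents with the default limit of 10 produces 100 inter-shard requests, while a limit of 1000 produces a single request - consistent with the per-request overhead Asif's CPU numbers suggest.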


              Assignee: Unassigned
              Reporter: Erick Erickson