When we issue commit through curl command, not all the shards are getting `start commit` requests at the same time.
Solr Setup Detail : (Running in solrCloud mode)
It has 6 shards, and each shard has only one replica (which is also a
leader) and the replica type is NRT.
Each shards are hosted on the separate physical host.
Zookeeper => We are using external zookeeper ensemble (3 separate node
Shard and Host name
Request rate on the system is currently zero and only hourly indexing
running on it.
We are using curl command to issue commit.
(Using solr_254 host to issue commit)
On using the above command all the shards have started processing commit (i.e
getting `start commit` request) except the one used in curl command (i.e
shard2_0 which is hosted on solr_254). Individually each shards takes around
10 to 12 min to process hard commit (most of this time is spent on reloading
As per logs, shard2_0 is getting `start commit` request after 10 minutes
(approx). This leads to following timeout error.
2020-12-06 18:47:47.013 ERROR org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr_132:9744/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_254%3A9744%2Fsolr%2Fmy_collection_shard2_0_replica_n11%2F at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407) at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753) at org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369) at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) at org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344) at org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.util.concurrent.TimeoutException at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216) at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398) ... 13 more
Above timeout error is between solr_254 and solr_132. Similar errors are
there between solr_254 and other 4 shards
Since query load is zero, mostly CPU utilization is around 3%.
After issuing curl commit command, CPU goes up to 14% on all shards except
shard2_0 (host: solr_254, the one used in curl command).
And after 10 minutes (i.e after getting the `start commit` request) CPU on
shard2_0 also goes up to 14%.
As I mentioned earlier each shards take around 10-12 mins to process commit
and due to delay in starting commit process on one shard (shard2_0) our
overall commit time is doubled now. (22-24 minutes approx).
We are observing this delay in both hard and soft commit.
In our solr-5.4.0(having similar setup), we use the similar curl command to issue commit, and there all the shards are getting `start commit` request at same time. Including the one used in curl command.
Impact After deleting external files:
In order to nullify the impact of external files, I had deleted external
files from all the shards and issued commit through the curl command. Commit
operation got completed in 3 seconds. Individual shards took 1.5 seconds to
complete the commit operation. But there was a delay of around 1.5 seconds
on the shard whose hostname was used to issue the commit. Hence overall
commit time is 3 seconds.
During this operation, there was no timeout or any other kind of error
(except `external file not found` error which is expected).