Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
There are many reasons why Solr will not elect a leader for a shard e.g. all replicas' last published state was recovery or due to bugs which cause a leader to be marked as 'down'. While the best solution is that they never get into this state, we need a manual way to fix this when it does get into this state. Right now we can do a series of dance involving bouncing the node (since recovery paths between bouncing and REQUESTRECOVERY are different), but that is difficult when running a large cluster. Although it is possible that such a manual API may lead to some data loss but in some cases, it is the only possible option to restore availability.
This issue proposes to build a new collection API which can be used to force replicas into recovering a leader while avoiding data loss on a best effort basis.