[SOLR-7021] Leader will not publish core as active without recovering first, but never recovers - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Cannot Reproduce
Affects Version/s: 4.10
Fix Version/s: None
Component/s: SolrCloud
Labels: None

Description

A little background: we run a single-core SolrCloud cluster across 3 nodes. Each node hosts its own shard, and each shard has a single replica, so every replica is itself the leader of its shard.

For reasons we won't get into, we witnessed a shard go down in our cluster. We restarted the cluster but our core/shards still did not come back up. After inspecting the logs, we found this:

2015-01-21 15:51:56,494 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - We are http://xxx.xxx.xxx.35:8081/solr/xyzcore/ and leader is http://xxx.xxx.xxx.35:8081/solr/xyzcore/
2015-01-21 15:51:56,496 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - No LogReplay needed for core=xyzcore baseURL=http://xxx.xxx.xxx.35:8081/solr
2015-01-21 15:51:56,496 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - I am the leader, no recovery necessary
2015-01-21 15:51:56,496 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - publishing core=xyzcore state=active collection=xyzcore
2015-01-21 15:51:56,497 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - numShards not found on descriptor - reading it from system property
2015-01-21 15:51:56,498 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - publishing core=xyzcore state=down collection=xyzcore
2015-01-21 15:51:56,498 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - numShards not found on descriptor - reading it from system property
2015-01-21 15:51:56,501 [coreZkRegister-1-thread-2] ERROR core.ZkContainer  - :org.apache.solr.common.SolrException: Cannot publish state of core 'xyzcore' as active without recovering first!
	at org.apache.solr.cloud.ZkController.publish(ZkController.java:1075)

From this point on, the shards never recover correctly, so our core never returns to a functional state.
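
For anyone triaging a similar log, here is a minimal sketch of how the sequence could be picked out automatically. It is not part of Solr: the function name `summarize` and the `SAMPLE` list are hypothetical, and the regular expressions assume the message format seen in the excerpt. It collects the states published per core, in order, and notes which cores hit the "without recovering first" error.

```python
import re

# Patterns assume the log message format from the excerpt in this report.
PUBLISH_RE = re.compile(r"publishing core=(\S+) state=(\S+)")
ERROR_RE = re.compile(
    r"Cannot publish state of core '([^']+)' as active without recovering first"
)


def summarize(lines):
    """Return (states, failed).

    states: dict mapping core name to the list of published states, in order.
    failed: set of core names that logged the 'without recovering first' error.
    """
    states = {}
    failed = set()
    for line in lines:
        match = PUBLISH_RE.search(line)
        if match:
            states.setdefault(match.group(1), []).append(match.group(2))
            continue
        match = ERROR_RE.search(line)
        if match:
            failed.add(match.group(1))
    return states, failed


# Three lines copied from the log excerpt in this report.
SAMPLE = [
    "2015-01-21 15:51:56,496 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - publishing core=xyzcore state=active collection=xyzcore",
    "2015-01-21 15:51:56,498 [coreZkRegister-1-thread-2] INFO  cloud.ZkController  - publishing core=xyzcore state=down collection=xyzcore",
    "2015-01-21 15:51:56,501 [coreZkRegister-1-thread-2] ERROR core.ZkContainer  - :org.apache.solr.common.SolrException: Cannot publish state of core 'xyzcore' as active without recovering first!",
]

states, failed = summarize(SAMPLE)
print(states)
print(failed)
```

In our case this would show the core being published as active, then as down, and then failing with the error, all within a few milliseconds on the same thread.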

People

Assignee: Unassigned

Reporter: James Hardwick

Votes: 2

Watchers: 10

Dates

Created: 22/Jan/15 23:33

Updated: 30/Nov/16 23:39

Resolved: 30/Nov/16 23:39