Description
setup:
1 node, 4 cores, 2 shards.
15 documents indexed.
problem:
init stage times out.
probable cause:
According to the init flow, cores are initialized one at a time, synchronously.
The main thread blocks in ShardLeaderElectionContext.waitForReplicasToComeUp until the retry threshold is reached, while the replica cores have not yet been initialized. In other words, the other replicas have no chance to come up in the meantime.
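The suspected behaviour can be reproduced with a small stand-alone model. The sketch below is not Solr code: the class name, the in-memory election map, the core names, and the poll-count parameters are all hypothetical, and the real wait loop sleeps between polls rather than counting. It only illustrates why a wait performed on the loading thread cannot be satisfied by cores that the same thread has not loaded yet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SequentialInitSketch {

    /** Hypothetical stand-in for the per-shard election node: shard name -> cores that have joined. */
    private final Map<String, List<String>> election = new HashMap<>();

    /** Polls until `total` replicas have joined or the retry budget is spent; returns the polls used. */
    private int waitForReplicasToComeUp(String shard, int total, int maxPolls) {
        int polls = 0;
        while (election.get(shard).size() < total && polls < maxPolls) {
            polls++; // the real code sleeps here; the loading thread does nothing else meanwhile
        }
        return polls;
    }

    /** Loads cores one by one on the calling thread, mirroring the synchronous init flow. */
    public List<String> loadSequentially(String[][] coreAndShard, int replicasPerShard, int maxPolls) {
        List<String> report = new ArrayList<>();
        for (String[] entry : coreAndShard) {
            String core = entry[0];
            String shard = entry[1];
            List<String> joined = election.computeIfAbsent(shard, s -> new ArrayList<>());
            joined.add(core);
            if (joined.size() == 1) {
                // The first core of a shard runs the leader process and waits for its peers,
                // but the peers come later in this same loop, so they cannot join yet.
                int polls = waitForReplicasToComeUp(shard, replicasPerShard, maxPolls);
                report.add(core + " waited " + polls + " polls, found=" + joined.size()
                        + " of " + replicasPerShard);
            } else {
                report.add(core + " joined without waiting");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Assumed layout matching the setup above: 4 cores spread over 2 shards.
        String[][] cores = {
            {"core1", "shard1"}, {"core2", "shard2"}, {"core3", "shard1"}, {"core4", "shard2"}
        };
        for (String line : new SequentialInitSketch().loadSequentially(cores, 2, 3)) {
            System.out.println(line);
        }
    }
}
```

In this model the first core of each shard always spends its whole retry budget, because the replica it is waiting for is only registered after the wait returns. If the accumulated wait exceeds the ZooKeeper session timeout, a session expiry like the one in the log further down would be a plausible consequence.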
stack trace:
Thread [main] (Suspended)
owns: HashMap<K,V> (id=3876)
owns: StandardContext (id=3877)
owns: HashMap<K,V> (id=3878)
owns: StandardHost (id=3879)
owns: StandardEngine (id=3880)
owns: Service[] (id=3881)
Thread.sleep(long) line: not available [native method]
ShardLeaderElectionContext.waitForReplicasToComeUp(boolean, String) line: 298
ShardLeaderElectionContext.runLeaderProcess(boolean) line: 143
LeaderElector.runIamLeaderProcess(ElectionContext, boolean) line: 152
LeaderElector.checkIfIamLeader(int, ElectionContext, boolean) line: 96
LeaderElector.joinElection(ElectionContext) line: 262
ZkController.joinElection(CoreDescriptor, boolean) line: 733
ZkController.register(String, CoreDescriptor, boolean, boolean) line: 566
ZkController.register(String, CoreDescriptor) line: 532
CoreContainer.registerInZk(SolrCore) line: 709
CoreContainer.register(String, SolrCore, boolean) line: 693
CoreContainer.load(String, InputSource) line: 535
CoreContainer.load(String, File) line: 356
CoreContainer$Initializer.initialize() line: 308
SolrDispatchFilter.init(FilterConfig) line: 107
ApplicationFilterConfig.getFilter() line: 295
ApplicationFilterConfig.setFilterDef(FilterDef) line: 422
ApplicationFilterConfig.<init>(Context, FilterDef) line: 115
StandardContext.filterStart() line: 4072
StandardContext.start() line: 4726
StandardHost(ContainerBase).addChildInternal(Container) line: 799
StandardHost(ContainerBase).addChild(Container) line: 779
StandardHost.addChild(Container) line: 601
HostConfig.deployDescriptor(String, File, String) line: 675
HostConfig.deployDescriptors(File, String[]) line: 601
HostConfig.deployApps() line: 502
HostConfig.start() line: 1317
HostConfig.lifecycleEvent(LifecycleEvent) line: 324
LifecycleSupport.fireLifecycleEvent(String, Object) line: 142
StandardHost(ContainerBase).start() line: 1065
StandardHost.start() line: 840
StandardEngine(ContainerBase).start() line: 1057
StandardEngine.start() line: 463
StandardService.start() line: 525
StandardServer.start() line: 754
Catalina.start() line: 595
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available
Method.invoke(Object, Object...) line: not available
Bootstrap.start() line: 289
Bootstrap.main(String[]) line: 414
After a while, the session times out and the following exception appears:
Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=0 timeoutin=-95
Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
Oct 25, 2012 1:16:56 PM org.apache.solr.common.SolrException log
SEVERE: Errir checking for the number of election participants:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collection1/leader_elect/shard2/election
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:227)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:224)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:224)
at org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:276)
at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:143)
at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:152)
at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:262)
at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:733)
at org.apache.solr.cloud.ZkController.register(ZkController.java:566)
at org.apache.solr.cloud.ZkController.register(ZkController.java:532)
at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:709)
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:693)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:535)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at org.apache.catalina.core.StandardService.start(StandardService.java:525)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Followed by:
Oct 25, 2012 1:17:27 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=collection1
Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=collection1
Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)