Solr / SOLR-3993

SolrCloud leader election on a single node blocks initialization

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: SolrCloud
    • Labels:
      None
    • Environment:

      Windows 7, Tomcat 6

      Description

      setup:
      1 node, 4 cores, 2 shards.
      15 documents indexed.

      problem:
      The init stage times out.

      probable cause:
      According to the init flow, cores are initialized one at a time, synchronously.
      The main thread blocks in ShardLeaderElectionContext.waitForReplicasToComeUp until the retry threshold is reached, while the replica cores are not yet initialized. In other words, no other replica can come up in the meantime.
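
      The suspected mechanism can be illustrated with a small, self-contained Java sketch. This is a toy model under stated assumptions, not Solr code: the class SequentialInitDemo, the methods register, loadSequentially and loadInParallel, the joined map (standing in for the election node in ZooKeeper), and the core order shard1, shard2, shard1, shard2 are all hypothetical. Parallel loading is shown only for contrast and is not claimed to be the fix that was applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SequentialInitDemo {
    // Hypothetical stand-in for the election node: shard name -> cores that have joined so far.
    static final ConcurrentHashMap<String, AtomicInteger> joined = new ConcurrentHashMap<>();

    /**
     * Joins the election for a shard, then polls until all expected replicas
     * have joined or the timeout elapses. Returns true if all replicas were seen in time.
     */
    static boolean register(String shard, int expectedReplicas, long timeoutMs)
            throws InterruptedException {
        AtomicInteger count = joined.computeIfAbsent(shard, s -> new AtomicInteger());
        count.incrementAndGet();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (count.get() < expectedReplicas) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting for the other replicas
            }
            Thread.sleep(10);
        }
        return true;
    }

    /** Registers cores one by one on the calling thread; returns how many waits timed out. */
    static int loadSequentially(List<String> coreShards, long timeoutMs)
            throws InterruptedException {
        joined.clear();
        int timedOut = 0;
        for (String shard : coreShards) {
            // The next core cannot start registering until this call returns.
            if (!register(shard, 2, timeoutMs)) {
                timedOut++;
            }
        }
        return timedOut;
    }

    /** Registers every core on its own thread; returns how many waits timed out. */
    static int loadInParallel(List<String> coreShards, long timeoutMs) throws Exception {
        joined.clear();
        ExecutorService pool = Executors.newFixedThreadPool(coreShards.size());
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String shard : coreShards) {
                Callable<Boolean> task = () -> register(shard, 2, timeoutMs);
                results.add(pool.submit(task));
            }
            int timedOut = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    timedOut++;
                }
            }
            return timedOut;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed layout: 4 cores on one node, 2 shards with 2 replicas each.
        List<String> cores = List.of("shard1", "shard2", "shard1", "shard2");
        System.out.println("sequential timeouts: " + loadSequentially(cores, 300));
        System.out.println("parallel timeouts:   " + loadInParallel(cores, 300));
    }
}
```

      In the sequential run, the first core of each shard waits out its full timeout because its replica is queued behind it on the same thread. In the parallel run, the replicas should see each other almost immediately.
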
      stack trace:
      Thread [main] (Suspended)
      owns: HashMap<K,V> (id=3876)
      owns: StandardContext (id=3877)
      owns: HashMap<K,V> (id=3878)
      owns: StandardHost (id=3879)
      owns: StandardEngine (id=3880)
      owns: Service[] (id=3881)
      Thread.sleep(long) line: not available [native method]
      ShardLeaderElectionContext.waitForReplicasToComeUp(boolean, String) line: 298
      ShardLeaderElectionContext.runLeaderProcess(boolean) line: 143
      LeaderElector.runIamLeaderProcess(ElectionContext, boolean) line: 152
      LeaderElector.checkIfIamLeader(int, ElectionContext, boolean) line: 96
      LeaderElector.joinElection(ElectionContext) line: 262
      ZkController.joinElection(CoreDescriptor, boolean) line: 733
      ZkController.register(String, CoreDescriptor, boolean, boolean) line: 566
      ZkController.register(String, CoreDescriptor) line: 532
      CoreContainer.registerInZk(SolrCore) line: 709
      CoreContainer.register(String, SolrCore, boolean) line: 693
      CoreContainer.load(String, InputSource) line: 535
      CoreContainer.load(String, File) line: 356
      CoreContainer$Initializer.initialize() line: 308
      SolrDispatchFilter.init(FilterConfig) line: 107
      ApplicationFilterConfig.getFilter() line: 295
      ApplicationFilterConfig.setFilterDef(FilterDef) line: 422
      ApplicationFilterConfig.<init>(Context, FilterDef) line: 115
      StandardContext.filterStart() line: 4072
      StandardContext.start() line: 4726
      StandardHost(ContainerBase).addChildInternal(Container) line: 799
      StandardHost(ContainerBase).addChild(Container) line: 779
      StandardHost.addChild(Container) line: 601
      HostConfig.deployDescriptor(String, File, String) line: 675
      HostConfig.deployDescriptors(File, String[]) line: 601
      HostConfig.deployApps() line: 502
      HostConfig.start() line: 1317
      HostConfig.lifecycleEvent(LifecycleEvent) line: 324
      LifecycleSupport.fireLifecycleEvent(String, Object) line: 142
      StandardHost(ContainerBase).start() line: 1065
      StandardHost.start() line: 840
      StandardEngine(ContainerBase).start() line: 1057
      StandardEngine.start() line: 463
      StandardService.start() line: 525
      StandardServer.start() line: 754
      Catalina.start() line: 595
      NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
      NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available
      DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available
      Method.invoke(Object, Object...) line: not available
      Bootstrap.start() line: 289
      Bootstrap.main(String[]) line: 414

      After a while, the session times out and the following exception appears:
      Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
      INFO: Waiting until we see more replicas up: total=2 found=0 timeoutin=-95
      Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
      INFO: Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
      Oct 25, 2012 1:16:56 PM org.apache.solr.common.SolrException log
      SEVERE: Errir checking for the number of election participants:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collection1/leader_elect/shard2/election
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
      at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
      at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:227)
      at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:224)
      at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
      at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:224)
      at org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:276)
      at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:143)
      at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:152)
      at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
      at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:262)
      at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:733)
      at org.apache.solr.cloud.ZkController.register(ZkController.java:566)
      at org.apache.solr.cloud.ZkController.register(ZkController.java:532)
      at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:709)
      at org.apache.solr.core.CoreContainer.register(CoreContainer.java:693)
      at org.apache.solr.core.CoreContainer.load(CoreContainer.java:535)
      at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
      at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
      at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
      at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
      at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
      at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
      at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
      at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
      at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
      at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
      at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
      at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
      at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
      at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
      at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
      at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
      at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
      at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
      at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
      at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
      at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
      at org.apache.catalina.core.StandardService.start(StandardService.java:525)
      at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
      at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
      at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)

      Followed by:
      Oct 25, 2012 1:17:27 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
      SEVERE: Recovery failed - trying again... core=collection1
      Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
      SEVERE: Error while trying to recover. core=collection1
      Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
      SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
      at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
      at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
      at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
      at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

            People

            • Assignee:
              markrmiller@gmail.com Mark Miller
            • Reporter:
              alexeyk Alexey Kudinov
            • Votes:
              0
            • Watchers:
              7