SOLR-6095

SolrCloud cluster can end up without an overseer with overseer roles

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.8
    • Fix Version/s: 4.10, 6.0
    • Component/s: SolrCloud
    • Labels: None

      Description

      We have a large cluster running on EC2 which occasionally ends up without an overseer after a rolling restart. We always restart our overseer nodes at the very last, otherwise we end up with a large number of shards that can't recover properly.

      This cluster is running a custom branch forked from 4.8 and has SOLR-5473, SOLR-5495 and SOLR-5468 applied. We have a large number of small collections (120 collections each with approx 5M docs) on 16 Solr nodes. We are also using the overseer roles feature to designate two specified nodes as overseers. However, I think the problem that we're seeing is not specific to the overseer roles feature.

      As soon as the overseer was shut down, we saw the following on the node which was next in line to become the overseer:

      2014-05-20 09:55:39,261 [main-EventThread] INFO  solr.cloud.ElectionContext  - I am going to be the leader ec2-xxxxxxxxxx.compute-1.amazonaws.com:8987_solr
      2014-05-20 09:55:39,265 [main-EventThread] WARN  solr.cloud.LeaderElector  - 
      org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /overseer_elect/leader
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
      	at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:432)
      	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
      	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:429)
      	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:386)
      	at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:373)
      	at org.apache.solr.cloud.OverseerElectionContext.runLeaderProcess(ElectionContext.java:551)
      	at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:142)
      	at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:110)
      	at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
      	at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:303)
      	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
      	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
      

      When the overseer leader node is gracefully shut down, we get the following in the logs:

      2014-05-20 09:55:39,254 [Thread-63] ERROR solr.cloud.Overseer  - Exception in Overseer main queue loop
      org.apache.solr.common.SolrException: Could not load collection from ZK:sm12
      	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778)
      	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553)
      	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246)
      	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:237)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:503)
      	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
      	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040)
      	at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:226)
      	at org.apache.solr.common.cloud.SolrZkClient$4.execute(SolrZkClient.java:223)
      	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
      	at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:223)
      	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:767)
      	... 4 more
      2014-05-20 09:55:39,254 [Thread-63] INFO  solr.cloud.Overseer  - Overseer Loop exiting : ec2-xxxxxxxxxx.compute-1.amazonaws.com:8986_solr
      2014-05-20 09:55:39,256 [main-EventThread] WARN  common.cloud.ZkStateReader  - ZooKeeper watch triggered, but Solr cannot talk to ZK
      2014-05-20 09:55:39,259 [ShutdownMonitor] INFO  server.handler.ContextHandler  - stopped o.e.j.w.WebAppContext{/solr,file:/vol0/cloud86/solr-webapp/webapp/},/vol0/cloud86/webapps/solr.war
      

      Notice how the overseer kept running almost until the very end, i.e. until the Jetty context stopped. On some runs, we got the following on the overseer leader node during graceful shutdown:

      2014-05-19 21:33:43,657 [Thread-75] ERROR solr.cloud.Overseer  - Exception in Overseer main queue loop
      org.apache.solr.common.SolrException: Could not load collection from ZK:sm71
      	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:778)
      	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:553)
      	at org.apache.solr.common.cloud.ZkStateReader.updateClusterState(ZkStateReader.java:246)
      	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.run(Overseer.java:237)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.lang.InterruptedException
      	at java.lang.Object.wait(Native Method)
      	at java.lang.Object.wait(Object.java:503)
      	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
      	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1153)
      	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:277)
      	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:274)
      	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
      	at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:274)
      	at org.apache.solr.common.cloud.ZkStateReader.getExternCollectionFresh(ZkStateReader.java:769)
      	... 4 more
      2014-05-19 21:33:43,662 [main-EventThread] WARN  common.cloud.ZkStateReader  - ZooKeeper watch triggered, but Solr cannot talk to ZK
      2014-05-19 21:33:43,663 [Thread-75] INFO  solr.cloud.Overseer  - Overseer Loop exiting : ec2-xxxxxxxxxxxx.compute-1.amazonaws.com:8987_solr
      2014-05-19 21:33:43,664 [OverseerExitThread] ERROR solr.cloud.Overseer  - could not read the data
      org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
      	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:277)
      	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:274)
      	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
      	at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:274)
      	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:329)
      	at org.apache.solr.cloud.Overseer$ClusterStateUpdater.access$300(Overseer.java:85)
      	at org.apache.solr.cloud.Overseer$ClusterStateUpdater$1.run(Overseer.java:293)
      2014-05-19 21:33:43,665 [ShutdownMonitor] INFO  server.handler.ContextHandler  - stopped o.e.j.w.WebAppContext{/solr,file:/vol0/cloud87/solr-webapp/webapp/},/vol0/cloud87/webapps/solr.war
      

      Again the overseer clung on until the last moment, and by the time it exited the ZK session had expired and it couldn't delete the /overseer_elect/leader node. The exception on the next-in-line node was the same, i.e. NodeExists for /overseer_elect/leader.

      In both cases, we are left with no overseers after restart. I can easily reproduce this problem by just restarting overseer leader nodes repeatedly.

      Attachments

      1. SOLR-6095.patch (36 kB) by Noble Paul
      2. SOLR-6095.patch (36 kB) by Noble Paul
      3. SOLR-6095.patch (35 kB) by Noble Paul
      4. SOLR-6095.patch (7 kB) by Shalin Shekhar Mangar


          Activity

          Shalin Shekhar Mangar added a comment -

          I also opened SOLR-6091 but that didn't help.

          Shalin Shekhar Mangar added a comment -

          The problem that I could find is in LeaderElector.checkIfIamLeader where we have the following code:

          if (seq <= intSeqs.get(0)) {
            // first we delete the node advertising the old leader in case the ephem is still there
            try {
              zkClient.delete(context.leaderPath, -1, true);
            } catch (Exception e) {
              // fine
            }

            runIamLeaderProcess(context, replacement);
          }
          

          If, for whatever reason, the zkClient.delete call was unsuccessful, we just ignore the failure and go ahead to runIamLeaderProcess(...), which leads to OverseerElectionContext.runLeaderProcess(...) where it tries to create the /overseer_elect/leader node:

          zkClient.makePath(leaderPath, ZkStateReader.toJSON(myProps),
                  CreateMode.EPHEMERAL, true);
          

          This is where things go wrong. Because the /overseer_elect/leader node already existed, the zkClient.makePath call fails and the node decides to give up because it thinks that there is already a leader. It never tries to rejoin the election. Then once the ephemeral /overseer_elect/leader node goes away (after the previous overseer leader exits), the cluster is left with no leader.

          Shouldn't the node next in line to become a leader try again or rejoin the election instead of giving up?
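          For illustration only, here is one shape such a retry could take (a sketch, not the committed patch): catch the NodeExistsException from makePath and re-enter the election rather than giving up. rejoinElection() and startOverseerLoop() are hypothetical stand-ins for the real election and Overseer wiring.

          import org.apache.solr.common.cloud.SolrZkClient;
          import org.apache.zookeeper.CreateMode;
          import org.apache.zookeeper.KeeperException;

          // Sketch only; not the committed patch.
          class OverseerLeaderRetrySketch {
            private final SolrZkClient zkClient;

            OverseerLeaderRetrySketch(SolrZkClient zkClient) {
              this.zkClient = zkClient;
            }

            void tryToBecomeOverseer(String leaderPath, byte[] myProps) throws Exception {
              try {
                zkClient.makePath(leaderPath, myProps, CreateMode.EPHEMERAL, true);
              } catch (KeeperException.NodeExistsException e) {
                // The previous leader's ephemeral node has not expired yet. Instead of
                // concluding that a healthy leader already exists, re-enter the election
                // so this node gets another chance once the stale node disappears.
                rejoinElection();
                return;
              }
              startOverseerLoop();
            }

            void rejoinElection() { /* hypothetical: re-register under /overseer_elect/election */ }
            void startOverseerLoop() { /* hypothetical: start the Overseer main queue loop */ }
          }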

          Mark Miller added a comment -

          We always restart our overseer nodes at the very last, otherwise we end up with a large number of shards that can't recover properly.

          Do you know if there is a JIRA issue for that?

          Shalin Shekhar Mangar added a comment -

          No, I don't think there's a jira for it. The reason we could find was that if the rolling restart sequence happens to match the overseer election sequence, then the overseer keeps shifting with each bounce and is unable to process events. This is kinda okay in small clusters, but in large clusters, by the time the rolling restart completes, some nodes reach the "recovery_failed" state and won't try to come back up again.

          Once we changed our restart sequence to restart the overseer node at the very last, we did not encounter this problem any more.

          Ramkumar Aiyengar added a comment -

          Not sure I understand. You bring down first wave, overseers move to second wave. When you bring back first wave, they use the overseer in the second wave to recover and become active. Then you start with the second wave. Why would this be a problem?

          Shalin Shekhar Mangar added a comment - - edited

          Except we don't do our rolling restarts like that. Our restart script iterates through hosts looked up using the EC2 APIs (which almost always return the node names in the same order) and restarts them one by one; after each restart it waits for 60 seconds, verifies that the node is up, and continues with the next host.

          Since the script originally created the nodes in the same order too, the election nodes are also in approximately the same order. This causes each host restart to displace the overseer to the next host in line, which is then displaced in turn, and so on.

          Ramkumar Aiyengar added a comment -

          That would explain it. Our start script blocks until all cores are active, hence we don't have this issue.

          Shalin Shekhar Mangar added a comment -

          Here's a testcase which fails with this same issue. After discarding many iterations, I finally managed to create this test. It sets up 16 shards (2x8), adds overseer roles to three nodes, and then in a loop restarts just the overseers over and over again. It usually fails after a couple of restarts.
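          For illustration, a rough sketch of what that restart loop does; every helper below is a hypothetical stand-in for the test infrastructure in the attached patch, not its actual API.

          import static org.junit.Assert.assertNotNull;

          import java.util.List;

          // Sketch of the scenario only; the real test is RollingRestartTest in the patch.
          class OverseerRestartLoopSketch {

            void restartDesignatedOverseersRepeatedly(List<String> designates, int rounds) throws Exception {
              for (int i = 0; i < rounds; i++) {
                for (String node : designates) {
                  stopNode(node);           // hypothetical
                  startNode(node);          // hypothetical
                  waitUntilActive(node);    // hypothetical
                  // The bug shows up here: after enough restarts there is no overseer leader at all.
                  assertNotNull("cluster has no overseer leader", currentOverseerLeader());
                }
              }
            }

            void stopNode(String node) { }
            void startNode(String node) { }
            void waitUntilActive(String node) { }
            String currentOverseerLeader() { return null; /* hypothetical: read /overseer_elect/leader */ }
          }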

          Noble Paul added a comment -

          I have tweaked the roles feature a bit as follows

          The new approach

          If the current order is (everyone who is below is listening to the one right above)

          1. nodeA-0 <leader>
          2. nodeB-1
          3. nodeC-2
          4. nodeD-3
          5. nodeE-4

          And addrole asks nodeD to become overseer

          According to the new approach, a command is sent to nodeD to rejoin the election at the head, so the new queue becomes:

          1. nodeA-0 <leader>
          2. nodeB-1 nodeD-1
          3. nodeC-2
          4. nodeE-4

          Now, both nodeB and nodeD are waiting on nodeA to become the leader

          The next step is to send a rejoin (not at head) command to nodeB. The new order then automatically becomes the following, where nodeD is the next node in line to become the leader:

          1. nodeA-0 <leader>
          2. nodeD-1
          3. nodeC-2
          4. nodeE-4
          5. nodeB-5

          The final step is to send a quit command to nodeA (the current leader). So the new order becomes:

          1. nodeD-1 <leader>
          2. nodeC-2
          3. nodeE-4
          4. nodeB-5
          5. nodeA-6

          So we have promoted nodeD to leader with just 3 operations. The advantage is that, irrespective of the number of nodes in the queue, the number of operations is still the same 3, so it does not matter whether the cluster is big or small. The good thing is that there will never be a loss of the overseer, even if the designate does not become the leader (because of errors happening in prioritizeOverseerNodes).
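          For illustration, a minimal sketch of the three-step promotion described above. The sendOverseerOp(node, op) helper and the operation names are hypothetical stand-ins for the core admin / overseer plumbing in the actual patch.

          // Sketch only; not the committed implementation.
          class OverseerPromotionSketch {

            /** Promote the designated node with a fixed three operations, regardless of queue size. */
            void promoteDesignate(String designate, String secondInLine, String currentLeader) throws Exception {
              // 1. The designate rejoins the election at the head, taking the same sequence
              //    number as the node directly behind the current leader.
              sendOverseerOp(designate, "rejoinAtHead");
              // 2. The displaced second-in-line rejoins at the back of the queue, leaving
              //    the designate as the only node watching the leader.
              sendOverseerOp(secondInLine, "rejoin");
              // 3. The current leader steps down; the designate takes over as Overseer.
              sendOverseerOp(currentLeader, "quit");
            }

            void sendOverseerOp(String node, String op) { /* hypothetical transport to the node */ }
          }

          With the queue from the example, promoteDesignate("nodeD", "nodeB", "nodeA") produces the final ordering shown above.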

          Noble Paul added a comment -

          This has the new stress test and the new approach. I plan to commit this soon.

          Jessica Cheng Mallet added a comment -

          What if nodeA dies before step 2? Is there a possibility that we'd end up with two Overseers (nodeB and nodeD)? What's done to prevent this from happening?

          Noble Paul added a comment - - edited

          Is there a possibility that we'd end up with two Overseers (nodeB and nodeD)?

          No, only one can succeed. If nodeD succeeds, great.
          If it does not:

          • nodeB will become the Overseer
          • nodeD will rejoin at the back, because the "leader" node already exists (created by nodeB)
          • and nodeB will go through all the same steps as explained above
          Jessica Cheng Mallet added a comment - - edited

          nodeD will rejoin at the back, because the "leader" node already exists (created by nodeB)

          When does this happen? The classic zk leader election recipe would not have checked for the leader node. In LeaderElector.checkIfIAmLeader, the node with the smallest seqId deletes the leader node without looking at it before writing itself down as the leader. If the first node that wrote itself down as the leader already passed the amILeader() check in the Overseer loop before the second node overwrites it, it is then possible that the first node will be active for at least one loop iteration while the second node becomes the new leader.

          One of the fundamental assumptions of the zk leader election recipe (http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection), which I believe Solr follows, is that the seqId is handed out by ZooKeeper and is therefore unique. This change will violate that assumption, so my question is: what's built around the code in this change that makes it OK to violate that assumption?

          and nodeB will go through all the same steps as explained above

          Even if, say, what you describe above worked: here nodeB gets re-prioritized down and then nodeC becomes the leader, so we still don't have the right result. What happens then?
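          For reference, a minimal sketch of the classic recipe linked above, written against the raw ZooKeeper client rather than Solr's LeaderElector: the sequence suffix is assigned by ZooKeeper (so it is unique), and each candidate watches only the node immediately ahead of it.

          import java.util.Collections;
          import java.util.List;

          import org.apache.zookeeper.CreateMode;
          import org.apache.zookeeper.WatchedEvent;
          import org.apache.zookeeper.Watcher;
          import org.apache.zookeeper.ZooDefs;
          import org.apache.zookeeper.ZooKeeper;

          // Minimal sketch of the standard ZooKeeper leader-election recipe (illustration only).
          class ClassicElectionSketch {
            private final ZooKeeper zk;
            private final String electionPath; // e.g. "/overseer_elect/election"
            private String myNode;

            ClassicElectionSketch(ZooKeeper zk, String electionPath) {
              this.zk = zk;
              this.electionPath = electionPath;
            }

            void join() throws Exception {
              // ZooKeeper appends a unique, monotonically increasing sequence suffix.
              myNode = zk.create(electionPath + "/n_", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
              check();
            }

            void check() throws Exception {
              List<String> children = zk.getChildren(electionPath, false);
              Collections.sort(children); // zero-padded sequence suffixes sort correctly as strings
              String me = myNode.substring(myNode.lastIndexOf('/') + 1);
              int idx = children.indexOf(me);
              if (idx == 0) {
                becomeLeader(); // hypothetical: take over as leader
                return;
              }
              // Watch only the candidate immediately ahead of me; re-check when it goes away.
              String predecessor = electionPath + "/" + children.get(idx - 1);
              if (zk.exists(predecessor, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                  try { check(); } catch (Exception ignored) { }
                }
              }) == null) {
                check(); // predecessor vanished between getChildren and exists
              }
            }

            void becomeLeader() { /* hypothetical */ }
          }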

          Noble Paul added a comment -

          The classic recipe is tweaked a bit so that the churn of a large number of nodes is avoided when prioritizing another node. In that case, the node id is created by the client instead of asking zk to create one. checkIfIamLeader is modified in this new patch to take care of 2 nodes with the same sequence id.

          Jessica Cheng Mallet added a comment -

          I see that your new patch is trying to fix the "seq <= intSeqs.get(0)" case in LeaderElector, but the fix doesn't quite work. Note that the delete statement is meant to delete the old leader's node in case it hasn't expired yet, which is a possible scenario. If the old leader's node indeed hasn't expired, both nodeB and nodeD will fail your new statement.

          Jessica Cheng Mallet added a comment - - edited

          Sorry, that wasn't true. You were comparing the election path, not the leader node. However, this still possibly doesn't work, because sortSeqs extracts just the sequence number (n_0000000001) out of the entire node string and sorts based on that. So the sort order of nodeB and nodeD might not be deterministic across JVMs (calls to zk.getChildren do not guarantee return-list ordering, so even though Collections.sort is stable, the original list is not), which makes this new if statement also non-deterministic.

          Noble Paul added a comment - - edited

          I haven't really gone into the implementation of Arrays.sort(). But as long as getChildren returns the nodes in the same order, Arrays.sort() would give the same order, right? (The sort can only be non-deterministic when the input order is random, right?) Because ZK does not sort based on the sequence number.

          But again, this solution does not give a 100% guarantee that nodeD would become the leader if the last-step quit command is not executed. So there is a very small possibility that the overseer is not a designate, but there will always be a leader. Leadership only changes if the leader quits because of an explicit rejoin coreadmin command, or if the node dies.

          Jessica Cheng Mallet added a comment - - edited

          The problem is I don't think getChildren is guaranteed to return nodes in the same order. Its javadoc states

          The list of children returned is not sorted and no guarantee is provided as to its natural or lexical order.

          If somehow getChildren doesn't return nodes in the same order (unless we can verify otherwise, and add this as a regression test against each zk upgrade since the API doesn't guarantee it), the sort can possibly get different ordering of nodeB and nodeD so that they both believe they're the top item in their own invocation, and we're back to the temporary two-Overseer case (for one loop iteration).

          Jessica Cheng Mallet added a comment -

          Just checked ZooKeeper's code. A node's children are held in a HashSet in DataNode, which means that if you hit different instances of ZooKeeper in the ensemble, you may get different result orderings back.

          Noble Paul added a comment -

          The sort is now going to be deterministic.
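          For illustration, one way the ordering can be made deterministic when two election nodes share a sequence number: compare the extracted sequence first and break ties on the full node name. This is a sketch with an assumed node-name format, not necessarily what the committed patch does.

          import java.util.Collections;
          import java.util.Comparator;
          import java.util.List;

          // Sketch: deterministic ordering of election nodes regardless of getChildren order.
          class ElectionNodeOrderSketch {

            // e.g. "nodeD-n_0000000001" -> 1 (node-name format assumed for illustration)
            static int seqOf(String electionNode) {
              return Integer.parseInt(electionNode.substring(electionNode.lastIndexOf('_') + 1));
            }

            static void sortDeterministically(List<String> electionNodes) {
              Collections.sort(electionNodes, new Comparator<String>() {
                @Override
                public int compare(String a, String b) {
                  int bySeq = Integer.compare(seqOf(a), seqOf(b));
                  return bySeq != 0 ? bySeq : a.compareTo(b); // tie-break on the full node name
                }
              });
            }
          }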

          Anshum Gupta added a comment -

          RollingRestartTest.regularRestartTest() is commented out. If it’s not required, you might want to remove it (or uncomment it and let it run).

          Shalin Shekhar Mangar added a comment -

          RollingRestartTest.regularRestartTest() is commented out. If it’s not required, you might want to remove it (or uncomment it and let it run).

          Yes, it is not required in its current form. We can remove it.

          ASF subversion and git services added a comment -

          Commit 1603382 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1603382 ]

          SOLR-6095 SolrCloud cluster can end up without an overseer with overseer roles

          ASF subversion and git services added a comment -

          Commit 1603383 from Noble Paul in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1603383 ]

          SOLR-6095 SolrCloud cluster can end up without an overseer with overseer roles

          ASF subversion and git services added a comment -

          Commit 1603467 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1603467 ]

          SOLR-6095 Uncaught Exception causing test failures

          ASF subversion and git services added a comment -

          Commit 1603468 from Noble Paul in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1603468 ]

          SOLR-6095 Uncaught Exception causing test failures

          ASF subversion and git services added a comment -

          Commit 1604791 from Noble Paul in branch 'dev/trunk'
          [ https://svn.apache.org/r1604791 ]

          SOLR-6095 wait for http responses

          ASF subversion and git services added a comment -

          Commit 1604792 from Noble Paul in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1604792 ]

          SOLR-6095 wait for http responses


            People

            • Assignee: Noble Paul
            • Reporter: Shalin Shekhar Mangar
            • Votes: 0
            • Watchers: 9
