Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      We need to add metadata into zookeeper about who is the leader for each shard, and have some kind of leader election.

      1. SOLR-2752.patch
        46 kB
        Mark Miller
      2. SOLR-2752.patch
        38 kB
        Mark Miller
      3. SOLR-2752.patch
        34 kB
        Mark Miller
      4. SOLR-2752.patch
        20 kB
        Mark Miller
      5. SOLR-2752.patch
        17 kB
        Mark Miller

        Activity

        Mark Miller added a comment -

        I've hacked out an initial rough patch for this. More tests, refactoring, thinking, etc to come.

        Adds a new /collections/{collection}/leader_elect node.

        When a core registers, it creates a new ephemeral node under /collections/{collection}/leader_elect/{shard}/election/

        e.g.

        /collections/{collection}/leader_elect/{shard}/election/n_0000000001

        If that is the lowest n_*, the core sets itself as the leader in /collections/{collection}/leader_elect/{shard}/leader

        If that is not the lowest n_*, the core puts a watch on the node before it. If that node goes down, the core initiates the leader election process again and sees whether it is the lowest n_*; if so, it is the leader, otherwise it sets a new watch on the n_* node before it.

        Rough early exploration stuff, more to follow.
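
The decision each core makes in the scheme above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: it uses no ZooKeeper client, the list of names stands in for the children of the election node, and `LeaderElectionSketch`, `seq`, and `nodeToWatch` are hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaderElectionSketch {

    /** Parses the sequence number from a name such as "n_0000000001". */
    static int seq(String node) {
        return Integer.parseInt(node.substring(node.lastIndexOf('_') + 1));
    }

    /**
     * Returns null if myNode has the lowest sequence number (it is the leader).
     * Otherwise returns the node immediately before it, which it should watch.
     */
    static String nodeToWatch(List<String> children, String myNode) {
        // Copy first so the caller's list is left untouched.
        List<String> sorted = new ArrayList<>(children);
        sorted.sort((a, b) -> Integer.compare(seq(a), seq(b)));
        int idx = sorted.indexOf(myNode);
        if (idx < 0) {
            throw new IllegalArgumentException("not registered: " + myNode);
        }
        return idx == 0 ? null : sorted.get(idx - 1);
    }
}
```

With children n_0000000001, n_0000000002 and n_0000000003, the core that owns n_0000000003 would watch n_0000000002, and the core that owns n_0000000001 would get null and take the leader role. Watching only the predecessor, rather than the leader node, means a single node going away wakes up one core instead of all of them.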

        Mark Miller added a comment -

        New patch: a much stronger test, a couple of fixes, and most of the leader election code refactored into its own class.

        Mark Miller added a comment -

        Just a quick correction to the first comment: cores create an ephemeral|sequential node, not just an ephemeral one.

        Mark Miller added a comment -

        Another new patch:

        I moved SolrZooKeeper to the org.apache.zookeeper package so that I could add a simulated timeout method for tests.

        I also wrote a new test that starts up a bunch of replicas and then times out the leader. After waiting for the leader to reconnect, all of the other replicas are killed and I check that the first leader is again the leader. I wrote this test because I knew it would fail: on reconnecting, clients don't jump back into the leader election process.

        So I also added to the client reconnection impl: on reconnect, all SolrCores are re-registered. This also has the advantage that any SolrCores that were created while the connection was down are put into play. That allows the new test to pass.
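
A rough sketch of the re-registration idea, under stated assumptions: the in-memory list below stands in for the ephemeral election nodes that disappear when a session times out, and `ReconnectSketch` and its methods are hypothetical names for illustration, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReconnectSketch {
    // Cores that exist on this node, whether or not we are connected.
    private final Set<String> localCores = new LinkedHashSet<>();
    // Stand-in for the ephemeral election nodes held by this client.
    private final List<String> electionQueue = new ArrayList<>();
    private boolean connected = true;

    void createCore(String name) {
        localCores.add(name);
        if (connected) {
            electionQueue.add(name); // normal registration path
        }
    }

    /** Ephemeral nodes vanish with the session. */
    void sessionExpired() {
        connected = false;
        electionQueue.clear();
    }

    /** On reconnect, re-register every local core, old and new. */
    void reconnected() {
        connected = true;
        for (String core : localCores) {
            electionQueue.add(core);
        }
    }

    List<String> electionQueue() {
        return electionQueue;
    }
}
```

Because the reconnect path walks every local core rather than only the ones registered before the timeout, a core created while the connection was down joins the election along with the rest.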

        Mark Miller added a comment -

        Feeling motivated, I guess: another patch with a bunch of polish.

        Mark Miller added a comment -

        Another patch, I suppose: renames the existing tests to LeaderElectionIntegrationTest and adds a new LeaderElectionTest that tests just the LeaderElector class itself. Also a bit more javadoc.

        Mark Miller added a comment -

        I think we need a try/catch around setting the watch on the next guy in line: he may have been cut down between seeing he was next and setting the watch. If an exception is thrown while setting the watch, we probably want to check again whether we are the leader.

        Tried making a stress test that could catch this, but tough window to hit...
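
The retry described above might look roughly like this. It is a sketch under assumptions, not the committed code: `ElectionDir` is a hypothetical stand-in for the election directory, `NoSuchElementException` stands in for whatever the real client raises when the node is gone, and plain string ordering is used on the assumption that the n_* names are zero-padded to a fixed width.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

public class WatchRetrySketch {

    /** Hypothetical stand-in for the election directory. */
    interface ElectionDir {
        List<String> children();  // current n_* nodes
        void watch(String node);  // throws NoSuchElementException if the node is gone
    }

    /** Returns true if myNode became leader, false if a watch was set on its predecessor. */
    static boolean joinElection(ElectionDir dir, String myNode) {
        while (true) {
            // Fixed-width, zero-padded names sort correctly as plain strings.
            TreeSet<String> sorted = new TreeSet<>(dir.children());
            String before = sorted.lower(myNode);
            if (before == null) {
                return true; // nothing ahead of us: we are the leader
            }
            try {
                dir.watch(before);
                return false;
            } catch (NoSuchElementException e) {
                // Predecessor vanished between listing and watching:
                // loop and check again whether we are now the leader.
            }
        }
    }
}
```

Injecting the directory through an interface also makes the narrow window easier to exercise in a unit test, since a fake can delete the predecessor at exactly the moment the watch is set.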

        Mark Miller added a comment -

        While working on the tests for this I ran into and filed a locale bug in zk: https://issues.apache.org/jira/browse/ZOOKEEPER-1206

        Mark Miller added a comment -

        I've committed this early work to the solrcloud branch


          People

          • Assignee:
            Mark Miller
            Reporter:
            Yonik Seeley
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:

              Development