SOLR-5689

On reconnect, ZkController cancels election on first context rather than latest

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.6.1, 4.7, 6.0
    • Fix Version/s: 4.7, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      I haven't tested this yet, so I could be wrong, but this is my reading of the code:
      During init:

      ElectionContext context = new OverseerElectionContext(zkClient, overseer, getNodeName());
      overseerElector.setup(context);
      overseerElector.joinElection(context, false);
      

      On reconnect:

      ElectionContext context = new OverseerElectionContext(zkClient, overseer, getNodeName());

      ElectionContext prevContext = overseerElector.getContext();
      if (prevContext != null) {
        prevContext.cancelElection();
      }

      overseerElector.joinElection(context, true);
      

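      setup doesn't appear to be called on reconnect, so the new context is never set on the elector. overseerElector.getContext() therefore keeps returning the first context, and that first context gets cancelled over and over while the later ones are never cancelled.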

      A call to overseerElector.setup(context); before joinElection in the reconnect case would address this.
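
The behaviour described above can be reproduced in a minimal, self-contained sketch. The classes below are hypothetical stand-ins, not the real Solr ElectionContext or LeaderElector. The sketch assumes, as the reading of the code above suggests, that only setup records the context on the elector and that joinElection does not. It contains no ZooKeeper calls.

```java
public class ReconnectSketch {

    // Hypothetical stand-in for an election context; it only counts cancellations.
    static class Context {
        final String name;
        int cancelCalls = 0;

        Context(String name) {
            this.name = name;
        }

        void cancelElection() {
            cancelCalls++;
        }
    }

    // Hypothetical stand-in for the elector. Assumption: setup() is the only
    // method that stores the context; joinElection() leaves it untouched.
    static class Elector {
        private Context context;

        void setup(Context c) {
            this.context = c;
        }

        Context getContext() {
            return context;
        }

        void joinElection(Context c, boolean replacement) {
            // Intentionally does not record the context in this sketch.
        }
    }

    // Mirrors the init sequence quoted in the description.
    static Context init(Elector elector) {
        Context context = new Context("init");
        elector.setup(context);
        elector.joinElection(context, false);
        return context;
    }

    // Mirrors the reconnect sequence. callSetup=false is the reported behaviour,
    // callSetup=true adds the proposed setup(context) call before joinElection.
    static Context reconnect(Elector elector, String name, boolean callSetup) {
        Context context = new Context(name);

        Context prevContext = elector.getContext();
        if (prevContext != null) {
            prevContext.cancelElection();
        }

        if (callSetup) {
            elector.setup(context);
        }
        elector.joinElection(context, true);
        return context;
    }

    public static void main(String[] args) {
        Elector elector = new Elector();
        Context first = init(elector);
        Context second = reconnect(elector, "reconnect-1", false);
        reconnect(elector, "reconnect-2", false);
        System.out.println(first.name + " cancelled " + first.cancelCalls + " time(s)");
        System.out.println(second.name + " cancelled " + second.cancelCalls + " time(s)");
    }
}
```

Calling reconnect with callSetup set to true instead makes each reconnect cancel the context created by the previous one, which is the intent of the proposed change.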

      1. SOLR-5689.patch
        0.6 kB
        Shalin Shekhar Mangar

        Activity

        Daniel Collins added a comment -

        My understanding of what `LeaderElector.setup()` does is that it just creates the `/overseer_elect/election` "directory" in ZK. This isn't ephemeral, so in reality should only be a one-off job? Unless ZK has been wiped whilst the node was disconnected from ZK, that directory should still be there. It shouldn't hurt to add in the call to setup in reconnect, but I don't believe it is necessary.

        cancelElection() removes the `leaderSeqPath` which is the ephemeral node(s) under that "directory", e.g. "19127283862405127-xxxxxxx:yyyyy_solr-n_0000000368" in my case.

        Mark Miller added a comment -

        It also sets the latest context on the elector though - which we want to make sure is always the latest so that if for some reason we are asked to join the election again and are already participating, we cancel our participation first.

        Daniel Collins added a comment -

        DOH, my bad, missed that line, too used to expecting whitespace line between bracket and first code statement, must be a bug in my brain's Java parser.

        Shalin Shekhar Mangar added a comment -

        Trivial fix attached. I'll commit once the test suite succeeds.

        ASF subversion and git services added a comment -

        Commit 1567049 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1567049 ]

        SOLR-5689: On reconnect, ZkController cancels election on first context rather than latest

        ASF subversion and git services added a comment -

        Commit 1567050 from shalin@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1567050 ]

        SOLR-5689: On reconnect, ZkController cancels election on first context rather than latest

        Shalin Shekhar Mangar added a comment -

        Thanks Gregory!


          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Gregory Chanan
          • Votes:
            0
            Watchers:
            8
