Solr / SOLR-5689

On reconnect, ZkController cancels election on first context rather than latest

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.6.1, 4.7, 6.0
    • Fix Version/s: 4.7, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      I haven't tested this yet, so I could be wrong, but this is my reading of the code:
      During init:

      ElectionContext context = new OverseerElectionContext(zkClient, overseer, getNodeName());
      overseerElector.setup(context);
      overseerElector.joinElection(context, false);
      

      On reconnect:

      ElectionContext context = new OverseerElectionContext(zkClient, overseer, getNodeName());

      ElectionContext prevContext = overseerElector.getContext();
      if (prevContext != null) {
        prevContext.cancelElection();
      }

      overseerElector.joinElection(context, true);
      

      overseerElector.setup() doesn't appear to be called on reconnect, so the new context is never set on the elector and the first context gets cancelled over and over.

      A call to overseerElector.setup(context); before joinElection in the reconnect case would address this.
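To make the reported behavior concrete, here is a minimal, self-contained sketch. `Elector` and `Context` are hypothetical stand-ins for `LeaderElector` and `OverseerElectionContext`, not the real Solr classes; the only thing modeled is the context bookkeeping, i.e. that `setup()` records the latest context and that reconnect cancels whatever context the elector currently holds. The `callSetup` flag is an illustrative switch between the quoted reconnect code and the proposed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for LeaderElector and OverseerElectionContext;
// these are NOT the real Solr classes, only the context bookkeeping is modeled.
class ReconnectSketch {

    static class Context {
        final int id;
        boolean cancelled = false;

        Context(int id) { this.id = id; }

        void cancelElection() { cancelled = true; }
    }

    static class Elector {
        private Context context;
        final List<Integer> joined = new ArrayList<>();

        // Models only the part of setup() discussed in this issue:
        // remembering the latest context (the real method also touches ZooKeeper).
        void setup(Context ctx) { this.context = ctx; }

        Context getContext() { return context; }

        void joinElection(Context ctx) { joined.add(ctx.id); }
    }

    // Reconnect path from the description; callSetup=true applies the proposed fix.
    static Context reconnect(Elector elector, int id, boolean callSetup) {
        Context ctx = new Context(id);
        Context prev = elector.getContext();
        if (prev != null) {
            prev.cancelElection();
        }
        if (callSetup) {
            elector.setup(ctx); // the proposed fix: record the new context
        }
        elector.joinElection(ctx);
        return ctx;
    }

    // Init path: setup() followed by joinElection(), as in the description.
    static Elector initialized() {
        Elector elector = new Elector();
        Context first = new Context(0);
        elector.setup(first);
        elector.joinElection(first);
        return elector;
    }

    public static void main(String[] args) {
        for (boolean callSetup : new boolean[] {false, true}) {
            Elector elector = initialized();
            Context second = reconnect(elector, 1, callSetup);
            reconnect(elector, 2, callSetup);
            System.out.println("setup on reconnect=" + callSetup
                    + ": elector holds context " + elector.getContext().id
                    + ", context 1 cancelled=" + second.cancelled);
        }
    }
}
```

Without the `setup()` call, the elector still holds context 0 after two reconnects, so context 0 is cancelled again each time and context 1 is never cancelled. With the call, the elector holds context 2 and context 1 is cancelled on the second reconnect.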

      Attachments: SOLR-5689.patch (0.6 kB, Shalin Shekhar Mangar)

        Activity

        dancollins Daniel Collins added a comment -

        My understanding of what `LeaderElector.setup()` does is that it just creates the `/overseer_elect/election` "directory" in ZK. This isn't ephemeral, so in reality it should only need to be done once? Unless ZK has been wiped whilst the node was disconnected from ZK, that directory should still be there. It shouldn't hurt to add the call to setup in reconnect, but I don't believe it is necessary.

        `cancelElection()` removes the `leaderSeqPath`, which is the ephemeral node under that "directory", e.g. "19127283862405127-xxxxxxx:yyyyy_solr-n_0000000368" in my case.
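The node layout described in this comment can be sketched with a tiny in-memory model. `FakeZk` and its methods are hypothetical and are NOT the ZooKeeper client API; the sketch only assumes what the comment states (a persistent election "directory" created by `setup()`, and an ephemeral sequential child removed by `cancelElection()`), plus the standard ZooKeeper behavior that ephemeral nodes disappear when the session is lost while persistent ones survive.

```java
import java.util.TreeMap;

// Hypothetical in-memory model of the ZooKeeper nodes described above;
// FakeZk is NOT the ZooKeeper client API.
class ElectionNodesSketch {

    static final String ELECTION_PATH = "/overseer_elect/election";

    static class FakeZk {
        // path -> true if the node is ephemeral, false if persistent
        final TreeMap<String, Boolean> nodes = new TreeMap<>();
        private int counter = 0;

        // Persistent node: survives session loss (what setup() creates, per the comment).
        void makePersistent(String path) { nodes.putIfAbsent(path, false); }

        // Ephemeral sequential node: ZooKeeper appends a 10-digit, zero-padded counter.
        // (Simplification: this counter is global rather than per parent node.)
        String createEphemeralSequential(String prefix) {
            String path = prefix + String.format("%010d", counter++);
            nodes.put(path, true);
            return path;
        }

        void delete(String path) { nodes.remove(path); }

        // Losing the session removes ephemeral nodes only.
        void expireSession() { nodes.values().removeIf(ephemeral -> ephemeral); }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.makePersistent(ELECTION_PATH);                           // setup()
        String leaderSeqPath = zk.createEphemeralSequential(ELECTION_PATH + "/n_");
        zk.delete(leaderSeqPath);                                   // cancelElection()

        zk.createEphemeralSequential(ELECTION_PATH + "/n_");        // join again
        zk.expireSession();                                         // disconnect
        System.out.println(zk.nodes.keySet()); // only the persistent election node is left
    }
}
```

This supports the point that the directory itself outlives a disconnect; as the next comment notes, though, `setup()` matters on reconnect for a different reason.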

        markrmiller@gmail.com Mark Miller added a comment -

        It also sets the latest context on the elector, though. We want to make sure that is always the latest context, so that if for some reason we are asked to join the election again while already participating, we cancel our existing participation first.
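The invariant in this comment can be sketched as follows. `RejoiningElector` is a hypothetical class, not part of Solr, and contexts are modeled as plain strings; the point is only that if the elector always records the latest context, each repeated join cancels exactly the participation it already holds, leaving a single active one.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the invariant described above; RejoiningElector is
// NOT a Solr class, and contexts are modeled as plain strings.
class LatestContextSketch {

    static class RejoiningElector {
        final Set<String> activeParticipations = new LinkedHashSet<>();
        private String latestContext;

        void rejoin(String newContext) {
            // Already participating? Cancel using the context we recorded last.
            if (latestContext != null) {
                activeParticipations.remove(latestContext);
            }
            latestContext = newContext;          // always record the latest context
            activeParticipations.add(newContext);
        }

        String latest() { return latestContext; }
    }

    public static void main(String[] args) {
        RejoiningElector elector = new RejoiningElector();
        for (String ctx : new String[] {"ctx-0", "ctx-1", "ctx-2"}) {
            elector.rejoin(ctx);
        }
        System.out.println(elector.activeParticipations); // [ctx-2]
    }
}
```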

        dancollins Daniel Collins added a comment -

        DOH, my bad, I missed that line. I'm too used to expecting a blank line between the opening bracket and the first code statement; must be a bug in my brain's Java parser.

        shalinmangar Shalin Shekhar Mangar added a comment -

        Trivial fix attached. I'll commit once the test suite succeeds.

        jira-bot ASF subversion and git services added a comment -

        Commit 1567049 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1567049 ]

        SOLR-5689: On reconnect, ZkController cancels election on first context rather than latest

        jira-bot ASF subversion and git services added a comment -

        Commit 1567050 from shalin@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1567050 ]

        SOLR-5689: On reconnect, ZkController cancels election on first context rather than latest

        shalinmangar Shalin Shekhar Mangar added a comment -

        Thanks Gregory!


          People

          • Assignee: Shalin Shekhar Mangar (shalinmangar)
          • Reporter: Gregory Chanan (gchanan)
          • Votes: 0
          • Watchers: 8
