ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-599

Changes to FLE and QuorumCnxManager to support Observers

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Observers currently can only use electionAlg 0, which is not the default, supported leader implementation. This issue will fix it.

      1. ZOOKEEPER-599.patch
        37 kB
        Flavio Junqueira
      2. ZOOKEEPER-599.patch
        36 kB
        Flavio Junqueira
      3. ZOOKEEPER-599.patch
        35 kB
        Flavio Junqueira
      4. ZOOKEEPER-599.patch
        15 kB
        Flavio Junqueira

        Issue Links

          Activity

          Hide
          Flavio Junqueira added a comment -

          Attaching preliminary patch. Currently it adds code to FLE and QCM to support observers later. it does not change anything for observers yet.

          One comment, perhaps important, is that I'm considering that observers may not have identifiers in the future. If they have, it will force us to add every single observer to the configuration file of every participant, which might not be desirable. In the way I'm implementing, it does not matter whether observers have concrete identifiers or not.

          Show
          Flavio Junqueira added a comment - Attaching preliminary patch. Currently it adds code to FLE and QCM to support observers later. it does not change anything for observers yet. One comment, perhaps important, is that I'm considering that observers may not have identifiers in the future. If they have, it will force us to add every single observer to the configuration file of every participant, which might not be desirable. In the way I'm implementing, it does not matter whether observers have concrete identifiers or not.
          Hide
          Flavio Junqueira added a comment -

          This patch fixes the issue with observers and FastLeaderElection. It does mainly three things:

          1. It makes some modifications to FLE to support the OBSERVING state and to return responses to observers;
          2. It removes checks for electionAlg=0 when observers are being used;
          3. It modifies tests to use observers with FLE. In fact, I have changed QuorumBase to use FLE, which is something we have been talking about doing for a while. Most tests are now using FLE.

          All tests pass for me.

          Show
          Flavio Junqueira added a comment - This patch fixes the issue with observers and FastLeaderElection. It does mainly three things: It makes some modifications to FLE to support the OBSERVING state and to return responses to observers; It removes checks for electionAlg=0 when observers are being used; It modifies tests to use observers with FLE. In fact, I have changed QuorumBase to use FLE, which is something we have been talking about doing for a while. Most tests are now using FLE. All tests pass for me.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426322/ZOOKEEPER-599.patch
          against trunk revision 884554.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/78/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/78/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/78/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426322/ZOOKEEPER-599.patch against trunk revision 884554. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/78/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/78/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/78/console This message is automatically generated.
          Hide
          Flavio Junqueira added a comment -

          Checking the test report. Apparently ObserverQuorumHammerTest did not pass.

          Show
          Flavio Junqueira added a comment - Checking the test report. Apparently ObserverQuorumHammerTest did not pass.
          Hide
          Henry Robinson added a comment -

          Hi Flavio -

          This looks really good, thanks for taking this on.

          Can you just comment on the advantages of this approach over having the ResponderThread active in every LE algorithm? I see that that approach neatly decouples Observers from the LE algorithm (and so you wouldn't have to write Observer-specific code into each LE implementation); on the other hand your code doesn't require the extra thread. Either approach sounds ok to me, but I'd like to know your thoughts.

          Henry

          Show
          Henry Robinson added a comment - Hi Flavio - This looks really good, thanks for taking this on. Can you just comment on the advantages of this approach over having the ResponderThread active in every LE algorithm? I see that that approach neatly decouples Observers from the LE algorithm (and so you wouldn't have to write Observer-specific code into each LE implementation); on the other hand your code doesn't require the extra thread. Either approach sounds ok to me, but I'd like to know your thoughts. Henry
          Hide
          Flavio Junqueira added a comment -

          Hi Henry, Thanks for checking it out. Your feedback is important on this issue. In fact, if you happen to know why ObserverHammerQuorumTest is failing with this latest patch, I'd love to hear.

          I have a few reasons to prefer the approach in this patch rather than relying upon the ResponderThread. ResponderThread is legacy code, and we have been talking about deprecating that code. LE, and AFLE, although there is no final decision on making it happen yet. We need feedback from the community before deprecating, but this is at least what I have in mind.

          The responder thread uses UDP currently, and we have written this connection manager class to take care of the TCP connections we use for leader election. To me it makes sense to use that facility instead of adding a new TCP responder thread. However, to use the connection manager, we currently need to do it through FLE.

          Finally, I'd rather integrate all the logic for leader election instead of breaking up according to functionality. In fact, the changes to make observers work with FLE have been fairly simple so far, so I don't expect to have dramatic changes to make it happen, although I can't guarantee that changes will remain simple because I don't know yet why the observer hammer test is failing.

          Does it make sense?

          Show
          Flavio Junqueira added a comment - Hi Henry, Thanks for checking it out. Your feedback is important on this issue. In fact, if you happen to know why ObserverHammerQuorumTest is failing with this latest patch, I'd love to hear. I have a few reasons to prefer the approach in this patch rather than relying upon the ResponderThread. ResponderThread is legacy code, and we have been talking about deprecating that code. LE, and AFLE, although there is no final decision on making it happen yet. We need feedback from the community before deprecating, but this is at least what I have in mind. The responder thread uses UDP currently, and we have written this connection manager class to take care of the TCP connections we use for leader election. To me it makes sense to use that facility instead of adding a new TCP responder thread. However, to use the connection manager, we currently need to do it through FLE. Finally, I'd rather integrate all the logic for leader election instead of breaking up according to functionality. In fact, the changes to make observers work with FLE have been fairly simple so far, so I don't expect to have dramatic changes to make it happen, although I can't guarantee that changes will remain simple because I don't know yet why the observer hammer test is failing. Does it make sense?
          Hide
          Flavio Junqueira added a comment -

          Found an issue with the initialization of votes. I fixed it and I'm now resubmitting the patch.

          Show
          Flavio Junqueira added a comment - Found an issue with the initialization of votes. I fixed it and I'm now resubmitting the patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426347/ZOOKEEPER-599.patch
          against trunk revision 884554.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/79/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/79/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/79/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426347/ZOOKEEPER-599.patch against trunk revision 884554. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/79/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/79/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/79/console This message is automatically generated.
          Hide
          Flavio Junqueira added a comment -

          It seems that trunk is failing on the same test and due to the same issue. Cancelling this patch until we fix the testHammer issue.

          Show
          Flavio Junqueira added a comment - It seems that trunk is failing on the same test and due to the same issue. Cancelling this patch until we fix the testHammer issue.
          Hide
          Flavio Junqueira added a comment -

          Trying again, given that ZOOKEEPER-597 got in. I'm a little confused about when 597 got in, though. It looks like Mahadev committed it several days ago.

          Show
          Flavio Junqueira added a comment - Trying again, given that ZOOKEEPER-597 got in. I'm a little confused about when 597 got in, though. It looks like Mahadev committed it several days ago.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426347/ZOOKEEPER-599.patch
          against trunk revision 885394.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/81/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/81/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/81/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426347/ZOOKEEPER-599.patch against trunk revision 885394. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/81/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/81/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/81/console This message is automatically generated.
          Hide
          Gustavo Niemeyer added a comment -

          I suppose the TODO below is referring to the "path" variable which is passed in as an output variable to PyArg_ParseTuple right below. The TODO may be removed, since the code is right. Code using PyArg_ParseTuple will borrow the reference from the calling code, since there's a stack behind the call to the enclosing function (pyzoo_get_children in this case) which won't go away until the function returns.

          Index: src/contrib/zkpython/src/c/zookeeper.c
          ===================================================================
          — src/contrib/zkpython/src/c/zookeeper.c (revision 885582)
          +++ src/contrib/zkpython/src/c/zookeeper.c (working copy)
          @@ -774,8 +774,6 @@

          static PyObject *pyzoo_get_children(PyObject *self, PyObject *args)
          {

          • // TO DO: Does Python copy the string or the reference? If it's the former
          • // we should free the String_vector
            int zkhid;
            char *path;
            PyObject *watcherfn = Py_None;
          Show
          Gustavo Niemeyer added a comment - I suppose the TODO below is referring to the "path" variable which is passed in as an output variable to PyArg_ParseTuple right below. The TODO may be removed, since the code is right. Code using PyArg_ParseTuple will borrow the reference from the calling code, since there's a stack behind the call to the enclosing function (pyzoo_get_children in this case) which won't go away until the function returns. Index: src/contrib/zkpython/src/c/zookeeper.c =================================================================== — src/contrib/zkpython/src/c/zookeeper.c (revision 885582) +++ src/contrib/zkpython/src/c/zookeeper.c (working copy) @@ -774,8 +774,6 @@ static PyObject *pyzoo_get_children(PyObject *self, PyObject *args) { // TO DO: Does Python copy the string or the reference? If it's the former // we should free the String_vector int zkhid; char *path; PyObject *watcherfn = Py_None;
          Hide
          Gustavo Niemeyer added a comment -

          I'm sorry.. I clicked on "submit a patch", but didn't notice that the context for the patch was the JIRA I was looking at, rather than the project itself.

          Please disregard the above comment.

          Show
          Gustavo Niemeyer added a comment - I'm sorry.. I clicked on "submit a patch", but didn't notice that the context for the patch was the JIRA I was looking at, rather than the project itself. Please disregard the above comment.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12426347/ZOOKEEPER-599.patch
          against trunk revision 885394.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/82/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/82/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/82/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12426347/ZOOKEEPER-599.patch against trunk revision 885394. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/82/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/82/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/82/console This message is automatically generated.
          Hide
          Benjamin Reed added a comment -

          looks good flavio. i'd like to request two changes:
          1) can you remove or expand debug message so that they are more informative? for example, "if(recvqueue.size() == 0) LOG.debug("Message: " + n.sid);" doesn't describe what is going on or why the message is being logged.

          2) can we simplify:
          + Random r = new Random();
          + do

          { + sid = 10000L + ((long) r.nextInt(10000)); + }

          while(self.getVotingView().containsKey(sid)
          + || senderWorkerMap.containsKey(sid));
          if we guarantee that followers are non-negative, we could pick negative sids for observers, so that we don't have to worry about observers colliding with follower sids. it might also help later for debugging.

          Show
          Benjamin Reed added a comment - looks good flavio. i'd like to request two changes: 1) can you remove or expand debug message so that they are more informative? for example, "if(recvqueue.size() == 0) LOG.debug("Message: " + n.sid);" doesn't describe what is going on or why the message is being logged. 2) can we simplify: + Random r = new Random(); + do { + sid = 10000L + ((long) r.nextInt(10000)); + } while(self.getVotingView().containsKey(sid) + || senderWorkerMap.containsKey(sid)); if we guarantee that followers are non-negative, we could pick negative sids for observers, so that we don't have to worry about observers colliding with follower sids. it might also help later for debugging.
          Hide
          Flavio Junqueira added a comment -

          Thanks for the comments, Ben. I'm submitting a new patch that hopefully fixes the issues you raised.

          Show
          Flavio Junqueira added a comment - Thanks for the comments, Ben. I'm submitting a new patch that hopefully fixes the issues you raised.
          Hide
          Flavio Junqueira added a comment -

          Hudson is behaving weirdly. I can't see the latest report among the jira posts, but I checked manually and it did run it:

          http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/83/

          and I can't understand from the console output why it failed. I suspect that the strange behavior is related to the trunk failures I've been seeing. In any case, I'll resubmit to see what happens.

          Show
          Flavio Junqueira added a comment - Hudson is behaving weirdly. I can't see the latest report among the jira posts, but I checked manually and it did run it: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/83/ and I can't understand from the console output why it failed. I suspect that the strange behavior is related to the trunk failures I've been seeing. In any case, I'll resubmit to see what happens.
          Hide
          Flavio Junqueira added a comment -

          Re-trying hudson.

          Show
          Flavio Junqueira added a comment - Re-trying hudson.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12427835/ZOOKEEPER-599.patch
          against trunk revision 889848.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/26/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/26/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/26/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12427835/ZOOKEEPER-599.patch against trunk revision 889848. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/26/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/26/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/26/console This message is automatically generated.
          Hide
          Benjamin Reed added a comment -

          Committed revision 891034.

          Show
          Benjamin Reed added a comment - Committed revision 891034.
          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #628 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/628/)
          . Changes to FLE and QuorumCnxManager to support Observers
          ZOOKEEPER-506. QuorumBase should use default leader election

          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #628 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/628/ ) . Changes to FLE and QuorumCnxManager to support Observers ZOOKEEPER-506 . QuorumBase should use default leader election

            People

            • Assignee:
              Flavio Junqueira
              Reporter:
              Flavio Junqueira
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development