Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: HA Branch (HDFS-1623)
    • Component/s: ha
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      ZKClient needs to support the following capabilities:

      1. Ability to create a znode for co-ordinating leader election.
      2. Ability to monitor and receive call backs when active znode status changes.
      3. Ability to get information about the active node.
      1. Zookeeper based Leader Election and Monitoring Library.pdf
        60 kB
        Bikas Saha
      2. HDFS-2681.txt
        49 kB
        Suresh Srinivas
      3. HDFS-2681.txt
        50 kB
        Suresh Srinivas
      4. HDFS-2681.HDFS-1623.patch
        33 kB
        Bikas Saha
      5. HDFS-2681.HDFS-1623.patch
        36 kB
        Bikas Saha
      6. HDFS-2681.HDFS-1623.patch
        38 kB
        Bikas Saha
      7. HDFS-2681.HDFS-1623.patch
        40 kB
        Bikas Saha
      8. HDFS-2681.HDFS-1623.patch
        41 kB
        Bikas Saha
      9. HDFS-2681.HDFS-1623.patch
        41 kB
        Bikas Saha
      10. HDFS-2681.HDFS-1623.patch
        41 kB
        Bikas Saha
      11. HDFS-2681.HDFS-1623.patch
        48 kB
        Bikas Saha
      12. HDFS-2681.HDFS-1623.patch
        49 kB
        Bikas Saha
      13. HDFS-2681.HDFS-1623.patch
        50 kB
        Bikas Saha

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Hey Suresh. I've been thinking about the design for the ZK-based failover controller but have yet to post a design. Let me write something up and post it to HDFS-2185 today. It sounds like we should coordinate work.

          Show
          Todd Lipcon added a comment - Hey Suresh. I've been thinking about the design for the ZK-based failover controller but have yet to post a design. Let me write something up and post it to HDFS-2185 today. It sounds like we should coordinate work.
          Hide
          Todd Lipcon added a comment -

          Linking to HDFS-2185 (ZK Failover Controller). I posted a bit of a design there - this JIRA fits into the design as the component that watches ZK and provides ZK-related callbacks to the state machine.

          Show
          Todd Lipcon added a comment - Linking to HDFS-2185 (ZK Failover Controller). I posted a bit of a design there - this JIRA fits into the design as the component that watches ZK and provides ZK-related callbacks to the state machine.
          Hide
          Suresh Srinivas added a comment -

          Todd, the goal of this jira is to write a library that could be used by failover controller.

          Show
          Suresh Srinivas added a comment - Todd, the goal of this jira is to write a library that could be used by failover controller.
          Hide
          Suresh Srinivas added a comment -

          I found that ZK-1080 added support that I intended to add this. Will take a look at it as well.

          Show
          Suresh Srinivas added a comment - I found that ZK-1080 added support that I intended to add this. Will take a look at it as well.
          Hide
          Aaron T. Myers added a comment -

          Hey Suresh, you might also check out ZOOKEEPER-1095, as ZK-1080 hasn't been committed yet.

          Show
          Aaron T. Myers added a comment - Hey Suresh, you might also check out ZOOKEEPER-1095 , as ZK-1080 hasn't been committed yet.
          Hide
          Bikas Saha added a comment -

          Hi,
          I have submitted a short design document for this library and a patch that implements the functionality. Both are up for review and comments.
          Thanks

          Show
          Bikas Saha added a comment - Hi, I have submitted a short design document for this library and a patch that implements the functionality. Both are up for review and comments. Thanks
          Hide
          Bikas Saha added a comment -

          please ignore inconsistencies in comments while I fix them.

          Show
          Bikas Saha added a comment - please ignore inconsistencies in comments while I fix them.
          Hide
          Bikas Saha added a comment -

          New patch with inconsistent names fixed across comments and code.

          Show
          Bikas Saha added a comment - New patch with inconsistent names fixed across comments and code.
          Hide
          Suresh Srinivas added a comment -

          Some comments for the first version of the patch:

          1. General - How is multithreaded use for this library handled?
          2. ActiveStandbyElector
            • Add javadoc to the class on how to use the class. Also callback interface can be described in more detail on when the call back is made and perhaps some description of what is expected of the app. notifyError() particularly needs better documentation on when to expect this callback.
            • Please document the enumerations.
            • Constructor should check for null - at least for call back passed. Otherwise you will get null pointer exception.
            • joinElection() you may want to copy the byte[] data passed or at least document that the data[] must not be changed by the caller.
            • #getNewZooKeeper() seems unnecessary and can be removed. Creation of ZooKeeper() can be moved to createConnection() it self.
            • Make member variable that are initialized only once in the constructor final.
            • activeData could be better name for appData.
            • Please check if all the params are documented in methods. For example constructor is missing one of the params in the doc. Same is true with exceptions thrown.
            • quitElection() should not check zkClient non null, as terminateConnection already checks it.
            • getActiveData() - how about not throwing KeeperException? Also ActiveNotFoundException should wrap the exception caught from ZK.

          I have not complete the review. These are some prelimiary comments

          Show
          Suresh Srinivas added a comment - Some comments for the first version of the patch: General - How is multithreaded use for this library handled? ActiveStandbyElector Add javadoc to the class on how to use the class. Also callback interface can be described in more detail on when the call back is made and perhaps some description of what is expected of the app. notifyError() particularly needs better documentation on when to expect this callback. Please document the enumerations. Constructor should check for null - at least for call back passed. Otherwise you will get null pointer exception. joinElection() you may want to copy the byte[] data passed or at least document that the data[] must not be changed by the caller. #getNewZooKeeper() seems unnecessary and can be removed. Creation of ZooKeeper() can be moved to createConnection() it self. Make member variable that are initialized only once in the constructor final. activeData could be better name for appData. Please check if all the params are documented in methods. For example constructor is missing one of the params in the doc. Same is true with exceptions thrown. quitElection() should not check zkClient non null, as terminateConnection already checks it. getActiveData() - how about not throwing KeeperException? Also ActiveNotFoundException should wrap the exception caught from ZK. I have not complete the review. These are some prelimiary comments
          Hide
          Bikas Saha added a comment -

          >> General - How is multithreaded use for this library handled?
          All public methods are synchronized

          >>Add javadoc to the class on how to use the class. Also callback interface can be described in more detail on when the call back is made and perhaps some description of what is expected of the app. notifyError() particularly needs better documentation on when to expect this callback.
          Please document the enumerations.
          Done in second patch

          >>Constructor should check for null - at least for call back passed. Otherwise you will get null pointer exception.
          Done

          >>joinElection() you may want to copy the byte[] data passed or at least document that the data[] must not be changed by the caller.
          Done

          >>#getNewZooKeeper() seems unnecessary and can be removed. Creation of ZooKeeper() can be moved to createConnection() it self.
          This is to pass in a mock zookeeper for testing

          >>Make member variable that are initialized only once in the constructor final.
          Done in second patch

          >>activeData could be better name for appData.
          All app's can pass in data (which may go into future per app nodes). Only active app's data makes it to the lock. So I think the name is good.

          >>Please check if all the params are documented in methods. For example constructor is missing one of the params in the doc. Same is true with exceptions thrown.
          Done in second patch

          >>quitElection() should not check zkClient non null, as terminateConnection already checks it.
          Yeah. I forgot to remove that check after I refactored stuff into the reset() method

          >>getActiveData() - how about not throwing KeeperException? Also ActiveNotFoundException should wrap the exception caught from ZK.
          Its hard to differentiate exceptions inside KeeperException. There is not much the elector can do about them. The only commonly expected exception would be getting leader data when no leader exists and that has been handled as part of the elector API via a new exception.

          Show
          Bikas Saha added a comment - >> General - How is multithreaded use for this library handled? All public methods are synchronized >>Add javadoc to the class on how to use the class. Also callback interface can be described in more detail on when the call back is made and perhaps some description of what is expected of the app. notifyError() particularly needs better documentation on when to expect this callback. Please document the enumerations. Done in second patch >>Constructor should check for null - at least for call back passed. Otherwise you will get null pointer exception. Done >>joinElection() you may want to copy the byte[] data passed or at least document that the data[] must not be changed by the caller. Done >>#getNewZooKeeper() seems unnecessary and can be removed. Creation of ZooKeeper() can be moved to createConnection() it self. This is to pass in a mock zookeeper for testing >>Make member variable that are initialized only once in the constructor final. Done in second patch >>activeData could be better name for appData. All app's can pass in data (which may go into future per app nodes). Only active app's data makes it to the lock. So I think the name is good. >>Please check if all the params are documented in methods. For example constructor is missing one of the params in the doc. Same is true with exceptions thrown. Done in second patch >>quitElection() should not check zkClient non null, as terminateConnection already checks it. Yeah. I forgot to remove that check after I refactored stuff into the reset() method >>getActiveData() - how about not throwing KeeperException? Also ActiveNotFoundException should wrap the exception caught from ZK. Its hard to differentiate exceptions inside KeeperException. There is not much the elector can do about them. The only commonly expected exception would be getting leader data when no leader exists and that has been handled as part of the elector API via a new exception.
          Hide
          Bikas Saha added a comment -

          Patch that takes care of review comments.

          Show
          Bikas Saha added a comment - Patch that takes care of review comments.
          Hide
          Suresh Srinivas added a comment -

          Really nice job with the tests. May be you can also create a real ZK based test. This could be done in another jira.

          Here are the comments:
          Comments:

          1. Please use System.arraycopy() instead of byte[] clone.
          2. Split process into two different methods processZkEvent and processZnodeEvent?
          1. Test
          2. Change the method name to init(). Annotate it @Before. It will be automatically called before tests.
          3. Use @Expected for tests that expect exception
          4. Add class level javadoc.
          5. #Init need not catch IOException. Just throw it. The test will fail.
          6. You can reduce several lines of code by using a static byte[] DATA;
          7. can you add test where jointElection() is called twice and the second call is NO-OP
          8. Many times where processResult is called back to back can be in for loop
          9. Why should 4 errors of connection loss result in fatalError?
          10. testStatNodeError already covers some part of testCreateNodeResultRetryBecomeActive
          11. Instead of catching InterruptedException, you can just throw it
          Show
          Suresh Srinivas added a comment - Really nice job with the tests. May be you can also create a real ZK based test. This could be done in another jira. Here are the comments: Comments: Please use System.arraycopy() instead of byte[] clone. Split process into two different methods processZkEvent and processZnodeEvent? Test Change the method name to init(). Annotate it @Before. It will be automatically called before tests. Use @Expected for tests that expect exception Add class level javadoc. #Init need not catch IOException. Just throw it. The test will fail. You can reduce several lines of code by using a static byte[] DATA; can you add test where jointElection() is called twice and the second call is NO-OP Many times where processResult is called back to back can be in for loop Why should 4 errors of connection loss result in fatalError? testStatNodeError already covers some part of testCreateNodeResultRetryBecomeActive Instead of catching InterruptedException, you can just throw it
          Hide
          Bikas Saha added a comment -

          Some more comments from chat session

          1. Change the method name to init(). Annotate it @Before. It will be automatically called before tests.
          2. Use @Expected for tests that expect exception
            Done
          3. Add class level javadoc.
            This is already in the second patch. If by this you mean comments before the class declaration.
          4. #Init need not catch IOException. Just throw it. The test will fail.
            I know it wont throw the exception in the test. But it has to be handled to keep the compiler happy. So I handled it locally there instead of adding "throws IOException in every method of the test"
          5. You can reduce several lines of code by using a static byte[] DATA;
            done
          6. can you add test where jointElection() is called twice and the second call is NO-OP
          7. Many times where processResult is called back to back can be in for loop
            this helps me walk the scenarios better than in a loop
          8. Why should 4 errors of connection loss result in fatalError?
            Because the elector has tried its best to connect to Zookeeper and failed. We can revisit this based observed failures at a later time.
          9. testStatNodeError already covers some part of testCreateNodeResultRetryBecomeActive
            yes. thats because it is trying to walk through a logical scenario. so I let it be.
          10. Instead of catching InterruptedException, you can just throw it
            same code cleanliness as above. I know this exception will not get thrown in the test. so want to make local changes to keep the compiler happy.

          >>Please use System.arraycopy() instead of byte[] clone.
          done.
          >>Split process into two different methods processZkEvent and processZnodeEvent?
          The function is still small enough to let it be. Will do this later when more logic might get added if we do group participation. At that point processZnodeEvent itself will need division into lock znode and parent znode.

          >>can you add test where jointElection() is called twice and the second call is NO-OP
          it was there in test processResult callback but got changed to enterNeutralMode when I changed that test. now I enhanced testCreateNodeResultBecomeActive() to check that there is no double master call and added another test to check that there is no double slave call for expected scenarios. now all 3 states are covered.

          Will upload the patch with all these changes.
          Thanks

          Show
          Bikas Saha added a comment - Some more comments from chat session Change the method name to init(). Annotate it @Before. It will be automatically called before tests. Use @Expected for tests that expect exception Done Add class level javadoc. This is already in the second patch. If by this you mean comments before the class declaration. #Init need not catch IOException. Just throw it. The test will fail. I know it wont throw the exception in the test. But it has to be handled to keep the compiler happy. So I handled it locally there instead of adding "throws IOException in every method of the test" You can reduce several lines of code by using a static byte[] DATA; done can you add test where jointElection() is called twice and the second call is NO-OP Many times where processResult is called back to back can be in for loop this helps me walk the scenarios better than in a loop Why should 4 errors of connection loss result in fatalError? Because the elector has tried its best to connect to Zookeeper and failed. We can revisit this based observed failures at a later time. testStatNodeError already covers some part of testCreateNodeResultRetryBecomeActive yes. thats because it is trying to walk through a logical scenario. so I let it be. Instead of catching InterruptedException, you can just throw it same code cleanliness as above. I know this exception will not get thrown in the test. so want to make local changes to keep the compiler happy. >>Please use System.arraycopy() instead of byte[] clone. done. >>Split process into two different methods processZkEvent and processZnodeEvent? The function is still small enough to let it be. Will do this later when more logic might get added if we do group participation. At that point processZnodeEvent itself will need division into lock znode and parent znode. >>can you add test where jointElection() is called twice and the second call is NO-OP it was there in test processResult callback but got changed to enterNeutralMode when I changed that test. now I enhanced testCreateNodeResultBecomeActive() to check that there is no double master call and added another test to check that there is no double slave call for expected scenarios. now all 3 states are covered. Will upload the patch with all these changes. Thanks
          Hide
          Bikas Saha added a comment -

          new patch will all changes

          Show
          Bikas Saha added a comment - new patch will all changes
          Hide
          Todd Lipcon added a comment -
          • Can ActiveStandbyElector be made package-private? If not, it should get audience annotations (perhaps private, perhaps LimitedPrivate to HDFS? not sure)
            • Same with the inner interface

          +     * or loss of Zookeeper quorum. Thus enterSafeMode can be used to guard
          +     * against split-brain issues. In such situations it might be prudent to
          +     * call becomeStandby too. However, such state change operations might be
          +     * expensive and enterSafeMode can help guard against doing that for
          +     * transient issues.
          +     */
          
          • I think the above references to enterSafeMode are supposed to be enterNeutralMode, right?
          • Also, it can't really guard against split brain, because there is no guarantee on the timeliness of delivery of these messages. That is to say, the other participants in the election might receive becomeActive before this participant receives enterNeutralMode. So, I'm not sold on the necessity of this callback.

          +    void notifyFatalError();
          

          Shouldn't this come with some kind of Exception argument or at least a String error message? Right now if we hit it, it won't be clear in the logs which of several cases caused it.


          +  /**
          +   * Name of the lock znode used by the library
          +   */
          +  public static final String lockFileName = "ActiveStandbyElectorLock-21EC2020-3AEA-1069-A2DD-08002B30309D";
          
          • why is this public?
          • should also be ALL_CAPS.
          • what's with the random UUID in there? Assumedly this library would be configured to be rooted inside some base directory in the tree which would include the namespaceID, etc.

          +   * Setting a very short session timeout may result in frequent transitions
          +   * between active and standby states during issues like network outages.
          

          Also should mention GC pauses here – they're more frequent than network blips IME.


          +   * @param zookeeperHostPort
          +   *          ZooKeeper hostPort for all ZooKeeper servers
          

          Comma-separated? Perhaps better to name it zookeeperHostPorts since there is more than one server in the quorum.

          • typo: "reference to callback inteface object"

          +    appData = new byte[data.length];
          +    System.arraycopy(data, 0, appData, 0, data.length);
          

          Could use Arrays.copyOf() here instead


          • Rename operationSuccess etc to isSuccessCode – I think that's a clearer naming.
          • Make ActiveStandbyElectorTester an inner class of TestActiveStandbyElector. We generally discourage having multiple "outer" classes per Java file. You can then avoid making two "mockZk" objects, and "count" wouldn't have to be static, either. The whole class could be done as an anonymous class, inline, probably.
          • Echo what Suresh said about catching exceptions in tests - should let it fall through and fail the test - that'll also make sure the exception that was triggered makes it all the way up to the test runner and recorded properly (handy when debugging in Eclipse for example)
          • In a couple places, you catch an expected exception and then verify, but you should also add an Assert.fail("Didn't throw exception") in the try clause to make sure the exception was actually thrown.

          + * active and the rest become standbys. </br> This election mechanism is
          + * efficient for small number of election candidates (order of 10's) because
          

          Should say "only efficient" to be clear


          + * {@link ActiveStandbyElectorCallback} to interact with the elector
          + * 
          + */
          

          Extra blank lines inside javadoc comments should be removed


          Some general notes/nits:

          • Some of the INFO level logs are probably better off at DEBUG level. Or else, they should be expanded out to more operator-readable information (most ops will have no clue what "CreateNode result: 2 for path: /blah/blah" means.
          • Some more DEBUG level logs could be added to the different cases, or even INFO level ones at the unexpected ones (like having to retry, or being Disconnected, etc). I don't think there's any harm in being fairly verbose about state change events that are expected to only happen during fail-overs, and in case it goes wrong we want to have all the details at hand. But, as above, they should be operator-understandable.
          • Javadoc breaks should be <br/> rather than </br>.
          • Constants should be ALL_CAPS – eg LOG rather than {{Log}
          • Add a constant for NUM_RETRIES instead of hard-coded 3.
          • Should never use e.printStackTrace – instead, use LOG.error or LOG.warn with the second argument as the exception. This will print the trace, but also makes sure it goes to the right log.
          Show
          Todd Lipcon added a comment - Can ActiveStandbyElector be made package-private? If not, it should get audience annotations (perhaps private, perhaps LimitedPrivate to HDFS? not sure) Same with the inner interface + * or loss of Zookeeper quorum. Thus enterSafeMode can be used to guard + * against split-brain issues. In such situations it might be prudent to + * call becomeStandby too. However, such state change operations might be + * expensive and enterSafeMode can help guard against doing that for + * transient issues. + */ I think the above references to enterSafeMode are supposed to be enterNeutralMode , right? Also, it can't really guard against split brain, because there is no guarantee on the timeliness of delivery of these messages. That is to say, the other participants in the election might receive becomeActive before this participant receives enterNeutralMode . So, I'm not sold on the necessity of this callback. + void notifyFatalError(); Shouldn't this come with some kind of Exception argument or at least a String error message? Right now if we hit it, it won't be clear in the logs which of several cases caused it. + /** + * Name of the lock znode used by the library + */ + public static final String lockFileName = "ActiveStandbyElectorLock-21EC2020-3AEA-1069-A2DD-08002B30309D" ; why is this public? should also be ALL_CAPS. what's with the random UUID in there? Assumedly this library would be configured to be rooted inside some base directory in the tree which would include the namespaceID, etc. + * Setting a very short session timeout may result in frequent transitions + * between active and standby states during issues like network outages. Also should mention GC pauses here – they're more frequent than network blips IME. + * @param zookeeperHostPort + * ZooKeeper hostPort for all ZooKeeper servers Comma-separated? Perhaps better to name it zookeeperHostPorts since there is more than one server in the quorum. typo: "reference to callback inteface object" + appData = new byte [data.length]; + System .arraycopy(data, 0, appData, 0, data.length); Could use Arrays.copyOf() here instead Rename operationSuccess etc to isSuccessCode – I think that's a clearer naming. Make ActiveStandbyElectorTester an inner class of TestActiveStandbyElector. We generally discourage having multiple "outer" classes per Java file. You can then avoid making two "mockZk" objects, and "count" wouldn't have to be static, either. The whole class could be done as an anonymous class, inline, probably. Echo what Suresh said about catching exceptions in tests - should let it fall through and fail the test - that'll also make sure the exception that was triggered makes it all the way up to the test runner and recorded properly (handy when debugging in Eclipse for example) In a couple places, you catch an expected exception and then verify, but you should also add an Assert.fail("Didn't throw exception") in the try clause to make sure the exception was actually thrown. + * active and the rest become standbys. </br> This election mechanism is + * efficient for small number of election candidates (order of 10's) because Should say " only efficient" to be clear + * {@link ActiveStandbyElectorCallback} to interact with the elector + * + */ Extra blank lines inside javadoc comments should be removed Some general notes/nits: Some of the INFO level logs are probably better off at DEBUG level. Or else, they should be expanded out to more operator-readable information (most ops will have no clue what "CreateNode result: 2 for path: /blah/blah" means. Some more DEBUG level logs could be added to the different cases, or even INFO level ones at the unexpected ones (like having to retry, or being Disconnected, etc). I don't think there's any harm in being fairly verbose about state change events that are expected to only happen during fail-overs, and in case it goes wrong we want to have all the details at hand. But, as above, they should be operator-understandable. Javadoc breaks should be <br/> rather than </br> . Constants should be ALL_CAPS – eg LOG rather than {{Log} Add a constant for NUM_RETRIES instead of hard-coded 3. Should never use e.printStackTrace – instead, use LOG.error or LOG.warn with the second argument as the exception. This will print the trace, but also makes sure it goes to the right log.
          Hide
          Bikas Saha added a comment -

          >>Can ActiveStandbyElector be made package-private?
          Let me read up on all these package annotations and see what makes sense.

          >>Also, it can't really guard against split brain
          The guarantee is based on setting the correct timeouts. Another instance can only become active after the session timeout. The session timeout is recommended to be at least 3X the zookeeper disconnect timeout. enterNeutralMode is called when zookeeper client disconnects from zookeeper server or when zookeeper servers lose quorum. My understanding is that when there is a network disconnection then zookeeper client will disconnect from the server and post a disconnect event. So if your TCP disconnect timeouts are not set insanely high (> session timeout) then enterSafeMode will be called before session timeout expires and someone else becomes a master. Does this clarify?

          >>should also be ALL_CAPS
          It public because its a well defined property of the class.
          Is the ALLCAPS on static strings a convention? You mean the member name should be all caps or the value? I added the random UID to prevent accidental operation on this file from some admin. It does not hurt and it safer than using just a nicely named file. Anyways, I changed it.

          >>Could use Arrays.copyOf()
          I first used .clone() and then was pointed to System.ArrayCopy() and now pointed to Array.copyOf(). Could you please point me to any place that lists the pros and cons of different array copying methods (of which there seem to be many)?

          >>Rename operationSuccess etc to isSuccessCode
          I think the current names read OK with the if() statements.

          >>Make ActiveStandbyElectorTester an inner class of TestActiveStandbyElector.
          I first wrote it that way. But there is a problem.
          Tester_constructor()>super_constructor()>Tester().getNewZookeeper()->returns mock.
          So I need to have mock initialized before constructing the tester object. So I made mock a static member. But then java complained that inner classes cannot have static members.

          >>Some of the INFO level logs are probably better off at DEBUG level.
          Could you please point me to some place which explains what to log at different log levels?

          Show
          Bikas Saha added a comment - >>Can ActiveStandbyElector be made package-private? Let me read up on all these package annotations and see what makes sense. >>Also, it can't really guard against split brain The guarantee is based on setting the correct timeouts. Another instance can only become active after the session timeout. The session timeout is recommended to be at least 3X the zookeeper disconnect timeout. enterNeutralMode is called when zookeeper client disconnects from zookeeper server or when zookeeper servers lose quorum. My understanding is that when there is a network disconnection then zookeeper client will disconnect from the server and post a disconnect event. So if your TCP disconnect timeouts are not set insanely high (> session timeout) then enterSafeMode will be called before session timeout expires and someone else becomes a master. Does this clarify? >>should also be ALL_CAPS It public because its a well defined property of the class. Is the ALLCAPS on static strings a convention? You mean the member name should be all caps or the value? I added the random UID to prevent accidental operation on this file from some admin. It does not hurt and it safer than using just a nicely named file. Anyways, I changed it. >>Could use Arrays.copyOf() I first used .clone() and then was pointed to System.ArrayCopy() and now pointed to Array.copyOf(). Could you please point me to any place that lists the pros and cons of different array copying methods (of which there seem to be many)? >>Rename operationSuccess etc to isSuccessCode I think the current names read OK with the if() statements. >>Make ActiveStandbyElectorTester an inner class of TestActiveStandbyElector. I first wrote it that way. But there is a problem. Tester_constructor() >super_constructor() >Tester().getNewZookeeper()->returns mock. So I need to have mock initialized before constructing the tester object. So I made mock a static member. But then java complained that inner classes cannot have static members. >>Some of the INFO level logs are probably better off at DEBUG level. Could you please point me to some place which explains what to log at different log levels?
          Hide
          Todd Lipcon added a comment -

          So if your TCP disconnect timeouts are not set insanely high (> session timeout) then enterSafeMode will be called before session timeout expires and someone else becomes a master.

          This still isn't "safe". For example, imagine the NN goes into a multi-minute GC pause just before writing an edit to its edit log. Since the GC pause is longer than the session timeout, some other NN will take over. Without active fencing, when the first NN wakes up, it will make that mutation to the edit log before it finds out about the ZK timeout.

          It sounds contrived but we've had many instances of data loss bugs in HBase due to scenarios like this in the past. Multi-minute GC pauses are rare but do happen.

          It public because its a well defined property of the class.

          But it implies that external consumers of this class may want to directly manipulate the znode – which is exposing an implementation detail unnecessarily.

          Is the ALLCAPS on static strings a convention? You mean the member name should be all caps or the value?

          Yes, it's a convention that constants should have all-caps names. See the Sun java coding conventions, which we more-or-less follow: http://www.oracle.com/technetwork/java/codeconventions-135099.html#367

          So I need to have mock initialized before constructing the tester object. So I made mock a static member. But then java complained that inner classes cannot have static members.

          I'm not quite following - you already initialize the non-static mockZk in TestActiveStandbyElector.init()?. Then if it's a non-static inner class, it can simply refer to the already-initialized member of its outer class.

          Could you please point me to some place which explains what to log at different log levels?

          I don't think we have any formal guidelines here.. the basic assumptions I make are:

          • ERROR: unrecoverable errors (eg some block is apparently lost, or a failover failed, etc)
          • WARN: recoverable errors (eg failures that will be retried, blocks that have become under-replicated but can be repaired, etc)
          • INFO: normal operations proceeding as expected, but interesting enough that operators will want to see it.
          • DEBUG: information that will be useful to developers debugging unit tests or running small test clusters (unit tests generally enable these, but users generally don't). Also handy when you have a reproducible bug on the client - you can ask the user to enable DEBUG and re-run, for example.
          • TRACE: super-detailed trace information that will only be enabled in rare circumstances. We don't use this much.
          Show
          Todd Lipcon added a comment - So if your TCP disconnect timeouts are not set insanely high (> session timeout) then enterSafeMode will be called before session timeout expires and someone else becomes a master. This still isn't "safe". For example, imagine the NN goes into a multi-minute GC pause just before writing an edit to its edit log. Since the GC pause is longer than the session timeout, some other NN will take over. Without active fencing, when the first NN wakes up, it will make that mutation to the edit log before it finds out about the ZK timeout. It sounds contrived but we've had many instances of data loss bugs in HBase due to scenarios like this in the past. Multi-minute GC pauses are rare but do happen. It public because its a well defined property of the class. But it implies that external consumers of this class may want to directly manipulate the znode – which is exposing an implementation detail unnecessarily. Is the ALLCAPS on static strings a convention? You mean the member name should be all caps or the value? Yes, it's a convention that constants should have all-caps names. See the Sun java coding conventions, which we more-or-less follow: http://www.oracle.com/technetwork/java/codeconventions-135099.html#367 So I need to have mock initialized before constructing the tester object. So I made mock a static member. But then java complained that inner classes cannot have static members. I'm not quite following - you already initialize the non-static mockZk in TestActiveStandbyElector.init() ?. Then if it's a non-static inner class, it can simply refer to the already-initialized member of its outer class. Could you please point me to some place which explains what to log at different log levels? I don't think we have any formal guidelines here.. the basic assumptions I make are: ERROR: unrecoverable errors (eg some block is apparently lost, or a failover failed, etc) WARN: recoverable errors (eg failures that will be retried, blocks that have become under-replicated but can be repaired, etc) INFO: normal operations proceeding as expected, but interesting enough that operators will want to see it. DEBUG: information that will be useful to developers debugging unit tests or running small test clusters (unit tests generally enable these, but users generally don't). Also handy when you have a reproducible bug on the client - you can ask the user to enable DEBUG and re-run, for example. TRACE: super-detailed trace information that will only be enabled in rare circumstances. We don't use this much.
          Hide
          Bikas Saha added a comment -

          About the GC pause scenario (and others like it). Lets not mix up election with operation safety. What this library provides is a signal about whether one is a leader or not. By itself, that does not solve the problems of whether that signal was properly processed or not. E.g. a potential solution to the GC pause (or any NN hung case) would be to not have the NN participate in leader election directly. A failover controller (whose design ensures 0 or cheap GC pauses) could handle the leader election and terminate hung NN's when its are no longer a master.

          Let me address some of the comments in a subsequent patch. I need to learn a little more Java before I can do it to my liking.

          Show
          Bikas Saha added a comment - About the GC pause scenario (and others like it). Lets not mix up election with operation safety. What this library provides is a signal about whether one is a leader or not. By itself, that does not solve the problems of whether that signal was properly processed or not. E.g. a potential solution to the GC pause (or any NN hung case) would be to not have the NN participate in leader election directly. A failover controller (whose design ensures 0 or cheap GC pauses) could handle the leader election and terminate hung NN's when its are no longer a master. Let me address some of the comments in a subsequent patch. I need to learn a little more Java before I can do it to my liking.
          Hide
          Bikas Saha added a comment -

          Uploading new patch that address review comments.

          Show
          Bikas Saha added a comment - Uploading new patch that address review comments.
          Hide
          Bikas Saha added a comment -

          Adding patch with audience.private and stability.evolving annotations for the elector class

          Show
          Bikas Saha added a comment - Adding patch with audience.private and stability.evolving annotations for the elector class
          Hide
          Bikas Saha added a comment -

          granting the ASF license this time for the previous patch on class annotations.

          Show
          Bikas Saha added a comment - granting the ASF license this time for the previous patch on class annotations.
          Hide
          Bikas Saha added a comment -

          Adding patch with functional test using real ZooKeeper server.

          Show
          Bikas Saha added a comment - Adding patch with functional test using real ZooKeeper server.
          Hide
          Bikas Saha added a comment -

          Sorry for the spam. There were some unsupported characters that somehow showed up only when building from the top level hadoop project. Fixing that and adding some more comments.

          Show
          Bikas Saha added a comment - Sorry for the spam. There were some unsupported characters that somehow showed up only when building from the top level hadoop project. Fixing that and adding some more comments.
          Hide
          Suresh Srinivas added a comment -

          Uploading new patch with the following minor changes:

          1. Fixed build failure due to unsupported characters
          2. Fixed many lines longer than 80 columns. Other coding conventions fixed "{" to be on the same line after if and not new line.
          3. Added apache license banner to the new file TestActiveStandbyElectorRealZK.java
          4. Added javadoc to the the test classes to briefly describe what is being tested.

          Bikas can you please address the following comments:

          1. TestActiveStandbyElectorRealZK.java
            • ThreadRunner.zkClient hides the zkClient from the outside class.
            • Please add javadoc to the classes
            • Please add comments in the tests to describe the test that is being done.
          Show
          Suresh Srinivas added a comment - Uploading new patch with the following minor changes: Fixed build failure due to unsupported characters Fixed many lines longer than 80 columns. Other coding conventions fixed "{" to be on the same line after if and not new line. Added apache license banner to the new file TestActiveStandbyElectorRealZK.java Added javadoc to the the test classes to briefly describe what is being tested. Bikas can you please address the following comments: TestActiveStandbyElectorRealZK.java ThreadRunner.zkClient hides the zkClient from the outside class. Please add javadoc to the classes Please add comments in the tests to describe the test that is being done.
          Hide
          Bikas Saha added a comment -

          Fixed review comments

          Show
          Bikas Saha added a comment - Fixed review comments
          Hide
          Suresh Srinivas added a comment -

          +1 for the patch. Minor update. Changes the logger to commons.logging.

          Show
          Suresh Srinivas added a comment - +1 for the patch. Minor update. Changes the logger to commons.logging.
          Hide
          Aaron T. Myers added a comment -

          Why was this moved out from being a sub-task of HDFS-1623?

          Show
          Aaron T. Myers added a comment - Why was this moved out from being a sub-task of HDFS-1623 ?
          Hide
          Suresh Srinivas added a comment -

          I committed the patch. Thank you Bikas.

          Show
          Suresh Srinivas added a comment - I committed the patch. Thank you Bikas.
          Hide
          Harsh J added a comment -

          Just wondering if Curator would've been a better choice here, instead of writing our own?

          Show
          Harsh J added a comment - Just wondering if Curator would've been a better choice here, instead of writing our own?
          Hide
          Bikas Saha added a comment -

          From previous discussions on HDFS-2185 it seemed that writing a simple library for our own needs was a choice no one disagreed on. It also makes sense because this library could be enhanced more painlessly for future needs.
          From my reading of the curator code, it brings in a fair bit of complexity because it builds a general ZK framework on top on which recipes are implemented. Secondly, it seemed that the leader election recipe worked by electing a leader, that holds onto leadership via not returning from a blocking call. When that call returns then leadership is given up. I might be wrong but it was not obvious to me how this would handle environmental errors which cause the leadership to be lost because of external errors.

          Show
          Bikas Saha added a comment - From previous discussions on HDFS-2185 it seemed that writing a simple library for our own needs was a choice no one disagreed on. It also makes sense because this library could be enhanced more painlessly for future needs. From my reading of the curator code, it brings in a fair bit of complexity because it builds a general ZK framework on top on which recipes are implemented. Secondly, it seemed that the leader election recipe worked by electing a leader, that holds onto leadership via not returning from a blocking call. When that call returns then leadership is given up. I might be wrong but it was not obvious to me how this would handle environmental errors which cause the leadership to be lost because of external errors.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-HAbranch-build #59 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/59/)
          HADOOP-7992. Add ZKClient library to facilitate leader election. Contributed by Bikas Saha.

          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1235841
          Files :

          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/CHANGES.HDFS-1623.txt
          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/pom.xml
          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/branches/HDFS-1623/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-HAbranch-build #59 (See https://builds.apache.org/job/Hadoop-Hdfs-HAbranch-build/59/ ) HADOOP-7992 . Add ZKClient library to facilitate leader election. Contributed by Bikas Saha. suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1235841 Files : /hadoop/common/branches/ HDFS-1623 /hadoop-common-project/hadoop-common/CHANGES. HDFS-1623 .txt /hadoop/common/branches/ HDFS-1623 /hadoop-common-project/hadoop-common/pom.xml /hadoop/common/branches/ HDFS-1623 /hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/branches/ HDFS-1623 /hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/branches/ HDFS-1623 /hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java

            People

            • Assignee:
              Bikas Saha
              Reporter:
              Suresh Srinivas
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development