Hadoop Common
  1. Hadoop Common
  2. HADOOP-8212

Improve ActiveStandbyElector's behavior when session expires

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.3, 0.24.0
    • Fix Version/s: Auto Failover (HDFS-3042)
    • Component/s: None
    • Labels:
      None

      Description

      Currently when the ZK session expires, it results in a fatal error being sent to the application callback. This is not the best behavior – for example, in the case of HA, if ZK goes down, we would like the current state to be maintained, rather than causing either NN to abort. When the ZK clients are able to reconnect, they should sort out the correct leader based on the normal locking schemes.

      1. hadoop-8212.txt
        42 kB
        Todd Lipcon
      2. hadoop-8212.txt
        42 kB
        Todd Lipcon
      3. hadoop-8212-delta-bikas.txt
        4 kB
        Todd Lipcon

        Issue Links

          Activity

          Hide
          Todd Lipcon added a comment -

          Thanks for reviewing the addendum, and for your comments. I'll commit the addendum to the new HDFS-3042 branch momentarily.

          Show
          Todd Lipcon added a comment - Thanks for reviewing the addendum, and for your comments. I'll commit the addendum to the new HDFS-3042 branch momentarily.
          Hide
          Bikas Saha added a comment -

          Thanks for moving this to a new branch.

          Show
          Bikas Saha added a comment - Thanks for moving this to a new branch.
          Hide
          Bikas Saha added a comment -

          I assume this delta will be committed to new branch?

          Thats the behavior I found for ZK when I wrote the realZK test. I had an ill feeling about simply nulling zkClient and checking for null in the original code. I had a doubt that the old zkClient might still be alive and callback on its reference to the elector. Thats what happened in the create case. Unfortunately I did not have a test with real ZK that actually times out the session.

          I think adding the check for stale zkClient actually makes the check session expired code dead, as long as ZK client behaves this way. But it is a good to have in case ZK Client behavior changes going forward.

          +1

          Show
          Bikas Saha added a comment - I assume this delta will be committed to new branch? Thats the behavior I found for ZK when I wrote the realZK test. I had an ill feeling about simply nulling zkClient and checking for null in the original code. I had a doubt that the old zkClient might still be alive and callback on its reference to the elector. Thats what happened in the create case. Unfortunately I did not have a test with real ZK that actually times out the session. I think adding the check for stale zkClient actually makes the check session expired code dead, as long as ZK client behaves this way. But it is a good to have in case ZK Client behavior changes going forward. +1
          Hide
          Todd Lipcon added a comment -

          Here's a delta patch:

          • adds the sessionExpired check to the stat callback
          • adds a new functional test to try to trigger this scenario – expires the standby's session while it is monitoring the node.

          In practice, running these tests, I found that the Watcher always triggers with SessionExpired before the individual callbacks do. I don't know if this is a guarantee of ZK, but seems to be the current behavior. So we are doubly protected: the Expired handling case in the Watcher calls rejoinElection() which makes a new zkClient. So the new check against zkClient consistency will short circuit the events before even hitting the new session expiration tests.

          I also intend to separately write a stress test which should hopefully catch races like this, should ZK's behavior change.

          Show
          Todd Lipcon added a comment - Here's a delta patch: adds the sessionExpired check to the stat callback adds a new functional test to try to trigger this scenario – expires the standby's session while it is monitoring the node. In practice, running these tests, I found that the Watcher always triggers with SessionExpired before the individual callbacks do. I don't know if this is a guarantee of ZK, but seems to be the current behavior. So we are doubly protected: the Expired handling case in the Watcher calls rejoinElection() which makes a new zkClient . So the new check against zkClient consistency will short circuit the events before even hitting the new session expiration tests. I also intend to separately write a stress test which should hopefully catch races like this, should ZK's behavior change.
          Hide
          Todd Lipcon added a comment -

          Actually this turns to be difficult to revert since some other patches have modified this same code afterward (eg the change in mock proxy creation). I will just target an amendment patch at the branch, if that's OK.

          Show
          Todd Lipcon added a comment - Actually this turns to be difficult to revert since some other patches have modified this same code afterward (eg the change in mock proxy creation). I will just target an amendment patch at the branch, if that's OK.
          Hide
          Todd Lipcon added a comment -

          Reopening for the improvements suggested by Bikas. I will revert this from trunk/branch-2 so we can recommit the improved version to the newly minted HDFS-3042 branch.

          Show
          Todd Lipcon added a comment - Reopening for the improvements suggested by Bikas. I will revert this from trunk/branch-2 so we can recommit the improved version to the newly minted HDFS-3042 branch.
          Hide
          Bikas Saha added a comment -

          The patch does add the same handling to StatCallback. It uses the ZooKeeper "context" parameter to pass the original zkClient. Unfortunately the Watcher interface doesn't have any context object, which is why I had to introduce the wrapper class there.

          Not this. I was talking about the handling of session expired code that was added to the create callback (and is the title of this jira). The same thing could happen in the stat callback. the stat callback could get session expired code and send a fatal error instead of letting the process watcher callback rejoin the election on session expired.

          In my experience working on similar projects in the past, getting all the initial code in place is only half the battle. The real work starts once the code is there and you start banging on it in realistic test scenarios.

          And it might be that if we go slow in the first place we might save that time in the later phase

          We'd like to see automatic failover be a supported piece of the HA solution in 0.23.x (..err..2.0), and to hit that timeline, we need to get into the latter phase ASAP.

          That seems like a worthwhile engineering goal!

          If you'd prefer, I'm happy to create a feature branch for auto-failover and then call a merge vote when it's ready for the full QA onslaught.

          Thats a really good idea! This way at least all the jira will be linked to that and easy to follow. And we can make aggressive changes without worrying about churning or destabilizing trunk. I dont think voting to bring this important piece of work back to trunk will be a problem. But it makes the process a lot more manageable and trackable. I am +1 for it. Thanks!

          Show
          Bikas Saha added a comment - The patch does add the same handling to StatCallback. It uses the ZooKeeper "context" parameter to pass the original zkClient. Unfortunately the Watcher interface doesn't have any context object, which is why I had to introduce the wrapper class there. Not this. I was talking about the handling of session expired code that was added to the create callback (and is the title of this jira). The same thing could happen in the stat callback. the stat callback could get session expired code and send a fatal error instead of letting the process watcher callback rejoin the election on session expired. In my experience working on similar projects in the past, getting all the initial code in place is only half the battle. The real work starts once the code is there and you start banging on it in realistic test scenarios. And it might be that if we go slow in the first place we might save that time in the later phase We'd like to see automatic failover be a supported piece of the HA solution in 0.23.x (..err..2.0), and to hit that timeline, we need to get into the latter phase ASAP. That seems like a worthwhile engineering goal! If you'd prefer, I'm happy to create a feature branch for auto-failover and then call a merge vote when it's ready for the full QA onslaught. Thats a really good idea! This way at least all the jira will be linked to that and easy to follow. And we can make aggressive changes without worrying about churning or destabilizing trunk. I dont think voting to bring this important piece of work back to trunk will be a problem. But it makes the process a lot more manageable and trackable. I am +1 for it. Thanks!
          Hide
          Todd Lipcon added a comment -

          I think we want to added similar handling in the StatCallback. Its another race waiting to happen.

          The patch does add the same handling to StatCallback. It uses the ZooKeeper "context" parameter to pass the original zkClient. Unfortunately the Watcher interface doesn't have any context object, which is why I had to introduce the wrapper class there.

          The comment on processWatchEvent needs to change slightly to reflect that its the proxied watcher callback handler.

          Does the following look good?

          -   * interface implementation of Zookeeper watch events (connection and node)
          +   * interface implementation of Zookeeper watch events (connection and node),
          +   * proxied by {@link WatcherWithClientRef}.
          

          Whats the hurry?

          In my experience working on similar projects in the past, getting all the initial code in place is only half the battle. The real work starts once the code is there and you start banging on it in realistic test scenarios. We'd like to see automatic failover be a supported piece of the HA solution in 0.23.x (..err..2.0), and to hit that timeline, we need to get into the latter phase ASAP.

          I'm less aggressive when it comes to changing existing code, but since this is all new code, there's no risk of regressing working features by moving fast here. Once it starts to stabilize we can afford to slow down the rate of change. If you'd prefer, I'm happy to create a feature branch for auto-failover and then call a merge vote when it's ready for the full QA onslaught.

          Show
          Todd Lipcon added a comment - I think we want to added similar handling in the StatCallback. Its another race waiting to happen. The patch does add the same handling to StatCallback. It uses the ZooKeeper "context" parameter to pass the original zkClient. Unfortunately the Watcher interface doesn't have any context object, which is why I had to introduce the wrapper class there. The comment on processWatchEvent needs to change slightly to reflect that its the proxied watcher callback handler. Does the following look good? - * interface implementation of Zookeeper watch events (connection and node) + * interface implementation of Zookeeper watch events (connection and node), + * proxied by {@link WatcherWithClientRef}. Whats the hurry? In my experience working on similar projects in the past, getting all the initial code in place is only half the battle. The real work starts once the code is there and you start banging on it in realistic test scenarios. We'd like to see automatic failover be a supported piece of the HA solution in 0.23.x (..err..2.0), and to hit that timeline, we need to get into the latter phase ASAP. I'm less aggressive when it comes to changing existing code, but since this is all new code, there's no risk of regressing working features by moving fast here. Once it starts to stabilize we can afford to slow down the rate of change. If you'd prefer, I'm happy to create a feature branch for auto-failover and then call a merge vote when it's ready for the full QA onslaught.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1032 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1032/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1032 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1032/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #997 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/997/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #997 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/997/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Bikas Saha added a comment -

          I think we want to added similar handling in the StatCallback. Its another race waiting to happen.

          The comment on processWatchEvent needs to change slightly to reflect that its the proxied watcher callback handler.

          Sure, happy to address post-commit. Sorry for moving quick - trying to get at least an initial implementation of auto failover committed quickly, and we can continue to improve and fix it up.

          Whats the hurry?

          Show
          Bikas Saha added a comment - I think we want to added similar handling in the StatCallback. Its another race waiting to happen. The comment on processWatchEvent needs to change slightly to reflect that its the proxied watcher callback handler. Sure, happy to address post-commit. Sorry for moving quick - trying to get at least an initial implementation of auto failover committed quickly, and we can continue to improve and fix it up. Whats the hurry?
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #739 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/739/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305509)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305509
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-0.23-Commit #739 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/739/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305509) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305509 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1942 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1942/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #1942 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1942/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Todd Lipcon added a comment -

          Sure, happy to address post-commit. Sorry for moving quick - trying to get at least an initial implementation of auto failover committed quickly, and we can continue to improve and fix it up.

          Show
          Todd Lipcon added a comment - Sure, happy to address post-commit. Sorry for moving quick - trying to get at least an initial implementation of auto failover committed quickly, and we can continue to improve and fix it up.
          Hide
          Bikas Saha added a comment -

          Sorry I could not get to this sooner. I will try to take a look and see if I can make some post-commit comments.

          Show
          Bikas Saha added a comment - Sorry I could not get to this sooner. I will try to take a look and see if I can make some post-commit comments.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #720 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/720/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305509)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305509
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Commit #720 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/720/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305509) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305509 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #730 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/730/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305509)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305509
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-0.23-Commit #730 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/730/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305509) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305509 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2006 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2006/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2006 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2006/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1931 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1931/)
          HADOOP-8212. Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1931 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1931/ ) HADOOP-8212 . Improve ActiveStandbyElector's behavior when session expires. Contributed by Todd Lipcon. (Revision 1305510) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305510 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestActiveStandbyElectorRealZK.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/MultithreadedTestUtil.java
          Hide
          Todd Lipcon added a comment -

          Committed, thanks Aaron.

          Show
          Todd Lipcon added a comment - Committed, thanks Aaron.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519915/hadoop-8212.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/782//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/782//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519915/hadoop-8212.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 11 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/782//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/782//console This message is automatically generated.
          Hide
          Aaron T. Myers added a comment -

          I agree the log message in #2 is a little odd, but I didn't want to duplicate the log message three times, and it felt a little excessive to factor out a "logIgnoredEvent()" call... do you have another suggestion, or is it OK to leave it?

          Yea, I figured. It's fine the way you have it.

          Show
          Aaron T. Myers added a comment - I agree the log message in #2 is a little odd, but I didn't want to duplicate the log message three times, and it felt a little excessive to factor out a "logIgnoredEvent()" call... do you have another suggestion, or is it OK to leave it? Yea, I figured. It's fine the way you have it.
          Hide
          Todd Lipcon added a comment -

          I fixed item #1 and #3 above.

          I agree the log message in #2 is a little odd, but I didn't want to duplicate the log message three times, and it felt a little excessive to factor out a "logIgnoredEvent()" call... do you have another suggestion, or is it OK to leave it?

          Show
          Todd Lipcon added a comment - I fixed item #1 and #3 above. I agree the log message in #2 is a little odd, but I didn't want to duplicate the log message three times, and it felt a little excessive to factor out a "logIgnoredEvent()" call... do you have another suggestion, or is it OK to leave it?
          Hide
          Aaron T. Myers added a comment -

          Nice catch on that race condition, Todd. I also very much like the way you've improved the tests for this. Just a few nits follow. +1 once these are addressed.

          1. Please put some blank lines between preventSessionReestablishmentForTests, allowSessionReestablishmentForTests, and getZKSessionIdForTests.
          2. Seems a little strange to put the "Ignoring stale result from ..." log message in isStaleClient, where no decisions are being made about ignoring anything.
          3. typo: "...and passes it back back along with..."
          Show
          Aaron T. Myers added a comment - Nice catch on that race condition, Todd. I also very much like the way you've improved the tests for this. Just a few nits follow. +1 once these are addressed. Please put some blank lines between preventSessionReestablishmentForTests, allowSessionReestablishmentForTests, and getZKSessionIdForTests. Seems a little strange to put the "Ignoring stale result from ..." log message in isStaleClient, where no decisions are being made about ignoring anything. typo: "...and passes it back back along with..."
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519904/hadoop-8212.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/781//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/781//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519904/hadoop-8212.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 11 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/781//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/781//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          Attached patch fixes the behavior to not notifyFatalError when the session is expired. The existing code already handled rejoining.

          I also fixed a race bug I turned up where, after rejoining with a new zkClient, some old notifications from the previous zkClient could end up getting through. The watchers and callbacks now pass along the zkClient used to set them, and then in the callback, we check to make sure it is still current.

          I also simplified the test case to no longer be multi-threaded, since it's much easier to follow as a linear progression, and the threads didn't buy us anything. I added test coverage around session expiration to cover the new code.

          Show
          Todd Lipcon added a comment - Attached patch fixes the behavior to not notifyFatalError when the session is expired. The existing code already handled rejoining. I also fixed a race bug I turned up where, after rejoining with a new zkClient, some old notifications from the previous zkClient could end up getting through. The watchers and callbacks now pass along the zkClient used to set them, and then in the callback, we check to make sure it is still current. I also simplified the test case to no longer be multi-threaded, since it's much easier to follow as a linear progression, and the threads didn't buy us anything. I added test coverage around session expiration to cover the new code.

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development