Hadoop Common
  1. Hadoop Common
  2. HADOOP-8206

Common portion of ZK-based failover controller

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.24.0
    • Fix Version/s: 2.0.0-alpha
    • Component/s: ha
    • Labels:
      None

      Description

      This JIRA is for the Common (generic) portion of HDFS-2185. It can't run on its own, but this JIRA will include unit tests using mock/dummy services.

      1. hadoop-8206.txt
        41 kB
        Todd Lipcon
      2. hadoop-8206.txt
        37 kB
        Todd Lipcon
      3. hadoop-8206.txt
        37 kB
        Todd Lipcon

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1032 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1032/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1032 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1032/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #997 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/997/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #997 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/997/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2008 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2008/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2008 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2008/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #722 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/722/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305672)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305672
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-0.23-Commit #722 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/722/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305672) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305672 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1933 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1933/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1933 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1933/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #742 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/742/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305672)

          Result = ABORTED
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305672
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-0.23-Commit #742 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/742/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305672) Result = ABORTED todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305672 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1945 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1945/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673)

          Result = ABORTED
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #1945 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1945/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305673) Result = ABORTED todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305673 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #732 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/732/)
          HADOOP-8206. Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305672)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305672
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java
          • /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-0.23-Commit #732 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/732/ ) HADOOP-8206 . Common portion of a ZK-based failover controller. Contributed by Todd Lipcon. (Revision 1305672) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1305672 Files : /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ActiveStandbyElector.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/FailoverController.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HealthMonitor.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/ToolRunner.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/ActiveStandbyElectorTestUtil.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/DummyHAService.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestNodeFencer.java /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ha/TestZKFailoverController.java
          Hide
          Todd Lipcon added a comment -

          Committed to 23 and trunk, thanks for the reviews.

          I know this went in fairly quickly, so if anyone has any follow-up comments, please feel free to file a JIRA and assign it to me. I will take care of them ASAP.

          Show
          Todd Lipcon added a comment - Committed to 23 and trunk, thanks for the reviews. I know this went in fairly quickly, so if anyone has any follow-up comments, please feel free to file a JIRA and assign it to me. I will take care of them ASAP.
          Hide
          Todd Lipcon added a comment -

          Makes sense to me. One question, though - we seem to be inconsistently using IllegalArgumentException and HadoopIllegalArgumentException. Is there any good reason for that?

          I'm not entirely sure – looking across the code as a whole, we have a 10:1 ratio of IllegalArgumentException vs HadoopIllegalArgumentException. So I'm erring on the side of what's used more often, except in a few places where we directly expose it as a potentially user-visible error (like bad command line arguments).

          Show
          Todd Lipcon added a comment - Makes sense to me. One question, though - we seem to be inconsistently using IllegalArgumentException and HadoopIllegalArgumentException. Is there any good reason for that? I'm not entirely sure – looking across the code as a whole, we have a 10:1 ratio of IllegalArgumentException vs HadoopIllegalArgumentException. So I'm erring on the side of what's used more often, except in a few places where we directly expose it as a potentially user-visible error (like bad command line arguments).
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12520023/hadoop-8206.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 11 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/786//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/786//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12520023/hadoop-8206.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 11 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/786//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/786//console This message is automatically generated.
          Hide
          Aaron T. Myers added a comment -

          Tried to do this, but these inner classes are really just grouping of methods inside ZKFC. They access so many of the outer class's variables that it didn't seem any cleaner to make them static here.

          Makes sense. Thanks for looking into it.

          Changed to IllegalArgumentException – since from the perspective of ZK, the state is an argument of the callback, rather than its own state.

          Makes sense to me. One question, though - we seem to be inconsistently using IllegalArgumentException and HadoopIllegalArgumentException. Is there any good reason for that?

          I think you misread it. It is case-insensitive. Clarified the javadoc to note this.

          So I did! My mistake.

          I kept having cognitive dissonance when the logs told me something about "service 1" and I had to actually look at the code referring to svc2. Is that OK?

          Sure, makes sense.

          I couldn't use "setUp" since the base class already has that. Changed to setupConfAndServices and renamed setupServicesAndFCs to just setupFCs which is more accurate.

          Makes sense.

          Changed to start them at 2 (didn't start with 1, since that's the error code that Java returns for uncaught exceptions coming out of main())

          Good idea.

          Added a new test case testZooKeeperFailure()

          Thanks. The new test looks great to me.

          +1 pending Jenkins and an answer to the question above about IllegalArgEx.

          Show
          Aaron T. Myers added a comment - Tried to do this, but these inner classes are really just grouping of methods inside ZKFC. They access so many of the outer class's variables that it didn't seem any cleaner to make them static here. Makes sense. Thanks for looking into it. Changed to IllegalArgumentException – since from the perspective of ZK, the state is an argument of the callback, rather than its own state. Makes sense to me. One question, though - we seem to be inconsistently using IllegalArgumentException and HadoopIllegalArgumentException. Is there any good reason for that? I think you misread it. It is case-insensitive. Clarified the javadoc to note this. So I did! My mistake. I kept having cognitive dissonance when the logs told me something about "service 1" and I had to actually look at the code referring to svc2. Is that OK? Sure, makes sense. I couldn't use "setUp" since the base class already has that. Changed to setupConfAndServices and renamed setupServicesAndFCs to just setupFCs which is more accurate. Makes sense. Changed to start them at 2 (didn't start with 1, since that's the error code that Java returns for uncaught exceptions coming out of main()) Good idea. Added a new test case testZooKeeperFailure() Thanks. The new test looks great to me. +1 pending Jenkins and an answer to the question above about IllegalArgEx.
          Hide
          Todd Lipcon added a comment -

          You left in a "TODO: this should be namespace-scoped". Also, is that referring to all of those config keys or just the immediate one below it? If you're not going to address this here, could you please file a new JIRA for it?

          Will file.

          I don't understand the purpose of all the casting/wrapping of Exception/RuntimeException in ZKFailoverController#run. If this is really required, please add a comment as to why it's necessary.

          Agreed, that was goofy. Fixed to always wrap/unwrap – reason being that the implemented interface doesn't throw exceptions, but our code does.

          Could you please add a constant (and perhaps a more-likely-unique name) for the "wrapped" string used in ZKFailoverController#run?

          Obsoleted by above fix.

          "// TODO: need to hook DFS here to find the NN keytab info, etc," - do you intend to do this here? If not, please file a JIRA.

          Will file a JIRA for auto-failover plus security - it needs a lot of manual testing, etc, so I wanted to do it as a follow-up.

          If possible, let's make ElectorCallbacks and HealthCallbacks static, and have them take references to the ZKFailoverController instance at construction.

          Tried to do this, but these inner classes are really just grouping of methods inside ZKFC. They access so many of the outer class's variables that it didn't seem any cleaner to make them static here.

          Is there some reason you didn't use one of the option-parsing classes in ZKFailoverController#doRun?

          The args are simple enough here I didn't see any benefit. And I've actually found commons-CLI to be a pain when there are "dependent" options (e.g -force only makes sense when -formatZK is passed)

          Typo: "Gracefully uitting...".

          Fixed

          Seems a little strange to me to output a log message about quitting the master election that will presumably appear in the logs before we've ever joined the election. I think this might confuse users.

          That actually won't appear, since the HealthMonitor starts in INITIALIZING state. So, it doesn't fire a callback at startup, until it changes to something other than INITIALIZING. I will change the message to be something a little more general, though: "Ensuring that the local node does not participate in active election."

          The design doc on HDFS-2185 doesn't mention that the ZKFailoverController calls quitElection in the case of a transition to the INITIALIZING state. This seems like the right thing to do to me, but I just wanted to bring it to your attention.

          Added to next rev of the design doc, thanks.

          Seems a little odd to me to throw an AssertionError in the case of an unrecognized state transition. Perhaps an IllegalStateException would be better?

          Changed to IllegalArgumentException – since from the perspective of ZK, the state is an argument of the callback, rather than its own state.

          Left in the "TODO: we need to make sure that if we get fenced and then quickly restarted". What are the ramifications of not addressing this TODO initially?

          I think it's pretty unlikely to happen in practice, since we set the IPC client retry parameter to 1, and it needs some very strange timing. But, for example, we have this bad scenario:

          • ZKFC1 gets active lock
          • ZKFC1 is about to send transitionToActive() and machine freezes (eg GC pause + swapping)
          • ZKFC1 loses its ZK lock, ZKFC2 gets ZK lock
          • ZKFC2 calls transitionToStandby on NN1, and transitions NN2 to active
          • ZKFC1 wakes up from pause, calls transitionToActive(), now we have a bad situation

          The timing is unlikely here, because we need the "freeze" on ZKFC1 to be longer than its own session timeout. I do think we need to fix it, but I want to think carefully about the solution, and it will require more "plumbing" to get it right. I'll file a follow-up JIRA when this is committed.

          The method comment and the prompt in ToolRunner#confirmPrompt suggest that the user needs to enter a capital "Y", but the code in fact only accepts lower-case "y".

          I think you misread it. It is case-insensitive. Clarified the javadoc to note this.

          Might be a good idea to output a message along the lines of "Invalid input: 'x'" if the user provides an invalid input in ToolRunner#confirmPrompt.

          Fixed

          Why the variable name "serial" in DummyHAService#getInstance? 'index' seems clearer to me.

          I'd recommend you change the setting of DummyHAService#index to be before adding the object to the instances array, so that the index is zero-based. This will also let you take out the "- 1" from DummyHAService#getInstance.

          That's actually what I had in the previous revision. But then I found it confusing that my variables were named "

          {thr,svc,zkfc} {1,2}

          " but the logs would refer to "service 0". I kept having cognitive dissonance when the logs told me something about "service 1" and I had to actually look at the code referring to svc2. Is that OK?

          TestZKFailoverController does more than set up the configuration. Perhaps just "setUp" ?

          I couldn't use "setUp" since the base class already has that. Changed to setupConfAndServices and renamed setupServicesAndFCs to just setupFCs which is more accurate.

          I think it'd be better to make the ZKFailoverController error codes positive integers...

          Changed to start them at 2 (didn't start with 1, since that's the error code that Java returns for uncaught exceptions coming out of main())

          Please add a comment to TestZKFailoverController#setupServicesAndFCs that svc1 is guaranteed to initially be active. You also might want to assert that svc2 is initially STANDBY. Several of the tests assume this state up-front without actually asserting it.

          Fixed.

          "Allowing svc1 to be healthy again, making svc2 unhealthy..." - here you're actually making svc2 unreachable, not unhealthy.

          Fixed

          The test code in TestZKFailoverController#testAutoFailoverOnLostZKSession is effectively repeated 4 times. Factor out?

          Fixed

          In TestZKFailoverController#testDontFailoverToUnhealthyNode, let's assert after the Thread.sleep that svc2 is not in fact healthy. As it stands, the test could erroneously pass if svc2 became ACTIVE and then went back to STANDBY upon svc1 becoming healthy again.

          Added a check that the lock is un-held after the 1-second sleep

          I don't see any tests in this patch for the case of ZK completely disappearing. Please write one, or file a JIRA to do so.

          Added a new test case testZooKeeperFailure()

          Show
          Todd Lipcon added a comment - You left in a "TODO: this should be namespace-scoped". Also, is that referring to all of those config keys or just the immediate one below it? If you're not going to address this here, could you please file a new JIRA for it? Will file. I don't understand the purpose of all the casting/wrapping of Exception/RuntimeException in ZKFailoverController#run. If this is really required, please add a comment as to why it's necessary. Agreed, that was goofy. Fixed to always wrap/unwrap – reason being that the implemented interface doesn't throw exceptions, but our code does. Could you please add a constant (and perhaps a more-likely-unique name) for the "wrapped" string used in ZKFailoverController#run? Obsoleted by above fix. "// TODO: need to hook DFS here to find the NN keytab info, etc," - do you intend to do this here? If not, please file a JIRA. Will file a JIRA for auto-failover plus security - it needs a lot of manual testing, etc, so I wanted to do it as a follow-up. If possible, let's make ElectorCallbacks and HealthCallbacks static, and have them take references to the ZKFailoverController instance at construction. Tried to do this, but these inner classes are really just grouping of methods inside ZKFC. They access so many of the outer class's variables that it didn't seem any cleaner to make them static here. Is there some reason you didn't use one of the option-parsing classes in ZKFailoverController#doRun? The args are simple enough here I didn't see any benefit. And I've actually found commons-CLI to be a pain when there are "dependent" options (e.g -force only makes sense when -formatZK is passed) Typo: "Gracefully uitting...". Fixed Seems a little strange to me to output a log message about quitting the master election that will presumably appear in the logs before we've ever joined the election. I think this might confuse users. That actually won't appear, since the HealthMonitor starts in INITIALIZING state. So, it doesn't fire a callback at startup, until it changes to something other than INITIALIZING. I will change the message to be something a little more general, though: "Ensuring that the local node does not participate in active election." The design doc on HDFS-2185 doesn't mention that the ZKFailoverController calls quitElection in the case of a transition to the INITIALIZING state. This seems like the right thing to do to me, but I just wanted to bring it to your attention. Added to next rev of the design doc, thanks. Seems a little odd to me to throw an AssertionError in the case of an unrecognized state transition. Perhaps an IllegalStateException would be better? Changed to IllegalArgumentException – since from the perspective of ZK, the state is an argument of the callback, rather than its own state. Left in the "TODO: we need to make sure that if we get fenced and then quickly restarted". What are the ramifications of not addressing this TODO initially? I think it's pretty unlikely to happen in practice, since we set the IPC client retry parameter to 1, and it needs some very strange timing. But, for example, we have this bad scenario: ZKFC1 gets active lock ZKFC1 is about to send transitionToActive() and machine freezes (eg GC pause + swapping) ZKFC1 loses its ZK lock, ZKFC2 gets ZK lock ZKFC2 calls transitionToStandby on NN1, and transitions NN2 to active ZKFC1 wakes up from pause, calls transitionToActive(), now we have a bad situation The timing is unlikely here, because we need the "freeze" on ZKFC1 to be longer than its own session timeout. I do think we need to fix it, but I want to think carefully about the solution, and it will require more "plumbing" to get it right. I'll file a follow-up JIRA when this is committed. The method comment and the prompt in ToolRunner#confirmPrompt suggest that the user needs to enter a capital "Y", but the code in fact only accepts lower-case "y". I think you misread it. It is case-insensitive. Clarified the javadoc to note this. Might be a good idea to output a message along the lines of "Invalid input: 'x'" if the user provides an invalid input in ToolRunner#confirmPrompt. Fixed Why the variable name "serial" in DummyHAService#getInstance? 'index' seems clearer to me. I'd recommend you change the setting of DummyHAService#index to be before adding the object to the instances array, so that the index is zero-based. This will also let you take out the "- 1" from DummyHAService#getInstance. That's actually what I had in the previous revision. But then I found it confusing that my variables were named " {thr,svc,zkfc} {1,2} " but the logs would refer to "service 0". I kept having cognitive dissonance when the logs told me something about "service 1" and I had to actually look at the code referring to svc2. Is that OK? TestZKFailoverController does more than set up the configuration. Perhaps just "setUp" ? I couldn't use "setUp" since the base class already has that. Changed to setupConfAndServices and renamed setupServicesAndFCs to just setupFCs which is more accurate. I think it'd be better to make the ZKFailoverController error codes positive integers... Changed to start them at 2 (didn't start with 1, since that's the error code that Java returns for uncaught exceptions coming out of main()) Please add a comment to TestZKFailoverController#setupServicesAndFCs that svc1 is guaranteed to initially be active. You also might want to assert that svc2 is initially STANDBY. Several of the tests assume this state up-front without actually asserting it. Fixed. "Allowing svc1 to be healthy again, making svc2 unhealthy..." - here you're actually making svc2 unreachable, not unhealthy. Fixed The test code in TestZKFailoverController#testAutoFailoverOnLostZKSession is effectively repeated 4 times. Factor out? Fixed In TestZKFailoverController#testDontFailoverToUnhealthyNode, let's assert after the Thread.sleep that svc2 is not in fact healthy. As it stands, the test could erroneously pass if svc2 became ACTIVE and then went back to STANDBY upon svc1 becoming healthy again. Added a check that the lock is un-held after the 1-second sleep I don't see any tests in this patch for the case of ZK completely disappearing. Please write one, or file a JIRA to do so. Added a new test case testZooKeeperFailure()
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12520001/hadoop-8206.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/785//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/785//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12520001/hadoop-8206.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/785//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/785//console This message is automatically generated.
          Hide
          Aaron T. Myers added a comment -

          Patch is looking pretty good, Todd. A few comments, mostly nits:

          1. You left in a "TODO: this should be namespace-scoped". Also, is that referring to all of those config keys or just the immediate one below it? If you're not going to address this here, could you please file a new JIRA for it?
          2. I don't understand the purpose of all the casting/wrapping of Exception/RuntimeException in ZKFailoverController#run. If this is really required, please add a comment as to why it's necessary.
          3. Could you please add a constant (and perhaps a more-likely-unique name) for the "wrapped" string used in ZKFailoverController#run?
          4. "// TODO: need to hook DFS here to find the NN keytab info, etc," - do you intend to do this here? If not, please file a JIRA.
          5. If possible, let's make ElectorCallbacks and HealthCallbacks static, and have them take references to the ZKFailoverController instance at construction.
          6. Is there some reason you didn't use one of the option-parsing classes in ZKFailoverController#doRun?
          7. Typo: "Gracefully uitting...".
          8. Seems a little strange to me to output a log message about quitting the master election that will presumably appear in the logs before we've ever joined the election. I think this might confuse users.
          9. The design doc on HDFS-2185 doesn't mention that the ZKFailoverController calls quitElection in the case of a transition to the INITIALIZING state. This seems like the right thing to do to me, but I just wanted to bring it to your attention.
          10. Seems a little odd to me to throw an AssertionError in the case of an unrecognized state transition. Perhaps an IllegalStateException would be better?
          11. Left in the "TODO: we need to make sure that if we get fenced and then quickly restarted". What are the ramifications of not addressing this TODO initially?
          12. The method comment and the prompt in ToolRunner#confirmPrompt suggest that the user needs to enter a capital "Y", but the code in fact only accepts lower-case "y".
          13. Might be a good idea to output a message along the lines of "Invalid input: 'x'" if the user provides an invalid input in ToolRunner#confirmPrompt.
          14. Why the variable name "serial" in DummyHAService#getInstance? 'index' seems clearer to me.
          15. I'd recommend you change the setting of DummyHAService#index to be before adding the object to the instances array, so that the index is zero-based. This will also let you take out the "- 1" from DummyHAService#getInstance.
          16. TestZKFailoverController does more than set up the configuration. Perhaps just "setUp" ?
          17. I think it'd be better to make the ZKFailoverController error codes positive integers, since they'll end up getting returned as process exit statuses, and those have to be positive. (The negative error codes will wrap at 255).
          18. Please add a comment to TestZKFailoverController#setupServicesAndFCs that svc1 is guaranteed to initially be active. You also might want to assert that svc2 is initially STANDBY. Several of the tests assume this state up-front without actually asserting it.
          19. "Allowing svc1 to be healthy again, making svc2 unhealthy..." - here you're actually making svc2 unreachable, not unhealthy.
          20. The test code in TestZKFailoverController#testAutoFailoverOnLostZKSession is effectively repeated 4 times. Factor out?
          21. In TestZKFailoverController#testDontFailoverToUnhealthyNode, let's assert after the Thread.sleep that svc2 is not in fact healthy. As it stands, the test could erroneously pass if svc2 became ACTIVE and then went back to STANDBY upon svc1 becoming healthy again.
          22. I don't see any tests in this patch for the case of ZK completely disappearing. Please write one, or file a JIRA to do so.
          Show
          Aaron T. Myers added a comment - Patch is looking pretty good, Todd. A few comments, mostly nits: You left in a "TODO: this should be namespace-scoped". Also, is that referring to all of those config keys or just the immediate one below it? If you're not going to address this here, could you please file a new JIRA for it? I don't understand the purpose of all the casting/wrapping of Exception/RuntimeException in ZKFailoverController#run. If this is really required, please add a comment as to why it's necessary. Could you please add a constant (and perhaps a more-likely-unique name) for the "wrapped" string used in ZKFailoverController#run? "// TODO: need to hook DFS here to find the NN keytab info, etc," - do you intend to do this here? If not, please file a JIRA. If possible, let's make ElectorCallbacks and HealthCallbacks static, and have them take references to the ZKFailoverController instance at construction. Is there some reason you didn't use one of the option-parsing classes in ZKFailoverController#doRun? Typo: "Gracefully uitting...". Seems a little strange to me to output a log message about quitting the master election that will presumably appear in the logs before we've ever joined the election. I think this might confuse users. The design doc on HDFS-2185 doesn't mention that the ZKFailoverController calls quitElection in the case of a transition to the INITIALIZING state. This seems like the right thing to do to me, but I just wanted to bring it to your attention. Seems a little odd to me to throw an AssertionError in the case of an unrecognized state transition. Perhaps an IllegalStateException would be better? Left in the "TODO: we need to make sure that if we get fenced and then quickly restarted". What are the ramifications of not addressing this TODO initially? The method comment and the prompt in ToolRunner#confirmPrompt suggest that the user needs to enter a capital "Y", but the code in fact only accepts lower-case "y". Might be a good idea to output a message along the lines of "Invalid input: 'x'" if the user provides an invalid input in ToolRunner#confirmPrompt. Why the variable name "serial" in DummyHAService#getInstance? 'index' seems clearer to me. I'd recommend you change the setting of DummyHAService#index to be before adding the object to the instances array, so that the index is zero-based. This will also let you take out the "- 1" from DummyHAService#getInstance. TestZKFailoverController does more than set up the configuration. Perhaps just "setUp" ? I think it'd be better to make the ZKFailoverController error codes positive integers, since they'll end up getting returned as process exit statuses, and those have to be positive. (The negative error codes will wrap at 255). Please add a comment to TestZKFailoverController#setupServicesAndFCs that svc1 is guaranteed to initially be active. You also might want to assert that svc2 is initially STANDBY. Several of the tests assume this state up-front without actually asserting it. "Allowing svc1 to be healthy again, making svc2 unhealthy..." - here you're actually making svc2 unreachable, not unhealthy. The test code in TestZKFailoverController#testAutoFailoverOnLostZKSession is effectively repeated 4 times. Factor out? In TestZKFailoverController#testDontFailoverToUnhealthyNode, let's assert after the Thread.sleep that svc2 is not in fact healthy. As it stands, the test could erroneously pass if svc2 became ACTIVE and then went back to STANDBY upon svc1 becoming healthy again. I don't see any tests in this patch for the case of ZK completely disappearing. Please write one, or file a JIRA to do so.
          Hide
          Todd Lipcon added a comment -
          • Fix findbugs warning (was trivial)
          • Add a "mkdir" to the setup method of the test, to properly make the test data directory. For some reason on the Hudson box this is necessary (we had the same issue with the TestActiveStandbyElectorRealZK a month or so ago, now that I remember)
          Show
          Todd Lipcon added a comment - Fix findbugs warning (was trivial) Add a "mkdir" to the setup method of the test, to properly make the test data directory. For some reason on the Hudson box this is necessary (we had the same issue with the TestActiveStandbyElectorRealZK a month or so ago, now that I remember)
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519905/hadoop-8206.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.ha.TestZKFailoverController

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/784//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/784//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/784//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519905/hadoop-8206.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.ha.TestZKFailoverController +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/784//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/784//artifact/trunk/hadoop-common-project/patchprocess/newPatchFindbugsWarningshadoop-common.html Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/784//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          Attached patch applies on top of HADOOP-8212 and implements the common portions of the failover controller.

          Show
          Todd Lipcon added a comment - Attached patch applies on top of HADOOP-8212 and implements the common portions of the failover controller.

            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development