HBase
  1. HBase
  2. HBASE-5209

HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.5, 0.92.0, 0.94.0
    • Fix Version/s: 0.92.1, 0.94.0
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      ClusterStatus client running this change running against servers without this change will encounter an error in readFields() as the new client will be expecting extra fields at the end of the Writable that the old server will not be providing.

      Description

      I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.

      1. HBASE_5209_v5.diff
        22 kB
        David S. Wang
      2. 5209.addendum
        0.7 kB
        Ted Yu

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in HBase-0.92-security #104 (See https://builds.apache.org/job/HBase-0.92-security/104/)
          HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305663)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ServerName.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Show
          Hudson added a comment - Integrated in HBase-0.92-security #104 (See https://builds.apache.org/job/HBase-0.92-security/104/ ) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305663) Result = FAILURE jmhsieh : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ServerName.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2697 (See https://builds.apache.org/job/HBase-TRUNK/2697/)
          HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305661)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2697 (See https://builds.apache.org/job/HBase-TRUNK/2697/ ) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305661) Result = FAILURE jmhsieh : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #151 (See https://builds.apache.org/job/HBase-TRUNK-security/151/)
          HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305661)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #151 (See https://builds.apache.org/job/HBase-TRUNK-security/151/ ) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305661) Result = FAILURE jmhsieh : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #340 (See https://builds.apache.org/job/HBase-0.92/340/)
          HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305663)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ServerName.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #340 (See https://builds.apache.org/job/HBase-0.92/340/ ) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305663) Result = FAILURE jmhsieh : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ServerName.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94 #58 (See https://builds.apache.org/job/HBase-0.94/58/)
          HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662)

          Result = ABORTED
          jmhsieh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Show
          Hudson added a comment - Integrated in HBase-0.94 #58 (See https://builds.apache.org/job/HBase-0.94/58/ ) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662) Result = ABORTED jmhsieh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.94-security #3 (See https://builds.apache.org/job/HBase-0.94-security/3/)
          HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662)

          Result = ABORTED
          jmhsieh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Show
          Hudson added a comment - Integrated in HBase-0.94-security #3 (See https://builds.apache.org/job/HBase-0.94-security/3/ ) HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662) Result = ABORTED jmhsieh : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review6055
          -----------------------------------------------------------

          Ship it!

          Dave,

          Additions look good to me. Since this is a little while since the others were committed please file another jira, and then I'll commit.

          Thanks!

          • jmhsieh

          On 2012-03-15 13:16:01, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-03-15 13:16:01)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 9df4c10

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 7416ae2

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review6055 ----------------------------------------------------------- Ship it! Dave, Additions look good to me. Since this is a little while since the others were committed please file another jira, and then I'll commit. Thanks! jmhsieh On 2012-03-15 13:16:01, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-03-15 13:16:01) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 9df4c10 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 7416ae2 src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/
          -----------------------------------------------------------

          (Updated 2012-03-15 13:16:01.439710)

          Review request for hbase.

          Changes
          -------

          Addressed Jon's and Stack's comments with addendum patch. Passed unit tests and I threw it onto a setup to make sure everything looked OK.

          Summary
          -------

          Problem:
          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:
          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          • I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.
          • I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.
          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 9df4c10
          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 7416ae2
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing
          -------

          • Ran mvn -P localTests test multiple times - no new tests fail
          • Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures
          • Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures
          • Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)
          • Did the following before and after killing
          • hbase hbck -details - checked output to see that active and backup masters are reported properly
          • zk_dump - checked that active and backup masters are reported properly
          • Started cluster with no backup masters to make sure change operates correctly that way
          • Tested build with this diff vs. build without this diff, in all combinations of client and server
          • Verified that new client can run against old servers without incident and with the defaults applied.
          • Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
          A record version mismatch occured. Expecting v2, found v3
          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          • Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-03-15 13:16:01.439710) Review request for hbase. Changes ------- Addressed Jon's and Stack's comments with addendum patch. Passed unit tests and I threw it onto a setup to make sure everything looked OK. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs (updated) src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 9df4c10 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 7416ae2 src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- Ran mvn -P localTests test multiple times - no new tests fail Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) Did the following before and after killing hbase hbck -details - checked output to see that active and backup masters are reported properly zk_dump - checked that active and backup masters are reported properly Started cluster with no backup masters to make sure change operates correctly that way Tested build with this diff vs. build without this diff, in all combinations of client and server Verified that new client can run against old servers without incident and with the defaults applied. Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          stack added a comment -

          Sorry. I missed that it was resolved.

          Show
          stack added a comment - Sorry. I missed that it was resolved.
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-10 00:35:03, Michael Stack wrote:

          > lgtm

          I didn't actually submit my updated patch yet in response to Jon's comments. I will soon.

          On 2012-03-10 00:35:03, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 227

          > <https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line227>

          >

          > This method seems a little superfluous

          I wanted to stay consistent with the other accessor APIs in this class that operated on similar lists.

          On 2012-03-10 00:35:03, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1396

          > <https://reviews.apache.org/r/3892/diff/3/?file=75403#file75403line1396>

          >

          > Should we be writing the master name by doing ServerName#getVersionedBytes and then parseVersionedServerName

          I can do this.

          • David

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5811
          -----------------------------------------------------------

          On 2012-02-16 06:30:31, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-10 00:35:03, Michael Stack wrote: > lgtm I didn't actually submit my updated patch yet in response to Jon's comments. I will soon. On 2012-03-10 00:35:03, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 227 > < https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line227 > > > This method seems a little superfluous I wanted to stay consistent with the other accessor APIs in this class that operated on similar lists. On 2012-03-10 00:35:03, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1396 > < https://reviews.apache.org/r/3892/diff/3/?file=75403#file75403line1396 > > > Should we be writing the master name by doing ServerName#getVersionedBytes and then parseVersionedServerName I can do this. David ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5811 ----------------------------------------------------------- On 2012-02-16 06:30:31, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          Ted Yu added a comment -

          Can we open new issue for future comments ?
          This JIRA has been resolved.

          Show
          Ted Yu added a comment - Can we open new issue for future comments ? This JIRA has been resolved.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5811
          -----------------------------------------------------------

          Ship it!

          lgtm

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment12695>

          This method seems a little superfluous

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          <https://reviews.apache.org/r/3892/#comment12696>

          Should we be writing the master name by doing ServerName#getVersionedBytes and then parseVersionedServerName

          • Michael

          On 2012-02-16 06:30:31, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5811 ----------------------------------------------------------- Ship it! lgtm src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment12695 > This method seems a little superfluous src/main/java/org/apache/hadoop/hbase/master/HMaster.java < https://reviews.apache.org/r/3892/#comment12696 > Should we be writing the master name by doing ServerName#getVersionedBytes and then parseVersionedServerName Michael On 2012-02-16 06:30:31, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-07 23:38:39, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 143

          > <https://reviews.apache.org/r/3892/diff/3/?file=75402#file75402line143>

          >

          > Another JIRA is good for me.

          Thanks, I'll file another one and also push one more patch to this one to address the other minor comments that Jon made.

          • David

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5696
          -----------------------------------------------------------

          On 2012-02-16 06:30:31, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-07 23:38:39, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 143 > < https://reviews.apache.org/r/3892/diff/3/?file=75402#file75402line143 > > > Another JIRA is good for me. Thanks, I'll file another one and also push one more patch to this one to address the other minor comments that Jon made. David ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5696 ----------------------------------------------------------- On 2012-02-16 06:30:31, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          Ted Yu added a comment -

          ZOOKEEPER-1407 'Support GetData and GetChildren in Multi' is needed for the new JIRA.

          Show
          Ted Yu added a comment - ZOOKEEPER-1407 'Support GetData and GetChildren in Multi' is needed for the new JIRA.
          Hide
          jiraposter@reviews.apache.org added a comment - - edited

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5696
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          <https://reviews.apache.org/r/3892/#comment12401>

          Another JIRA is good for me.

          • Ted
          Show
          jiraposter@reviews.apache.org added a comment - - edited ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5696 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java < https://reviews.apache.org/r/3892/#comment12401 > Another JIRA is good for me. Ted
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-07 06:07:06, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 143

          > <https://reviews.apache.org/r/3892/diff/3/?file=75402#file75402line143>

          >

          > From zookeeper 3.4.0 upward, the following function is provided:

          >

          > public List<OpResult> multi(Iterable<Op> ops) throws InterruptedException, KeeperException {

          >

          > where:

          > * @param ops An iterable that contains the operations to be done.

          > * These should be created using the factory methods on {@link Op}.

          >

          Ted,

          I think this would work, thanks for the suggestion as I was not aware of this relatively new functionality. However, I think adding support for this to the zookeeper code inside HBase might be too much to do for this particular JIRA, as it would involve changes to ZKUtil, RecoverableZooKeeper, and maybe more. Should I file another JIRA for that?

          • David

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5680
          -----------------------------------------------------------

          On 2012-02-16 06:30:31, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-07 06:07:06, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 143 > < https://reviews.apache.org/r/3892/diff/3/?file=75402#file75402line143 > > > From zookeeper 3.4.0 upward, the following function is provided: > > public List<OpResult> multi(Iterable<Op> ops) throws InterruptedException, KeeperException { > > where: > * @param ops An iterable that contains the operations to be done. > * These should be created using the factory methods on {@link Op}. > Ted, I think this would work, thanks for the suggestion as I was not aware of this relatively new functionality. However, I think adding support for this to the zookeeper code inside HBase might be too much to do for this particular JIRA, as it would involve changes to ZKUtil, RecoverableZooKeeper, and maybe more. Should I file another JIRA for that? David ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5680 ----------------------------------------------------------- On 2012-02-16 06:30:31, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5680
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          <https://reviews.apache.org/r/3892/#comment12382>

          From zookeeper 3.4.0 upward, the following function is provided:

          public List<OpResult> multi(Iterable<Op> ops) throws InterruptedException, KeeperException {

          where:

          • @param ops An iterable that contains the operations to be done.
          • These should be created using the factory methods on {@link Op}

            .

          • Ted

          On 2012-02-16 06:30:31, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5680 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java < https://reviews.apache.org/r/3892/#comment12382 > From zookeeper 3.4.0 upward, the following function is provided: public List<OpResult> multi(Iterable<Op> ops) throws InterruptedException, KeeperException { where: @param ops An iterable that contains the operations to be done. These should be created using the factory methods on {@link Op} . Ted On 2012-02-16 06:30:31, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-07 00:53:15, jmhsieh wrote:

          > Should probably file follow-on issues to update AvroUtil.csToACS (convert to ClusterStatus to Avro for avro server) and likely to thrift/rest clients as well.

          OK.

          On 2012-03-07 00:53:15, jmhsieh wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 234

          > <https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line234>

          >

          > Would a copy be safer?

          backupMasters has no way of being modified after the instance is created, so I don't think a copy is any safer, if you are thinking about "safer" == "immutable". If someone added a mutator method for backupMasters, then yes we would have a problem, but I find that unlikely in this case given the mission of ClusterStatus being a snapshot of the cluster state.

          On 2012-03-07 00:53:15, jmhsieh wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 180

          > <https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line180>

          >

          > Since the type is Collection<xxx>, I have a feeling that there may be a equality and hash issues here. This should probably be:

          >

          > this.backupMasters.containsAll(((ClusterStatus)o).backupMasters)

          >

          > alternately, (if not present), add unit tests to convince that equality and hash operations work?

          Easier to just change this to containsAll() rather than add unit tests, since I think another patch is needed here.

          On 2012-03-07 00:53:15, jmhsieh wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 70

          > <https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line70>

          >

          > Unlikely, but someone could possibly name their host "unknown". Maybe change this to be something that is normally illegal as a host name (start with a symbol, maybe "#unknown#".)

          >

          OK.

          On 2012-03-07 00:53:15, jmhsieh wrote:

          > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1387

          > <https://reviews.apache.org/r/3892/diff/3/?file=75403#file75403line1387>

          >

          > Is this guaranteed to read/return values in the same order all the time? This may be is important for the equality comparison and hashcode comparisons. (Doesn't seemed to be used in hbase core code, but exposed to clients).

          I'll add a sort with appropriate Comparator, as the backing function from ZooKeeper does not guarantee ordering.

          On 2012-03-07 00:53:15, jmhsieh wrote:

          > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 143

          > <https://reviews.apache.org/r/3892/diff/3/?file=75402#file75402line143>

          >

          > Would this create a small race – why not have all masters write into a "potential-masters" zdir instead of "backup-masters" zdir and then juat have a node for the active master?

          >

          > Then we don't have to worry about someone accidentally reading an in between state where the a master is active and still in backup-masters or not master and not in backup-masters.

          I think you are suggesting having a list of potential masters, and deriving the list of backup masters from that list by just subtracting the active master.

          The problem then is that there is then a race whenever you derive the backup masters, between when you read the potential masters and when you read the active master, as those are separate operations and therefore something can change in between them. You're just moving where the race happens.

          What is really needed here is a read-write lock that protects this area. There are no read-write (or any) ZK-based locks currently in HBase, so I'm not sure if we want to add an implementation of that just for this case.

          • David

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5673
          -----------------------------------------------------------

          On 2012-02-16 06:30:31, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-07 00:53:15, jmhsieh wrote: > Should probably file follow-on issues to update AvroUtil.csToACS (convert to ClusterStatus to Avro for avro server) and likely to thrift/rest clients as well. OK. On 2012-03-07 00:53:15, jmhsieh wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 234 > < https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line234 > > > Would a copy be safer? backupMasters has no way of being modified after the instance is created, so I don't think a copy is any safer, if you are thinking about "safer" == "immutable". If someone added a mutator method for backupMasters, then yes we would have a problem, but I find that unlikely in this case given the mission of ClusterStatus being a snapshot of the cluster state. On 2012-03-07 00:53:15, jmhsieh wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 180 > < https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line180 > > > Since the type is Collection<xxx>, I have a feeling that there may be a equality and hash issues here. This should probably be: > > this.backupMasters.containsAll(((ClusterStatus)o).backupMasters) > > alternately, (if not present), add unit tests to convince that equality and hash operations work? Easier to just change this to containsAll() rather than add unit tests, since I think another patch is needed here. On 2012-03-07 00:53:15, jmhsieh wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 70 > < https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line70 > > > Unlikely, but someone could possibly name their host "unknown". Maybe change this to be something that is normally illegal as a host name (start with a symbol, maybe "#unknown#".) > OK. On 2012-03-07 00:53:15, jmhsieh wrote: > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1387 > < https://reviews.apache.org/r/3892/diff/3/?file=75403#file75403line1387 > > > Is this guaranteed to read/return values in the same order all the time? This may be is important for the equality comparison and hashcode comparisons. (Doesn't seemed to be used in hbase core code, but exposed to clients). I'll add a sort with appropriate Comparator, as the backing function from ZooKeeper does not guarantee ordering. On 2012-03-07 00:53:15, jmhsieh wrote: > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 143 > < https://reviews.apache.org/r/3892/diff/3/?file=75402#file75402line143 > > > Would this create a small race – why not have all masters write into a "potential-masters" zdir instead of "backup-masters" zdir and then juat have a node for the active master? > > Then we don't have to worry about someone accidentally reading an in between state where the a master is active and still in backup-masters or not master and not in backup-masters. I think you are suggesting having a list of potential masters, and deriving the list of backup masters from that list by just subtracting the active master. The problem then is that there is then a race whenever you derive the backup masters, between when you read the potential masters and when you read the active master, as those are separate operations and therefore something can change in between them. You're just moving where the race happens. What is really needed here is a read-write lock that protects this area. There are no read-write (or any) ZK-based locks currently in HBase, so I'm not sure if we want to add an implementation of that just for this case. David ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5673 ----------------------------------------------------------- On 2012-02-16 06:30:31, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5673
          -----------------------------------------------------------

          Should probably file follow-on issues to update AvroUtil.csToACS (convert to ClusterStatus to Avro for avro server) and likely to thrift/rest clients as well.

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment12356>

          Unlikely, but someone could possibly name their host "unknown". Maybe change this to be something that is normally illegal as a host name (start with a symbol, maybe "#unknown#".)

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment12354>

          Since the type is Collection<xxx>, I have a feeling that there may be a equality and hash issues here. This should probably be:

          this.backupMasters.containsAll(((ClusterStatus)o).backupMasters)

          alternately, (if not present), add unit tests to convince that equality and hash operations work?

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment12357>

          Would a copy be safer?

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          <https://reviews.apache.org/r/3892/#comment12358>

          Would this create a small race – why not have all masters write into a "potential-masters" zdir instead of "backup-masters" zdir and then juat have a node for the active master?

          Then we don't have to worry about someone accidentally reading an in between state where the a master is active and still in backup-masters or not master and not in backup-masters.

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          <https://reviews.apache.org/r/3892/#comment12359>

          Is this guaranteed to read/return values in the same order all the time? This may be is important for the equality comparison and hashcode comparisons. (Doesn't seemed to be used in hbase core code, but exposed to clients).

          • jmhsieh

          On 2012-02-16 06:30:31, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase hbck -details - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Tested build with this diff vs. build without this diff, in all combinations of client and server

          * Verified that new client can run against old servers without incident and with the defaults applied.

          * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000

          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields

          A record version mismatch occured. Expecting v2, found v3

          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)

          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)

          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5673 ----------------------------------------------------------- Should probably file follow-on issues to update AvroUtil.csToACS (convert to ClusterStatus to Avro for avro server) and likely to thrift/rest clients as well. src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment12356 > Unlikely, but someone could possibly name their host "unknown". Maybe change this to be something that is normally illegal as a host name (start with a symbol, maybe "#unknown#".) src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment12354 > Since the type is Collection<xxx>, I have a feeling that there may be a equality and hash issues here. This should probably be: this.backupMasters.containsAll(((ClusterStatus)o).backupMasters) alternately, (if not present), add unit tests to convince that equality and hash operations work? src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment12357 > Would a copy be safer? src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java < https://reviews.apache.org/r/3892/#comment12358 > Would this create a small race – why not have all masters write into a "potential-masters" zdir instead of "backup-masters" zdir and then juat have a node for the active master? Then we don't have to worry about someone accidentally reading an in between state where the a master is active and still in backup-masters or not master and not in backup-masters. src/main/java/org/apache/hadoop/hbase/master/HMaster.java < https://reviews.apache.org/r/3892/#comment12359 > Is this guaranteed to read/return values in the same order all the time? This may be is important for the equality comparison and hashcode comparisons. (Doesn't seemed to be used in hbase core code, but exposed to clients). jmhsieh On 2012-02-16 06:30:31, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase hbck -details - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Tested build with this diff vs. build without this diff, in all combinations of client and server * Verified that new client can run against old servers without incident and with the defaults applied. * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92-security #93 (See https://builds.apache.org/job/HBase-0.92-security/93/)
          HBASE-5209 Addendum adds znode creation call (David Wang) (Revision 1292068)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          Show
          Hudson added a comment - Integrated in HBase-0.92-security #93 (See https://builds.apache.org/job/HBase-0.92-security/93/ ) HBASE-5209 Addendum adds znode creation call (David Wang) (Revision 1292068) Result = FAILURE tedyu : Files : /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #296 (See https://builds.apache.org/job/HBase-0.92/296/)
          HBASE-5209 Addendum adds znode creation call (David Wang) (Revision 1292068)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #296 (See https://builds.apache.org/job/HBase-0.92/296/ ) HBASE-5209 Addendum adds znode creation call (David Wang) (Revision 1292068) Result = FAILURE tedyu : Files : /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          Hide
          Ted Yu added a comment -

          Addendum integrated to 0.92 branch.

          Thanks for the reminder Adrian and David.

          Show
          Ted Yu added a comment - Addendum integrated to 0.92 branch. Thanks for the reminder Adrian and David.
          Hide
          David S. Wang added a comment -

          Thanks Ted. Looks good to me. This is for 0.92 only I'm assuming, as trunk already has this.

          Show
          David S. Wang added a comment - Thanks Ted. Looks good to me. This is for 0.92 only I'm assuming, as trunk already has this.
          Hide
          Ted Yu added a comment -

          This is the addendum I am going to apply.

          Show
          Ted Yu added a comment - This is the addendum I am going to apply.
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92-security #92 (See https://builds.apache.org/job/HBase-0.92-security/92/)
          HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup – FIX BUILD ON 0.92; ADDENDUM (Revision 1291592)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Show
          Hudson added a comment - Integrated in HBase-0.92-security #92 (See https://builds.apache.org/job/HBase-0.92-security/92/ ) HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup – FIX BUILD ON 0.92; ADDENDUM (Revision 1291592) Result = FAILURE stack : Files : /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Hide
          David S. Wang added a comment -

          Yes, the commit for 0.92 did not include the:

          ZKUtil.createAndFailSilent(this, backupMasterAddressesZNode);

          line from my patch. Stack, can you please add this line into 0.92? That should fix the current test error on 0.92 as well. Thanks.

          Show
          David S. Wang added a comment - Yes, the commit for 0.92 did not include the: ZKUtil.createAndFailSilent(this, backupMasterAddressesZNode); line from my patch. Stack, can you please add this line into 0.92? That should fix the current test error on 0.92 as well. Thanks.
          Hide
          Adrian Muraru added a comment -

          Hi Stack,

          Just tested this patch in 0.92 branch and it seems that /hbase/backup-masters node is not created in ZooKeeperWatcher#createBaseZNodes()
          although is included in the patch above.
          ZKUtil.createAndFailSilent(this, backupMasterAddressesZNode);

          Integration issue?

          Show
          Adrian Muraru added a comment - Hi Stack, Just tested this patch in 0.92 branch and it seems that /hbase/backup-masters node is not created in ZooKeeperWatcher#createBaseZNodes() although is included in the patch above. ZKUtil.createAndFailSilent(this, backupMasterAddressesZNode); Integration issue?
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #294 (See https://builds.apache.org/job/HBase-0.92/294/)
          HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup – FIX BUILD ON 0.92; ADDENDUM (Revision 1291592)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #294 (See https://builds.apache.org/job/HBase-0.92/294/ ) HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup – FIX BUILD ON 0.92; ADDENDUM (Revision 1291592) Result = FAILURE stack : Files : /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92-security #91 (See https://builds.apache.org/job/HBase-0.92-security/91/)
          HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup (Revision 1290943)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Show
          Hudson added a comment - Integrated in HBase-0.92-security #91 (See https://builds.apache.org/job/HBase-0.92-security/91/ ) HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup (Revision 1290943) Result = FAILURE stack : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #116 (See https://builds.apache.org/job/HBase-TRUNK-security/116/)
          HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup (Revision 1290942)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #116 (See https://builds.apache.org/job/HBase-TRUNK-security/116/ ) HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup (Revision 1290942) Result = FAILURE stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #293 (See https://builds.apache.org/job/HBase-0.92/293/)
          HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup (Revision 1290943)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #293 (See https://builds.apache.org/job/HBase-0.92/293/ ) HBASE-5209 HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup (Revision 1290943) Result = FAILURE stack : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          Hide
          stack added a comment -

          Committed to 0.92 branch and to trunk. Thanks for nice facility David.

          Show
          stack added a comment - Committed to 0.92 branch and to trunk. Thanks for nice facility David.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515116/HBASE_5209_v5.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/990//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/990//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/990//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515116/HBASE_5209_v5.diff against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/990//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/990//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/990//console This message is automatically generated.
          Hide
          David S. Wang added a comment -

          Attached patch with correct VERSION.

          Show
          David S. Wang added a comment - Attached patch with correct VERSION.
          Hide
          David S. Wang added a comment -

          Patch with right VERSION.

          Show
          David S. Wang added a comment - Patch with right VERSION.
          Hide
          David S. Wang added a comment -

          Attached patch with wrong VERSION ... resending patch with correct VERSION now.

          Show
          David S. Wang added a comment - Attached patch with wrong VERSION ... resending patch with correct VERSION now.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515114/HBASE_5209_v4.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/989//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/989//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/989//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515114/HBASE_5209_v4.diff against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/989//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/989//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/989//console This message is automatically generated.
          Hide
          David S. Wang added a comment -

          Latest patch submission for Hadoop QA robot, as Stack requested.

          Show
          David S. Wang added a comment - Latest patch submission for Hadoop QA robot, as Stack requested.
          Hide
          David S. Wang added a comment -

          Latest patch submission for Hadoop QA robot, as Stack requested.

          Show
          David S. Wang added a comment - Latest patch submission for Hadoop QA robot, as Stack requested.
          Hide
          stack added a comment -

          My fault. I was reading HBASE-5209-v1.diff. Pardon me. Thanks for testing. Old client against new cluster is what needs to work (new client against old server is YMMD).

          Patch looks good to me. Mind attaching it here so we can run it through hadoopqa to make sure it doesn't have side effects? Then I'll commit. Thanks David.

          Show
          stack added a comment - My fault. I was reading HBASE-5209 -v1.diff. Pardon me. Thanks for testing. Old client against new cluster is what needs to work (new client against old server is YMMD). Patch looks good to me. Mind attaching it here so we can run it through hadoopqa to make sure it doesn't have side effects? Then I'll commit. Thanks David.
          Hide
          David S. Wang added a comment -

          Stack,

          Thanks for the review.

          I am a bit confused about your previous comment though: I do not update HMasterInterface in my latest patch, nor do I change anything related to isActiveMaster. The only version that I update is VERSION for ClusterStatus, and I already add the new fields to the end with my current patch. I think perhaps that is what you are referring to.

          I did two tests with "hbase hbck -details" to test ClusterStatus:

          1. I tested old client (top of trunk 0.92) with new server (0.92 with my patch but without bumping VERSION), and things worked fine.

          2. I tested new client (0.92 with my patch but without bumping VERSION), with old server (top of trunk 0.92), and got the following error. I'm thinking because the new client expects the new fields I added that the old server never sends. Is this OK behavior?

          INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x1358ef9f91b000b, negotiated timeout = 5000

          [... pauses for some time ...]

          12/02/17 15:51:46 ERROR io.HbaseObjectWritable: Error in readFields java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.29.5.33:41223 remote=haus04.sf.cloudera.com/172.29.5.35:31000]
          at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
          at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
          at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
          at java.io.FilterInputStream.read(FilterInputStream.java:116)
          at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
          at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
          at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
          at java.io.DataInputStream.readByte(DataInputStream.java:248)
          at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
          at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
          at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:146)
          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:334)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:647)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:311)
          at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593)
          at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:50

          Show
          David S. Wang added a comment - Stack, Thanks for the review. I am a bit confused about your previous comment though: I do not update HMasterInterface in my latest patch, nor do I change anything related to isActiveMaster. The only version that I update is VERSION for ClusterStatus, and I already add the new fields to the end with my current patch. I think perhaps that is what you are referring to. I did two tests with "hbase hbck -details" to test ClusterStatus: 1. I tested old client (top of trunk 0.92) with new server (0.92 with my patch but without bumping VERSION), and things worked fine. 2. I tested new client (0.92 with my patch but without bumping VERSION), with old server (top of trunk 0.92), and got the following error. I'm thinking because the new client expects the new fields I added that the old server never sends. Is this OK behavior? INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x1358ef9f91b000b, negotiated timeout = 5000 [... pauses for some time ...] 12/02/17 15:51:46 ERROR io.HbaseObjectWritable: Error in readFields java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel [connected local=/172.29.5.33:41223 remote=haus04.sf.cloudera.com/172.29.5.35:31000] at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.FilterInputStream.read(FilterInputStream.java:116) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320) at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:146) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:334) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:647) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:311) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:593) at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:50
          Hide
          stack added a comment -

          Patch looks excellent.

          One issue is upping of the HMasterInterface version. Its the 'right' thing to do but then it means I can't apply to 0.92.1 and it breaks a 0.92 talking to a 0.94 which currently is possible. Can you try adding the isActiveMaster to the end of the Interface and NOT update the version. See if you can connect to a 0.92.1 server from a 0.92.0 client and see if it you can do basic HMasterInterface operations such as isLoadBalancer running.

          Show
          stack added a comment - Patch looks excellent. One issue is upping of the HMasterInterface version. Its the 'right' thing to do but then it means I can't apply to 0.92.1 and it breaks a 0.92 talking to a 0.94 which currently is possible. Can you try adding the isActiveMaster to the end of the Interface and NOT update the version. See if you can connect to a 0.92.1 server from a 0.92.0 client and see if it you can do basic HMasterInterface operations such as isLoadBalancer running.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/
          -----------------------------------------------------------

          (Updated 2012-02-16 06:30:31.996701)

          Review request for hbase.

          Changes
          -------

          Addressed Stack's comments:

          Bumped version and compatibility handling for new client - old server. Changed Set to List for backupMasters in HMaster.

          Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
          A record version mismatch occured. Expecting v2, found v3
          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          Summary
          -------

          Problem:
          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:
          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          • I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.
          • I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.
          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429
          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131
          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744
          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing (updated)
          -------

          • Ran mvn -P localTests test multiple times - no new tests fail
          • Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures
          • Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures
          • Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)
          • Did the following before and after killing
          • hbase hbck -details - checked output to see that active and backup masters are reported properly
          • zk_dump - checked that active and backup masters are reported properly
          • Started cluster with no backup masters to make sure change operates correctly that way
          • Tested build with this diff vs. build without this diff, in all combinations of client and server
          • Verified that new client can run against old servers without incident and with the defaults applied.
          • Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change.

          12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
          12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
          A record version mismatch occured. Expecting v2, found v3
          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)

          • Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-16 06:30:31.996701) Review request for hbase. Changes ------- Addressed Stack's comments: Bumped version and compatibility handling for new client - old server. Changed Set to List for backupMasters in HMaster. Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs (updated) src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing (updated) ------- Ran mvn -P localTests test multiple times - no new tests fail Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) Did the following before and after killing hbase hbck -details - checked output to see that active and backup masters are reported properly zk_dump - checked that active and backup masters are reported properly Started cluster with no backup masters to make sure change operates correctly that way Tested build with this diff vs. build without this diff, in all combinations of client and server Verified that new client can run against old servers without incident and with the defaults applied. Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind. This is not solvable, at least in the scope of this change. 12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000 12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields A record version mismatch occured. Expecting v2, found v3 at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46) at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297) Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 19:01:02, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 278

          > <https://reviews.apache.org/r/3892/diff/2/?file=74894#file74894line278>

          >

          > Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters)

          David Wang wrote:

          I took the lead from your commit at SHA b8b142a9e10aa729e0c0d4560035b1bdc4a0bd5c, where you bumped VERSION, and I didn't notice any special handling about checking for previous versions. I can easily add that if necessary, though since I am adding to the end of the Writable, do I need to?

          Michael Stack wrote:

          Yeah, that commit should have upped the version too... (I think this is the commit that added it):

          commit 65e23155db94d4b3f36c35e90089f5827dea549b

          Author: Andrew Kyle Purtell <apurtell@apache.org>

          Date: Mon Oct 24 23:18:04 2011 +0000

          HBASE-4070 Improve region server metrics to report loaded coprocessors to master

          git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1188442 13f79535-47bb-0310-9956-ffa450edef68

          It's too late for that previous commit since the version number wasn't bumped for it, but I will add something for my commit. I'll post a new diff soon.

          • David

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5088
          -----------------------------------------------------------

          On 2012-02-14 18:50:54, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-14 18:50:54)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase fsck - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-02-14 19:01:02, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 278 > < https://reviews.apache.org/r/3892/diff/2/?file=74894#file74894line278 > > > Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters) David Wang wrote: I took the lead from your commit at SHA b8b142a9e10aa729e0c0d4560035b1bdc4a0bd5c, where you bumped VERSION, and I didn't notice any special handling about checking for previous versions. I can easily add that if necessary, though since I am adding to the end of the Writable, do I need to? Michael Stack wrote: Yeah, that commit should have upped the version too... (I think this is the commit that added it): commit 65e23155db94d4b3f36c35e90089f5827dea549b Author: Andrew Kyle Purtell <apurtell@apache.org> Date: Mon Oct 24 23:18:04 2011 +0000 HBASE-4070 Improve region server metrics to report loaded coprocessors to master git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1188442 13f79535-47bb-0310-9956-ffa450edef68 It's too late for that previous commit since the version number wasn't bumped for it, but I will add something for my commit. I'll post a new diff soon. David ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5088 ----------------------------------------------------------- On 2012-02-14 18:50:54, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 18:50:54) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase fsck - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 19:01:02, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 278

          > <https://reviews.apache.org/r/3892/diff/2/?file=74894#file74894line278>

          >

          > Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters)

          David Wang wrote:

          I took the lead from your commit at SHA b8b142a9e10aa729e0c0d4560035b1bdc4a0bd5c, where you bumped VERSION, and I didn't notice any special handling about checking for previous versions. I can easily add that if necessary, though since I am adding to the end of the Writable, do I need to?

          Yeah, that commit should have upped the version too... (I think this is the commit that added it):

          commit 65e23155db94d4b3f36c35e90089f5827dea549b
          Author: Andrew Kyle Purtell <apurtell@apache.org>
          Date: Mon Oct 24 23:18:04 2011 +0000

          HBASE-4070 Improve region server metrics to report loaded coprocessors to master

          git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1188442 13f79535-47bb-0310-9956-ffa450edef68

          On 2012-02-14 19:01:02, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 182

          > <https://reviews.apache.org/r/3892/diff/2/?file=74895#file74895line182>

          >

          > You don't think this setStatus needed anymore?

          David Wang wrote:

          Actually, this is still being called. I just pulled out the setStatus() and Log.info() calls since they apply to all three of the cases in this block. Now the three cases in this block are just setting msg.

          Ok. Good.

          On 2012-02-14 19:01:02, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1393

          > <https://reviews.apache.org/r/3892/diff/2/?file=74896#file74896line1393>

          >

          > List is a Collection. Why convert to a Set? Is it needed?

          >

          > If you have to do the convertion, use http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#addAll(java.util.Collection, T...) instead?

          David Wang wrote:

          I used Set instead of List because I figured backupMasters is very similar in concept to deadServers, which is an instance of DeadServer that uses Set.

          Also, List is ordered while Set is not. And I don't think backupMasters or deadServers are meant to be ordered.

          Not that important I'd say. Just do List I'd say.

          • Michael

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5088
          -----------------------------------------------------------

          On 2012-02-14 18:50:54, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-14 18:50:54)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase fsck - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-02-14 19:01:02, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 278 > < https://reviews.apache.org/r/3892/diff/2/?file=74894#file74894line278 > > > Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters) David Wang wrote: I took the lead from your commit at SHA b8b142a9e10aa729e0c0d4560035b1bdc4a0bd5c, where you bumped VERSION, and I didn't notice any special handling about checking for previous versions. I can easily add that if necessary, though since I am adding to the end of the Writable, do I need to? Yeah, that commit should have upped the version too... (I think this is the commit that added it): commit 65e23155db94d4b3f36c35e90089f5827dea549b Author: Andrew Kyle Purtell <apurtell@apache.org> Date: Mon Oct 24 23:18:04 2011 +0000 HBASE-4070 Improve region server metrics to report loaded coprocessors to master git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1188442 13f79535-47bb-0310-9956-ffa450edef68 On 2012-02-14 19:01:02, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 182 > < https://reviews.apache.org/r/3892/diff/2/?file=74895#file74895line182 > > > You don't think this setStatus needed anymore? David Wang wrote: Actually, this is still being called. I just pulled out the setStatus() and Log.info() calls since they apply to all three of the cases in this block. Now the three cases in this block are just setting msg. Ok. Good. On 2012-02-14 19:01:02, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1393 > < https://reviews.apache.org/r/3892/diff/2/?file=74896#file74896line1393 > > > List is a Collection. Why convert to a Set? Is it needed? > > If you have to do the convertion, use http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#addAll(java.util.Collection , T...) instead? David Wang wrote: I used Set instead of List because I figured backupMasters is very similar in concept to deadServers, which is an instance of DeadServer that uses Set. Also, List is ordered while Set is not. And I don't think backupMasters or deadServers are meant to be ordered. Not that important I'd say. Just do List I'd say. Michael ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5088 ----------------------------------------------------------- On 2012-02-14 18:50:54, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 18:50:54) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase fsck - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 19:01:02, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 182

          > <https://reviews.apache.org/r/3892/diff/2/?file=74895#file74895line182>

          >

          > You don't think this setStatus needed anymore?

          Actually, this is still being called. I just pulled out the setStatus() and Log.info() calls since they apply to all three of the cases in this block. Now the three cases in this block are just setting msg.

          On 2012-02-14 19:01:02, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1393

          > <https://reviews.apache.org/r/3892/diff/2/?file=74896#file74896line1393>

          >

          > List is a Collection. Why convert to a Set? Is it needed?

          >

          > If you have to do the convertion, use http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#addAll(java.util.Collection, T...) instead?

          I used Set instead of List because I figured backupMasters is very similar in concept to deadServers, which is an instance of DeadServer that uses Set.

          Also, List is ordered while Set is not. And I don't think backupMasters or deadServers are meant to be ordered.

          On 2012-02-14 19:01:02, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 278

          > <https://reviews.apache.org/r/3892/diff/2/?file=74894#file74894line278>

          >

          > Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters)

          I took the lead from your commit at SHA b8b142a9e10aa729e0c0d4560035b1bdc4a0bd5c, where you bumped VERSION, and I didn't notice any special handling about checking for previous versions. I can easily add that if necessary, though since I am adding to the end of the Writable, do I need to?

          • David

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5088
          -----------------------------------------------------------

          On 2012-02-14 18:50:54, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-14 18:50:54)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase fsck - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-02-14 19:01:02, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 182 > < https://reviews.apache.org/r/3892/diff/2/?file=74895#file74895line182 > > > You don't think this setStatus needed anymore? Actually, this is still being called. I just pulled out the setStatus() and Log.info() calls since they apply to all three of the cases in this block. Now the three cases in this block are just setting msg. On 2012-02-14 19:01:02, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1393 > < https://reviews.apache.org/r/3892/diff/2/?file=74896#file74896line1393 > > > List is a Collection. Why convert to a Set? Is it needed? > > If you have to do the convertion, use http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#addAll(java.util.Collection , T...) instead? I used Set instead of List because I figured backupMasters is very similar in concept to deadServers, which is an instance of DeadServer that uses Set. Also, List is ordered while Set is not. And I don't think backupMasters or deadServers are meant to be ordered. On 2012-02-14 19:01:02, Michael Stack wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 278 > < https://reviews.apache.org/r/3892/diff/2/?file=74894#file74894line278 > > > Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters) I took the lead from your commit at SHA b8b142a9e10aa729e0c0d4560035b1bdc4a0bd5c, where you bumped VERSION, and I didn't notice any special handling about checking for previous versions. I can easily add that if necessary, though since I am adding to the end of the Writable, do I need to? David ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5088 ----------------------------------------------------------- On 2012-02-14 18:50:54, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 18:50:54) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase fsck - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5088
          -----------------------------------------------------------

          Patch is looking really good. Thanks for the help.

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment11159>

          Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters)

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          <https://reviews.apache.org/r/3892/#comment11160>

          You don't think this setStatus needed anymore?

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          <https://reviews.apache.org/r/3892/#comment11161>

          List is a Collection. Why convert to a Set? Is it needed?

          If you have to do the convertion, use http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#addAll(java.util.Collection, T...) instead?

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
          <https://reviews.apache.org/r/3892/#comment11162>

          This is great

          • Michael

          On 2012-02-14 18:50:54, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-14 18:50:54)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase fsck - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5088 ----------------------------------------------------------- Patch is looking really good. Thanks for the help. src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment11159 > Isn't this Writable versioned? Should you up the version since you've added fields and check the version when deserializing (if you get a previous version, don't try and derserialize master and backupmasters) src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java < https://reviews.apache.org/r/3892/#comment11160 > You don't think this setStatus needed anymore? src/main/java/org/apache/hadoop/hbase/master/HMaster.java < https://reviews.apache.org/r/3892/#comment11161 > List is a Collection. Why convert to a Set? Is it needed? If you have to do the convertion, use http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Collections.html#addAll(java.util.Collection , T...) instead? src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java < https://reviews.apache.org/r/3892/#comment11162 > This is great Michael On 2012-02-14 18:50:54, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 18:50:54) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase fsck - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/
          -----------------------------------------------------------

          (Updated 2012-02-14 18:50:54.512575)

          Review request for hbase.

          Changes
          -------

          Addressed changes that Ted suggested.

          Summary
          -------

          Problem:
          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:
          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          • I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.
          • I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.
          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429
          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131
          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744
          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing
          -------

          • Ran mvn -P localTests test multiple times - no new tests fail
          • Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures
          • Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures
          • Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)
          • Did the following before and after killing
          • hbase fsck - checked output to see that active and backup masters are reported properly
          • zk_dump - checked that active and backup masters are reported properly
          • Started cluster with no backup masters to make sure change operates correctly that way
          • Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 18:50:54.512575) Review request for hbase. Changes ------- Addressed changes that Ted suggested. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs (updated) src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- Ran mvn -P localTests test multiple times - no new tests fail Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) Did the following before and after killing hbase fsck - checked output to see that active and backup masters are reported properly zk_dump - checked that active and backup masters are reported properly Started cluster with no backup masters to make sure change operates correctly that way Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5086
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment11155>

          getBackupMastersSize() should be fine then.

          • Ted

          On 2012-02-14 14:41:55, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-14 14:41:55)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase fsck - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5086 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment11155 > getBackupMastersSize() should be fine then. Ted On 2012-02-14 14:41:55, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 14:41:55) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase fsck - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-02-14 15:21:15, Ted Yu wrote:

          > This version looks cleaner.

          Thanks, just trying to get the hang of how things work. I now know the process for future patches.

          On 2012-02-14 15:21:15, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 214

          > <https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line214>

          >

          > GetMaster() should be good enough.

          I was trying to follow the existing naming conventions in the file, and I had to choose between getDeadServerNames() and getServers(). I'll change this to getMaster() as you suggest.

          On 2012-02-14 15:21:15, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 221

          > <https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line221>

          >

          > How about naming this method getNumberOfBackupMasters() ?

          I was trying to follow the existing naming convention in the file, and I was basing this off of getServersSize(). There is no precedent for "NumberOf" in this file, and I do not want to introduce yet another naming convention into this file. Please advise.

          On 2012-02-14 15:21:15, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 228

          > <https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line228>

          >

          > getBackupMasters() should be good enough.

          Sounds good.

          On 2012-02-14 15:21:15, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 319

          > <https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line319>

          >

          > Please wrap long line.

          > Lines are 80 chars or shorter.

          Will do.

          On 2012-02-14 15:21:15, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java, line 94

          > <https://reviews.apache.org/r/3892/diff/1/?file=74841#file74841line94>

          >

          > How about naming this member backupMasterAddressesZNode ?

          OK.

          On 2012-02-14 15:21:15, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java, line 211

          > <https://reviews.apache.org/r/3892/diff/1/?file=74841#file74841line211>

          >

          > Would "zookeeper.znode.backup.masters" be better config name ?

          > Similarly, "backup-masters" for default znode name.

          OK. I'll upload a new diff pending your advice on the "NumberOf" issue. Thanks!

          • David

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5083
          -----------------------------------------------------------

          On 2012-02-14 14:41:55, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-14 14:41:55)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase fsck - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-02-14 15:21:15, Ted Yu wrote: > This version looks cleaner. Thanks, just trying to get the hang of how things work. I now know the process for future patches. On 2012-02-14 15:21:15, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 214 > < https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line214 > > > GetMaster() should be good enough. I was trying to follow the existing naming conventions in the file, and I had to choose between getDeadServerNames() and getServers(). I'll change this to getMaster() as you suggest. On 2012-02-14 15:21:15, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 221 > < https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line221 > > > How about naming this method getNumberOfBackupMasters() ? I was trying to follow the existing naming convention in the file, and I was basing this off of getServersSize(). There is no precedent for "NumberOf" in this file, and I do not want to introduce yet another naming convention into this file. Please advise. On 2012-02-14 15:21:15, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 228 > < https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line228 > > > getBackupMasters() should be good enough. Sounds good. On 2012-02-14 15:21:15, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 319 > < https://reviews.apache.org/r/3892/diff/1/?file=74836#file74836line319 > > > Please wrap long line. > Lines are 80 chars or shorter. Will do. On 2012-02-14 15:21:15, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java, line 94 > < https://reviews.apache.org/r/3892/diff/1/?file=74841#file74841line94 > > > How about naming this member backupMasterAddressesZNode ? OK. On 2012-02-14 15:21:15, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java, line 211 > < https://reviews.apache.org/r/3892/diff/1/?file=74841#file74841line211 > > > Would "zookeeper.znode.backup.masters" be better config name ? > Similarly, "backup-masters" for default znode name. OK. I'll upload a new diff pending your advice on the "NumberOf" issue. Thanks! David ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5083 ----------------------------------------------------------- On 2012-02-14 14:41:55, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 14:41:55) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase fsck - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/#review5083
          -----------------------------------------------------------

          This version looks cleaner.

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment11143>

          GetMaster() should be good enough.

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment11144>

          How about naming this method getNumberOfBackupMasters() ?

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment11145>

          getBackupMasters() should be good enough.

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          <https://reviews.apache.org/r/3892/#comment11146>

          Please wrap long line.
          Lines are 80 chars or shorter.

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          <https://reviews.apache.org/r/3892/#comment11148>

          How about naming this member backupMasterAddressesZNode ?

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          <https://reviews.apache.org/r/3892/#comment11147>

          Would "zookeeper.znode.backup.masters" be better config name ?
          Similarly, "backup-masters" for default znode name.

          • Ted

          On 2012-02-14 14:41:55, David Wang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3892/

          -----------------------------------------------------------

          (Updated 2012-02-14 14:41:55)

          Review request for hbase.

          Summary

          -------

          Problem:

          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:

          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.

          * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.

          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429

          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e

          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131

          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744

          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing

          -------

          * Ran mvn -P localTests test multiple times - no new tests fail

          * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures

          * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures

          * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)

          * Did the following before and after killing

          * hbase fsck - checked output to see that active and backup masters are reported properly

          * zk_dump - checked that active and backup masters are reported properly

          * Started cluster with no backup masters to make sure change operates correctly that way

          * Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/#review5083 ----------------------------------------------------------- This version looks cleaner. src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment11143 > GetMaster() should be good enough. src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment11144 > How about naming this method getNumberOfBackupMasters() ? src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment11145 > getBackupMasters() should be good enough. src/main/java/org/apache/hadoop/hbase/ClusterStatus.java < https://reviews.apache.org/r/3892/#comment11146 > Please wrap long line. Lines are 80 chars or shorter. src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java < https://reviews.apache.org/r/3892/#comment11148 > How about naming this member backupMasterAddressesZNode ? src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java < https://reviews.apache.org/r/3892/#comment11147 > Would "zookeeper.znode.backup.masters" be better config name ? Similarly, "backup-masters" for default znode name. Ted On 2012-02-14 14:41:55, David Wang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- (Updated 2012-02-14 14:41:55) Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. * I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs ----- src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- * Ran mvn -P localTests test multiple times - no new tests fail * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) * Did the following before and after killing * hbase fsck - checked output to see that active and backup masters are reported properly * zk_dump - checked that active and backup masters are reported properly * Started cluster with no backup masters to make sure change operates correctly that way * Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3892/
          -----------------------------------------------------------

          Review request for hbase.

          Summary
          -------

          Problem:
          There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters.

          Solution:
          Augment ClusterStatus to return the currently active master and the list of backup masters.

          Notes:

          • I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it.
          • I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes.

          This addresses bug HBASE-5209.
          https://issues.apache.org/jira/browse/HBASE-5209

          Diffs


          src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429
          src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e
          src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131
          src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744
          src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0

          Diff: https://reviews.apache.org/r/3892/diff

          Testing
          -------

          • Ran mvn -P localTests test multiple times - no new tests fail
          • Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures
          • Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures
          • Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)
          • Did the following before and after killing
          • hbase fsck - checked output to see that active and backup masters are reported properly
          • zk_dump - checked that active and backup masters are reported properly
          • Started cluster with no backup masters to make sure change operates correctly that way
          • Ran dev-support/test-patch.sh - no new issues fail:

          -1 overall.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 7 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Thanks,

          David

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3892/ ----------------------------------------------------------- Review request for hbase. Summary ------- Problem: There is no method in the HBase client-facing APIs to determine which of the masters is currently active. This can be especially useful in setups with multiple backup masters. Solution: Augment ClusterStatus to return the currently active master and the list of backup masters. Notes: I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master. If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort. The solution incorporated in this patch is to check to see if the read of the ServerName succeeded before trying to use it. I fixed some minor formatting issues while going through the code. I can take these changes out if it is considered improper to commit such non-related changes with the main changes. This addresses bug HBASE-5209 . https://issues.apache.org/jira/browse/HBASE-5209 Diffs src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23 src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903 src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71 src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131 src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744 src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0 Diff: https://reviews.apache.org/r/3892/diff Testing ------- Ran mvn -P localTests test multiple times - no new tests fail Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master) Did the following before and after killing hbase fsck - checked output to see that active and backup masters are reported properly zk_dump - checked that active and backup masters are reported properly Started cluster with no backup masters to make sure change operates correctly that way Ran dev-support/test-patch.sh - no new issues fail: -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 7 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, David
          Hide
          David S. Wang added a comment -

          I'm going to take out isActiveMaster and isMasterRunning, which I think will address the bulk of the comments. Agree that we should be able to assume that if master is not null, that it is the active one.

          When I have a new patch ready, I'll use reviewboard.

          Thanks for the helpful comments; I appreciate them and ask for your understanding as this is my first upstream commit.

          Show
          David S. Wang added a comment - I'm going to take out isActiveMaster and isMasterRunning, which I think will address the bulk of the comments. Agree that we should be able to assume that if master is not null, that it is the active one. When I have a new patch ready, I'll use reviewboard. Thanks for the helpful comments; I appreciate them and ask for your understanding as this is my first upstream commit.
          Hide
          stack added a comment -

          I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect.

          If you do a getMaster, I'd think that you should get the active master, only, in HConnection. Are you saying that it'll give you an Interface on the non-active Master? Thats broke I'd say. For the name of the Master, yeah, getServerName should be part of HMasterInterface.

          On the patch:

          +  private boolean isMasterRunning, isActiveMaster;
          

          The above are the names of methods, not data members. Should be masterRunning and activeMaster.

          Whats going on here:

          +    this.master = master;
          +    this.isMasterRunning = isMasterRunning;
          +    this.isActiveMaster = isActiveMaster;
          

          So, we could be reporting a master that is not running and not the active master? Why would we even care about it in that case?

          getMasterInfo as method name returning master ServerName seems off. Is this the 'active' master or non-running master?

          I think we need to be clear that ClusterStatus reports on the active master only (unless you want to add list of all running master which I don't think yet possible since they do not register until they assume mastership — hmmm... looking further down in your patch, it looks like you are adding this facility to zk).

          Is this of any use?

          + public boolean isMasterRunning() {

          I mean, if master is not running, can you even get a ClusterStatus from the cluster?

          Ditto for + public boolean isActiveMaster() {

          Won't this just be true anytime you get a ClusterStatue?

          You up the ClusterStatue version number but you don't act on it (what if you are asked deserialize an earlier version of ClusterStatus?)

          On MasterInterface, I'd suggest don't bother upping the version number – just add the new method on the end. Thats usually ok. Also, isActiveMaster of any use even? (You could ask zk directly? Have hbaseadmin go ask zk rather than go via the master at all? Isn't the master znode name its ServerName? Isn't that what you need?)

          I like your registering backup masters... and adding the list to the zk report.

          Show
          stack added a comment - I have a multi-master HBase set up, and I'm trying to programmatically determine which of the masters is currently active. But the API does not allow me to do this. There is a getMaster() method in the HConnection class, but it returns an HMasterInterface, whose methods do not allow me to find out which master won the last race. The API should have a getActiveMasterHostname() or something to that effect. If you do a getMaster, I'd think that you should get the active master, only, in HConnection. Are you saying that it'll give you an Interface on the non-active Master? Thats broke I'd say. For the name of the Master, yeah, getServerName should be part of HMasterInterface. On the patch: + private boolean isMasterRunning, isActiveMaster; The above are the names of methods, not data members. Should be masterRunning and activeMaster. Whats going on here: + this .master = master; + this .isMasterRunning = isMasterRunning; + this .isActiveMaster = isActiveMaster; So, we could be reporting a master that is not running and not the active master? Why would we even care about it in that case? getMasterInfo as method name returning master ServerName seems off. Is this the 'active' master or non-running master? I think we need to be clear that ClusterStatus reports on the active master only (unless you want to add list of all running master which I don't think yet possible since they do not register until they assume mastership — hmmm... looking further down in your patch, it looks like you are adding this facility to zk). Is this of any use? + public boolean isMasterRunning() { I mean, if master is not running, can you even get a ClusterStatus from the cluster? Ditto for + public boolean isActiveMaster() { Won't this just be true anytime you get a ClusterStatue? You up the ClusterStatue version number but you don't act on it (what if you are asked deserialize an earlier version of ClusterStatus?) On MasterInterface, I'd suggest don't bother upping the version number – just add the new method on the end. Thats usually ok. Also, isActiveMaster of any use even? (You could ask zk directly? Have hbaseadmin go ask zk rather than go via the master at all? Isn't the master znode name its ServerName? Isn't that what you need?) I like your registering backup masters... and adding the list to the zk report.
          Hide
          Ted Yu added a comment -

          For ServerName to provide info port, we can open a separate JIRA.

          @David:
          Can you explain the rationale behind the new parameters ?

          +      final boolean isMasterRunning,
          +      final boolean isActiveMaster,
          

          Looking at the current fields related to servers in ClusterStatus, they are all of Collection.

          I would expect a ServerName to represent the active master.

          Once design passes review, please use review board for further discussion.

          Show
          Ted Yu added a comment - For ServerName to provide info port, we can open a separate JIRA. @David: Can you explain the rationale behind the new parameters ? + final boolean isMasterRunning, + final boolean isActiveMaster, Looking at the current fields related to servers in ClusterStatus, they are all of Collection. I would expect a ServerName to represent the active master. Once design passes review, please use review board for further discussion.
          Hide
          Jonathan Hsieh added a comment -

          Might be out of scope for this patch but the ServerName structure doesn't have the info (http) port of master or rs's (not sure about the history on this). It would be great if we could add some information in cluster stats that also provides the info port of the active master.

          Show
          Jonathan Hsieh added a comment - Might be out of scope for this patch but the ServerName structure doesn't have the info (http) port of master or rs's (not sure about the history on this). It would be great if we could add some information in cluster stats that also provides the info port of the active master.
          Hide
          David S. Wang added a comment -

          I will address the unit test failures.

          Show
          David S. Wang added a comment - I will address the unit test failures.
          Hide
          Ted Yu added a comment -

          @David:
          There are new test failures reported by Hadoop QA, can you double check your patch ?

          Show
          Ted Yu added a comment - @David: There are new test failures reported by Hadoop QA, can you double check your patch ?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12514085/HBASE-5209-v1.diff
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestActiveMasterManager
          org.apache.hadoop.hbase.master.TestMasterZKSessionRecovery
          org.apache.hadoop.hbase.io.hfile.TestHFileBlock
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.TestZooKeeper

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/939//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/939//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/939//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12514085/HBASE-5209-v1.diff against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -136 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 156 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestActiveMasterManager org.apache.hadoop.hbase.master.TestMasterZKSessionRecovery org.apache.hadoop.hbase.io.hfile.TestHFileBlock org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.TestZooKeeper Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/939//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/939//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/939//console This message is automatically generated.
          Hide
          David S. Wang added a comment -

          Ran this one past test-patch.

          Show
          David S. Wang added a comment - Ran this one past test-patch.
          Hide
          David S. Wang added a comment -

          Need to resubmit new patch with correct formatting.

          Show
          David S. Wang added a comment - Need to resubmit new patch with correct formatting.
          Hide
          David S. Wang added a comment -

          Ran this one past test-patch.

          Show
          David S. Wang added a comment - Ran this one past test-patch.
          Hide
          David S. Wang added a comment -

          I also ran "hbase hbck -details" successfully in order to exercise the changes in HBaseFsck.java.

          Show
          David S. Wang added a comment - I also ran "hbase hbck -details" successfully in order to exercise the changes in HBaseFsck.java.
          Hide
          David S. Wang added a comment -

          I tested this by:

          • Running "mvn -P localTests -Dtest=TestMasterFailover test" successfully
          • Starting a master and a couple of backup masters, checking zk_dump, killing the master, checking zk_dump, restarting the master, checking zk_dump. All checks displayed expected values in the master and backupmaster ZNodes.
          Show
          David S. Wang added a comment - I tested this by: Running "mvn -P localTests -Dtest=TestMasterFailover test" successfully Starting a master and a couple of backup masters, checking zk_dump, killing the master, checking zk_dump, restarting the master, checking zk_dump. All checks displayed expected values in the master and backupmaster ZNodes.
          Hide
          Jonathan Hsieh added a comment -

          As a suggestion, instead of adding new methods to the HMasterInterface or HConnection, how about adding and serializing data into the o.a.h.hbase.ClusterStatus object?

          If we want to get lists of the standby masters, we'd probably want to add some info into ZK in a znode such as /hbase/backup-masters/<masterid>.

          Show
          Jonathan Hsieh added a comment - As a suggestion, instead of adding new methods to the HMasterInterface or HConnection, how about adding and serializing data into the o.a.h.hbase.ClusterStatus object? If we want to get lists of the standby masters, we'd probably want to add some info into ZK in a znode such as /hbase/backup-masters/<masterid> .

            People

            • Assignee:
              David S. Wang
              Reporter:
              Aditya Acharya
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development