HBASE-5926

Delete the master znode after a master crash

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.95.0
    • Component/s: master, scripts
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is the continuation of the work done in HBASE-5844.
      But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.

      So if we apply the same strategy as for a regionserver, we may have this scenario:
      1) Master starts
      2) Backup master starts
      3) Master dies
      4) ZK detects it
      5) Backup master receives the update from ZK
      6) Backup master creates the new master node and become the main master
      7) Previous master script continues
      8) Previous master script deletes the master node in ZK
      9) => issue: we deleted the node just created by the new master

      This should not happen often (usually the znode will be deleted soon enough), but it can happen.
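The fix that came out of this discussion is a guarded delete: the cleanup script deletes the master znode only if its content still matches what the crashed master wrote. A minimal in-memory sketch of that idea (class and method names are illustrative, not the actual HBase/ZooKeeper API):

```java
import java.util.Optional;

// In-memory stand-in for the master znode. The crash-cleanup path deletes
// the node only when its content still names the crashed master, so a node
// freshly created by the backup master (step 6 above) is left alone.
public class MasterZnode {
    private String content;  // null means "no znode"

    public synchronized void create(String serverName) { content = serverName; }

    /** Delete only if the znode still belongs to {@code expectedServerName}. */
    public synchronized boolean deleteIfEquals(String expectedServerName) {
        if (expectedServerName != null && expectedServerName.equals(content)) {
            content = null;
            return true;
        }
        return false;
    }

    public synchronized Optional<String> get() { return Optional.ofNullable(content); }

    public static void main(String[] args) {
        MasterZnode znode = new MasterZnode();
        znode.create("master-1,16000,1");   // crashed master's node
        znode.create("backup-1,16000,2");   // backup master took over (step 6)
        // Step 8: the old master's cleanup script runs, but the content no
        // longer matches, so the new master's node survives.
        boolean deleted = znode.deleteIfEquals("master-1,16000,1");
        System.out.println(deleted + " " + znode.get().orElse("<none>"));
    }
}
```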

      1. 5926.v14.patch
        19 kB
        Nicolas Liochon
      2. 5926.v13.patch
        19 kB
        Nicolas Liochon
      3. 5926.v11.patch
        14 kB
        Nicolas Liochon
      4. 5926.v10.patch
        13 kB
        Nicolas Liochon
      5. 5926.v9.patch
        18 kB
        Nicolas Liochon
      6. 5926.v8.patch
        18 kB
        Nicolas Liochon
      7. 5926.v6.patch
        13 kB
        Nicolas Liochon

          Activity

          Nicolas Liochon added a comment -

          The race condition is reduced to a production-acceptable minimum, imho. We do a compare-and-delete in the Java code, so the remaining race window is between the comparison and the delete: we fail if, and only if, the session expires, the master znode is deleted, and the backup master recreates the node. That's unlikely.

          Ted Yu added a comment -

          I cannot find CleanZnode class in the patch.

          Ted Yu added a comment -
          +    "Usage: Master [opts] start|stop|cleanZNode\n" +
          

          Please add document for cleanZNode command.

          +   * delete the znode master if its content is same to the parameter
          

          'znode master' -> 'master znode', 'same to' -> 'same as'

          +    } catch (KeeperException ignore) {
          +    } catch (IOException ignore) {
          +    }
          

          I would expect some logging for above cases.

          Nicolas Liochon added a comment -

          v8. with Ted's comments taken into account.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527816/5926.v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1909//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1909//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1909//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          These tests run ok locally.

          Ted Yu added a comment -

          Please add javadoc for the new class:

          +public class CleanZnode {
          

          readMyEphemeralNodeOnDisk() throws IOException but writeMyEphemeralNodeOnDisk() doesn't. What was the reason ?

          +   * Get the name of the file used to store the znode
          

          Please add ' contents' at the end of the above.

          +  public static int cleanZNode(Configuration conf) {
          +    conf.setInt("zookeeper.recovery.retry", 0);
          

          Should the setting be restored before exiting the above method ?

          Nicolas Liochon added a comment -

          javadoc

          Done.

          readMyEphemeralNodeOnDisk() throws IOException but writeMyEphemeralNodeOnDisk() doesn't. What was the reason ?

          When we write we ignore the results (i.e. we don't stop the master or the region server if we can't store the znode, we just continue). When we read, we're interested in the exception: the pattern in HMasterCommandLine is to return -1 on error.

          Please add ' contents' at the end of the above.

          ok.

          Should the setting be restored before exiting the above method ?

          I now clone the conf.
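
The write-ignores / read-throws split described above can be sketched like this (file name and method names are illustrative, not the actual HBase code): a server should not die just because it could not record its znode on disk, but the cleanup path must surface a read failure so the command-line wrapper can return -1.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Best-effort write, propagating read: the pattern discussed in this comment.
public class EphemeralNodeFile {
    private final Path file;

    public EphemeralNodeFile(Path file) { this.file = file; }

    /** Best effort: failure to persist is logged, never fatal to the server. */
    public void write(String znodeContent) {
        try {
            Files.writeString(file, znodeContent);
        } catch (IOException e) {
            System.err.println("WARN: could not store znode content: " + e);
        }
    }

    /** The caller turns an IOException into a -1 exit code. */
    public String read() throws IOException {
        return Files.readString(file).trim();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("master-znode", ".tmp");
        EphemeralNodeFile f = new EphemeralNodeFile(tmp);
        f.write("master-1,16000,1");
        System.out.println(f.read());
        Files.deleteIfExists(tmp);
    }
}
```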

          Ted Yu added a comment -

          I now clone the conf.

          That is safer, avoiding race condition w.r.t. the original conf.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12527860/5926.v9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1912//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1912//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1912//console

          This message is automatically generated.

          Ted Yu added a comment -

          @Stack:
          Can you comment on the latest patch ?

          stack added a comment -

          CleanZNode should be in the tool package or in zookeeper (probably the latter since thats its dependencies) ... hmmm.. but hangon, I see why you have it here... because its used by master and regionserver packages. Thats fine. Its in the right place I'd say.

          You should look at the javadoc that is created from your src. Its going to be a jumble. Check it out. You need a little bit of html in there at least for your list of strategy dependencies.

          What is the filecontent? We don't need any, right? The name of the file is enough?

          This should be boolean rather than int? Or is it returned to shell? If so, should say so in the comment: "+ * @return if done returns 0 else -1."

          Is CleanZNode a good name? How about ZNodeCleaner or ZNodeClearer or CrashZNodeCleaner?

          I think in HMasterCommandLine, should be start|stop|clear so it fits format of the other commands.

          In MasterAddressTracker, can you get the znode sequence id and only delete if the sequence id matches?

          Nicolas Liochon added a comment -

          You should look at the javadoc that is created from your src. Its going to be a jumble. Check it out. You need a little bit of html in there at least for your list of strategy dependencies.

          Done.

          What is the filecontent? We don't need any, right? The name of the file is enough?

          We need the content. For the regionserver, the content is the znode path. For the master it's the full ServerName (stringified).

          This should be boolean rather than int? Or is it returned to shell? If so, should say so in the comment: "+ * @return if done returns 0 else -1."

          Done.

          Is CleanZNode a good name? How about ZNodeCleaner or ZNodeClearer or CrashZNodeCleaner?

          Renamed to ZNodeClearer

          I think in HMasterCommandLine, should be start|stop|clear so it fits format of the other commands.

          Done.

          In MasterAddressTracker, can you get the znode sequence id and only delete if the sequence id matches?

          We store the full ServerName so if there is a restart we will see it. But maybe you're speaking about the znode version? Because I looked at the zk api, and with the version we could remove totally the race condition...
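
The version-guarded delete mentioned here works like ZooKeeper's conditional delete: `delete(path, expectedVersion)` succeeds only if the znode's version is still the one that was read, which closes the window between the read and the delete entirely. A hypothetical in-memory stand-in (not the real ZooKeeper client API):

```java
// Sketch of a version-guarded delete. A recreated znode carries a different
// version, so a cleanup script holding a stale version cannot remove it.
public class VersionedZnode {
    private String data;
    private int version = -1;   // -1 means "no node yet"

    public synchronized void create(String content) { data = content; version++; }

    public synchronized int getVersion() { return version; }

    /** Mimics ZooKeeper's conditional delete: fails on a stale version. */
    public synchronized boolean delete(int expectedVersion) {
        if (data == null || version != expectedVersion) return false;
        data = null;
        return true;
    }

    public static void main(String[] args) {
        VersionedZnode znode = new VersionedZnode();
        znode.create("master-1");
        int seen = znode.getVersion();        // cleanup script reads the node
        znode.create("backup-1");             // backup master recreates it
        System.out.println(znode.delete(seen)); // stale version: no delete
    }
}
```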

          stack added a comment -

          We need the content. For the regionserver, the content is the znode path. For the master it's the full ServerName (stringified).

          Does it have to be full path? Can it not just be the node name? Thats safe to put in the fs? Master servername stringified should be safe in the fs too.

          Yeah, version.

          Nicolas Liochon added a comment -

          Yes, it could be the node name only. Does it make a difference? To me, they can both be safely written in the fs.
          I check the version in v11, so there is no race condition at all now.

          Ted Yu added a comment -

          @N:
          Did you forget to include ZNodeClearer class in the latest patch ?

          Ted Yu added a comment -
          +    } catch (KeeperException e) {
          +      LOG.info("Can't get or delete the master znode", e);
          +    } catch (DeserializationException e) {
          +      LOG.info("Can't get or delete the master znode", e);
          +    }
          

          I think the above log should be at WARN level.

          stack added a comment -

          Its fine putting it in file. And what Ted said. Thanks N.

          Nicolas Liochon added a comment -

          v13 should do it...

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12528018/5926.v13.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1927//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1927//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1927//console

          This message is automatically generated.

          Nicolas Liochon added a comment -

          I think it's ok, I don't get this failure locally...

          Ted Yu added a comment -

          Minor comments:

          + * servers. It allows to delete immediately the znode when the master or the regions server crash.
          

          'regions server crash' -> 'region server crashes'

          + * The region server / master write a specific file when they start / become main master. When they
          

          'write a specific' -> 'writes a specific'. 'they start / become' -> 'it starts / becomes'
          I think using 'they' is confusing because region server and master have different roles.

          +    " clear  Delete the master znode in ZooKeeper after a master crash\n "+
          

          'master crash' -> 'master crashes'

          stack added a comment -

          +1 on patch v14

          stack added a comment -

          I'll fix Ted comments on commit.

          stack added a comment -

          Looks like N already addressed Ted's review comments. Great.

          Applied to trunk. Thanks for the patch N.

          Hudson added a comment -

          Integrated in HBase-TRUNK #2899 (See https://builds.apache.org/job/HBase-TRUNK/2899/)
          HBASE-5926 Delete the master znode after a master crash (Revision 1340185)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/bin/hbase-daemon.sh
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java
          Ted Yu added a comment -

          ZNodeClearer.java isn't in source repo.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12528104/5926.v14.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1933//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1933//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1933//console

          This message is automatically generated.

          stack added a comment -

          Fixed. Thanks Ted.

          Hudson added a comment -

          Integrated in HBase-TRUNK #2900 (See https://builds.apache.org/job/HBase-TRUNK/2900/)
          HBASE-5926 Delete the master znode after a master crash (Revision 1340200)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ZNodeClearer.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #10 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/10/)
          HBASE-5926 Delete the master znode after a master crash (Revision 1340200)
          HBASE-5926 Delete the master znode after a master crash (Revision 1340185)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ZNodeClearer.java

          stack :
          Files :

          • /hbase/trunk/bin/hbase-daemon.sh
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java
          Jean-Daniel Cryans added a comment -

          This jira has the odd side-effect of printing out a lot of garbage when running in standalone and killing it with -9, gist of it being:

          2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
          2012-11-29 13:08:27,227 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 0 retries
          2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: clean znode for master Unable to get data of znode /hbase/master
          org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
                  at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
                  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
                  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
                  at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:291)
                  at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:562)
                  at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:168)
                  at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:150)
                  at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:110)
                  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
                  at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
                  at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2298)
          

          Basically the znode cleaner fails hard because ZK is offline.

          I was confused to see more logs being printed out after running the kill.

          stack added a comment -

          Marking closed.


            People

            • Assignee:
              Nicolas Liochon
              Reporter:
              Nicolas Liochon
            • Votes:
              0
              Watchers:
              6
