HBase / HBASE-1295 Multi data center replication / HBASE-2611

Handle RS that fails while processing the failure of another one

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.5, 0.95.0
    • Component/s: Replication
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      The fix for this issue uses the ZooKeeper multi functionality (hbase.zookeeper.useMulti). Please refer to hbase-default.xml for details on this property. There is an addendum fix at HBASE-8099 (fixed in 0.94.6). If you are running a 0.94 release earlier than 0.94.6, please apply the HBASE-8099 patch, OR make sure hbase.zookeeper.useMulti is set to false.

      Description

      HBASE-2223 doesn't handle region servers that fail while transferring the HLog queues of other region servers that have already failed. Devise a reliable way to do it.

      1. 2611-0.94.txt
        6 kB
        Lars Hofhansl
      2. 2611-trunk-v4.patch
        6 kB
        Ted Yu
      3. HBASE-2611-trunk-v3.patch
        6 kB
        Himanshu Vashishtha
      4. 2611-trunk-v3.patch
        6 kB
        Ted Yu
      5. HBASE-2611-trunk-v2.patch
        6 kB
        Himanshu Vashishtha
      6. 2611-v3.patch
        6 kB
        Ted Yu
      7. HBASE-2611-v2.patch
        6 kB
        Himanshu Vashishtha
      8. HBase-2611-upstream-v1.patch
        13 kB
        Himanshu Vashishtha

        Issue Links

          Activity

          Jean-Daniel Cryans added a comment -

          My solution would require the use of 4 "types" of znodes:

          • Intention: When a node fails, a machine first writes locally its intention of locking the other's queues
          • Lock: After writing the intention, a node "locks" another's folder of queues by creating a znode that contains its start code
          • Tag: After successfully locking a znode, the winning node puts a znode locally that gives the locked RS's start code and lists the queues that are going to be copied
          • Delete: After copying everything that was listed in a tag, the node creates this znode alongside the lock to mark those queues as being under deletion (because there's no atomic recursive delete in ZK)

          Then let's say we have the following machines:

          Machine A: The first machine that dies
          Machine B: The machine that's trying to failover A but that fails while doing it
          Machine C: The machine that's trying to failover B and that acquires the lock successfully

          Here's what machine C should do, depending on when B fails:

          • B fails after writing the intention: C reads the intention, sees it doesn't own the lock, removes the intention, and proceeds with the failover of just B
          • B fails after writing the lock: C reads the intention, sees the lock is owned by B, and repeats the failover process, going down to the deepest node
          • B fails after writing the tag: same, but C takes care not to copy what's in the tag when doing the failover of B
          • B fails after copying some znodes: same behavior; as long as the tag is there, what's in it is considered dirty
          • B fails after writing the delete marker: C reads the intention, sees the lock, sees the delete marker, finishes deleting all the znodes, deletes the tag, then fails over B
          • B fails after deleting the local tag: C reads the intention, sees the znode is gone, deletes the intention znode, and fails over B
          • B fails after deleting the intention: basic failover
          Jean-Daniel Cryans added a comment -

          Punting, won't have time to do it for 0.90.0

          stack added a comment -

          Moving out of 0.92.0. Pull it back in if you think different.

          Chris Trezzo added a comment -

          @J-D

          If you don't mind, I was thinking about taking a crack at this using your 4 "types" of znode strategy. I'll start working on a sketch patch.

          At a first glance, it seems as though most of the code changes are going to be in ReplicationSourceManager.NodeFailoverWorker.run().

          Chris Trezzo added a comment -

          ...and of course ReplicationZookeeper.

          Jean-Daniel Cryans added a comment -

          Actually it would be nice if it was in a separate utility package since atomically moving a znode "folder" recursively would be a very useful function in general. It might even already exist on the net.

          Chris Trezzo added a comment -

          I think adding the ability to atomically move a znode and all its child znodes might be a pretty invasive change. I couldn't find any utility package for this on the net, but there is a patch in ZooKeeper (ZOOKEEPER-965) implementing atomic batch operations that is scheduled for 3.4.

          I thought about the problem a little bit, and after conferring with Lars, I think we might not need the atomic move (although it would definitely make things simpler).

          Below is some pseudocode for the algorithm I came up with. It is very similar to what you suggested above. Both intentions and locks are tagged with the region server they point to (i.e. locks are tagged with the RS that holds them, and intentions are tagged with the RS they intend to lock). Intentions are at the same level in the znode structure as locks. It is a recursive, depth-first algorithm.

          Questions/comments/suggestions always appreciated.

          Chris

          
          //this method is the top-level failover method (i.e. NodeFailoverWorker.run())
          failOverRun(FailedNode a) {
            recordIntention(a, this);
            if(getLock(a, this)) {
              //transfer all queues to local node
              moveState(a, this, this);
            }
            else {
              deleteIntention(a, this);
              return;
            }
            replicateQueues();
          }
          
          moveState(NodeToMove a, CurrentNode c, TargetNode t) {
            if(lock exists on a) {
              if(lock on a is owned by c) {
                moveStateHelper(a, c, t);
              }
              else {
                //someone else has the lock and is handling
                //the failover
                deleteIntention(a, c);
              }
            }
            else {
              if(queue znodes exist) {
                //we know that this node has queues to transfer
                if(getLock(a, c)) {
                  moveStateHelper(a, c, t);
                }
                else {
                  deleteIntention(a, c);
                }
              }
              else {
                //we know that this node is being deleted
                deleteState(a);
                deleteIntention(a, c);
              }
            }
          }
          
          moveStateHelper(NodeToMove a, CurrentNode c, TargetNode t) {
            for(every intention b of a) {
              moveState(b, a, t);
            }
            //we need to safely handle the case where we try to copy
            //queues that have already been copied
            copy all queues in a to t;
            deleteState(a);
            deleteIntention(a, c);
          }
          
          deleteState(NodeToDelete d) {
            //there is no need to traverse down the tree at all
            //because at this point everything below us should have
            //been deleted
            //
            //we need to safely handle the case where we attempt to delete
            //nodes that have already been deleted
          
            delete entire node;
          }
          
          
          Ted Yu added a comment -

          In moveState(), if the lock on a is owned by c, should the lock be released after moveStateHelper() returns?
          I guess the lock release could also be done at the end of moveStateHelper().

          Chris Trezzo added a comment -

          I should have specified that in deleteState(), the line "delete entire node" deletes the entire znode replication hierarchy for that region server. This would include the lock znode, which is essentially releasing the lock at the end of moveStateHelper().

          Thanks!
          Chris

          Chris Trezzo added a comment -

          @J-D

          LarsH and I were talking about another approach to region server replication hlog queue failover yesterday, and I wanted to get some feedback on it.

          Currently when handling a nodeDeleted event, the live region servers only attempt to failover the node corresponding to the event. The nodeDeleted event is only fired once, so to protect ourselves from orphaning the znode state of the failed region server in a cascading failure scenario, we move the state to the znode of the region server that is performing the failover. Since we don't have an atomic way to move this state, it gets a little tricky.

          Instead of this approach, we could have the region server attempt to failover all failed region servers every time it receives a nodeDeleted event. For example, the nodeDeleted method could go something like this: refresh the region server list, get the list of region servers in the replication znode structure, attempt to lock and failover any region server listed in the replication znode structure that is not currently alive.

          The same race to lock the region server znode will occur. Only one region server will get the lock and handle the failover. Each NodeFailoverWorker that gets started could simply operate on the original dead region server's znode structure. If the region server fails while performing the failover, then both region servers will get picked up by another region server when the nodeDeleted event for the second failure is fired. Locks would have to be ephemeral nodes to prevent permanent locking of a region server when the failover region server dies. Once the replication hlog queues are successfully replicated, the znode for the dead region server can be deleted.

          On the cons side, this approach makes the handling of a nodeDeleted event a heavier weight operation.

          On the pros side, it makes the failover code much simpler because we no longer have to worry about moving the region server znode state around in zookeeper.

          Thoughts always appreciated.

          Thanks,
          Chris

          Himanshu Vashishtha added a comment -

          I looked at this issue from the perspective of using the ZooKeeper#multi operation (present in 3.4). This API guarantees to execute a list of Ops as a single transaction, rolling back all the Ops in case any of them fails. I tested this functionality as a standalone case (where the transaction was to move a bunch of znodes from one parent to another), and it works well (out of N threads which race to do the transfer, only 1 is successful). And in case of a failure, all the Ops done so far are rolled back. I can attach the sample code if required.

          Here is the approach I used to utilize multi for this issue:
          a) All the active region servers try to "move" the logs of peers under the dead regionserver znode. It involves creating Op objects for creating the new znodes and deleting the old ones. As per the multi API guarantee, only one regionserver will be successful in moving the znodes.

          b) The regionservers will keep trying to move the znodes from the dead regionserver until they are sure that the node is deleted (by the successful regionserver), or there is no log left to process. This is to avoid any corner case in which we miss logs of the dead regionserver. The number of retries can be made configurable.

          c) In case of a cascading failure (when the successful regionserver dies before it gets the notification from zk about the successful move), the other regionservers will get this new event and will proceed as normal (they will try to move all the znodes from this newly dead regionserver's znode).

          It will be good to know what others think about this approach. Are there other corner cases that can happen?

          Attached is a patch; I tested it by manually killing regionservers at random (not totally random: killing one, then killing the successful one right after it has transferred the logs; it's difficult to kill it while transferring since that's an atomic operation now). Any ideas/suggestions for more direct testing are welcome.
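          For reference, here is a minimal, self-contained sketch of this kind of atomic move with the ZooKeeper multi API. The class name, the flattened one-level znode layout, and the naming of the copied znodes are illustrative assumptions, not the attached patch (real replication queues are nested one level deeper, per peer cluster):

          import java.util.ArrayList;
          import java.util.List;

          import org.apache.zookeeper.CreateMode;
          import org.apache.zookeeper.KeeperException;
          import org.apache.zookeeper.Op;
          import org.apache.zookeeper.ZooDefs.Ids;
          import org.apache.zookeeper.ZooKeeper;

          public class AtomicQueueMoveSketch {
            /**
             * Try to atomically "move" every child znode of deadRsZnode under myRsZnode.
             * All surviving region servers may race to call this; because the whole Op
             * list is applied as a single transaction, only one caller can succeed and
             * the others fail on the first delete of an already-moved znode.
             */
            static boolean tryClaimQueues(ZooKeeper zk, String deadRsZnode, String myRsZnode)
                throws KeeperException, InterruptedException {
              String deadRsName = deadRsZnode.substring(deadRsZnode.lastIndexOf('/') + 1);
              List<Op> ops = new ArrayList<Op>();
              List<String> children;
              try {
                children = zk.getChildren(deadRsZnode, false);
              } catch (KeeperException.NoNodeException e) {
                return false; // someone else already moved and deleted this znode
              }
              for (String child : children) {
                // a real implementation would also tolerate NoNodeException here
                byte[] data = zk.getData(deadRsZnode + "/" + child, false, null);
                // create the copy under our own znode, prefixed with the dead RS name...
                ops.add(Op.create(myRsZnode + "/" + deadRsName + "-" + child, data,
                    Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT));
                // ...and delete the original in the same transaction
                ops.add(Op.delete(deadRsZnode + "/" + child, -1));
              }
              if (ops.isEmpty()) {
                return false; // nothing left to claim
              }
              try {
                zk.multi(ops);
                return true; // we won the race
              } catch (KeeperException e) {
                // another RS moved the znodes first; the whole transaction was rolled back
                return false;
              }
            }
          }

          Only the winner would then start replication sources for the claimed queues; every other region server simply returns.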

          Ted Yu added a comment -

          Putting patch on review board helps.

          +   * @param opList: list of Op to be executed as one trx.
          

          'trx' -> 'transaction'

          +    if(opList == null || opList.size() ==0)
          

          Space between if and (.

          +    }catch (InterruptedException ie) {
          +      LOG.warn("multi call interrupted; process failed!" + ie);
          

          Restore interrupt status for the thread (same for doMultiAndWatch). Space between } and catch.

          +      LOG.warn("multi call failed! One of the passed ops has failed which result in the rolled back.");
          

          Line length beyond 100.

          +   * @return
          +   */
          +  public SortedMap<String, SortedSet<String>> copyDeadRSLogsWithMulti(
          +      String deadRSZnode) {
          

          javadoc for the return value.

          +      LOG.warn("This is us! Skipping the processing as we might be closing down.");
          

          Add deadRSZnodePath to the log.

          +    RetryCounterFactory retryCounterFactory = new RetryCounterFactory(Integer.MAX_VALUE, 3 * 1000);
          

          I don't think MAX_VALUE is a good choice.

          +        SortedSet<String> logQueue = new TreeSet<String>();
          

          Why is logQueue backed by a TreeSet?

          +        LOG.warn("KeeperException occurred in multi; " +
          +            "seems some other regionserver took the logs before us.");
          

          Add ke to the above message.

          +        Op deleteOpForLog = Op.delete(zNodeForCurrentLog, -1);
          +        znodesToWatch.add(logZnode);
          +        opsList.add(createOpForLog);
          +        opsList.add(deleteOpForLog);
          

          Please reorder the above calls so that znodesToWatch.add() is after opsList.add() calls. This would make code more readable.

          Ted Yu added a comment -

          Suppose there is a (relatively) large number of Ops in opsList; the chance of collision between active region servers is high.

          Some experiments should be performed so that we get an idea of how long this procedure takes to succeed.

          Himanshu Vashishtha added a comment -

          Thanks for the review Ted. I will upload a modified version on the rb. My initial idea of putting it here was to get some feedback on the approach.

          Yes, it is zk intensive, as all the other regionservers are competing to do the transaction. But as soon as one is successful (the first one that creates the list and issues the multi command), the other regionservers which haven't had a chance to do a listChildren call on the dead regionserver znode will not see anything; and for the other regionservers which have already created the Ops, the very first Op will fail because the znode has already moved. ZooKeeper#multi is fail fast: it rolls back the transaction on the first failure without retrying the remaining Ops. I tested it on a 3 RS cluster with the average load being 12-14 logs, and it is usually done within seconds after the regionserver failure is noticed. What sort of experiments are you thinking about?
          On another note, TestReplication passes.

          Ted Yu added a comment -

          average load being 12-14 logs

          Can you make the above 10x?

          Another consideration is when (i.e. in which major release) ZooKeeper 3.4 would be listed as the minimum requirement.
          There hasn't been consensus so far.

          Here're all the replication-related tests:

          src/test/java/org/apache/hadoop/hbase/client/replication/TestReplicationAdmin.java
          src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSink.java
          src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
          src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          src/test/java/org/apache/hadoop/hbase/replication/TestReplicationDeleteTypes.java
          src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java
          src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java
          
          Himanshu Vashishtha added a comment -

          ZooKeeper 3.4 is there in 0.92+? What do you mean by minimum requirement? Please explain.

          I found the related test, queueFailover, in TestReplication. Good to know about the other test classes.

          Ted Yu added a comment -

          The 3.4 is only for the ZooKeeper client.
          Companies (such as StumbleUpon) run 3.3.x in production, which doesn't support multi().

          Jesse Yates added a comment -

          3.4 is currently only required for security and, further, is not yet a stable release of ZK. That said, if it does become stable it's likely to be adopted, given that it's been pretty solid for many people.

          Himanshu Vashishtha added a comment -

          3.4 is only for zookeeper client.

          I find this a bit confusing. Why is it so? What do we gain by this?

          @Jesse: TM is using secure HBase in their production (if I am not wrong). So 3.4 seems like a pretty reasonable choice. Has there been any discussion on this? I would like to know more context on this.

          Jesse Yates added a comment -

          @Himanshu here is the thread I started on this on dev@ little while ago: http://search-hadoop.com/m/u2D7j1yRpi72

          It basically comes down to the fact that it would be irresponsible to do a release of HBase that requires an unstable dependency. Yeah, TM has it in production, but that doesn't mean their usage is representative of everyone's. If the ZK fellas decide that 3.4 is a stable release, then I'm all for making it the requirement in 0.96, but until the guys who write the software feel like it's stable, I don't think we are qualified to say it is stable.

          I do think it's weird that we make 3.4 a dependency, but it really would be too weird (and honestly a waste of effort) to support two versions of the protocol, especially considering the trickiness of dealing with ZK clusters that may be in the process of upgrading, etc.

          Lars Hofhansl added a comment -

          Also see patch and discussion in HBASE-6695

          Lars Hofhansl added a comment -

          I think we should really try to fix this for 0.94.

          Lars Hofhansl added a comment - - edited

          Chris Trezzo Wanna work on this?
          Have a look at the discussion in HBASE-6695.
          If we accept a bit of herding initially we can fix this with a simple change (I think)... And ZK herding is still better than data loss.

          Lars Hofhansl added a comment -

          Alas, looks like we won't get to this... again.

          Lars Hofhansl added a comment -

          Moving out again

          Himanshu Vashishtha added a comment -

          Working on it; will provide a patch soon.

          Himanshu Vashishtha added a comment -

          Patch that provides an alternative way to copy the znodes in an atomic way, using ZooKeeper multi.
          It is configurable via the "hbase.zookeeper.useMulti" property. It does an 'ls' on the znode and creates Operations to do the "move" (create the new znodes and delete the old ones). I tested it on a 3-node cluster and killed the server that had 200 log znodes. The other two regionservers competed and one took away all the znodes. Ran TestReplication#queueFailover on Jenkins; it passed.
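          For operators, enabling this typically means setting the property in hbase-site.xml on every region server. A minimal illustrative snippet (the default is false; ZooKeeper 3.4+ is required on both the client and the quorum for multi to work):

          <property>
            <name>hbase.zookeeper.useMulti</name>
            <value>true</value>
            <!-- Only set to true when the whole ZooKeeper ensemble runs 3.4+;
                 otherwise leave it at the default of false. -->
          </property>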

          Lars Hofhansl added a comment -

          So that call to multi had better not fail, ever. Otherwise we'll still lose track of data to be replicated.
          There are two problems currently:

          1. Transfer of queues is only attempted once
          2. Queues may be partially transferred

          This patch addresses only #2.

          Ted Yu added a comment -
          +   * @param znode
          +   * @return
          

          Please finish javadoc.
          The key of the SortedMap is the peer cluster Id, right?

          +      LOG.warn("Got exception in copyQueuesFromRSUsingMulti: " + e);
          

          If you use comma in place of +, you would get method names.

          There is no empty line in copyQueuesFromRSUsingMulti(). Consider adding empty lines to separate the sub-steps.

          Himanshu Vashishtha added a comment -

          The call to multi is using RecoverableZooKeeper#multi, which retries in case of

                    case CONNECTIONLOSS:
                    case SESSIONEXPIRED:
                    case OPERATIONTIMEOUT:
          

          which, by default, is three retries. I find this approach better than the existing one.
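          As a rough illustration of that retry behavior (a generic sketch of the retry-on-recoverable-error pattern described above, not the actual RecoverableZooKeeper code; the backoff and limit are made-up values):

          import java.util.List;

          import org.apache.zookeeper.KeeperException;
          import org.apache.zookeeper.Op;
          import org.apache.zookeeper.OpResult;
          import org.apache.zookeeper.ZooKeeper;

          public class RetryingMultiSketch {
            static List<OpResult> multiWithRetries(ZooKeeper zk, List<Op> ops, int maxRetries)
                throws KeeperException, InterruptedException {
              int attempt = 0;
              while (true) {
                try {
                  return zk.multi(ops);
                } catch (KeeperException e) {
                  switch (e.code()) {
                    case CONNECTIONLOSS:
                    case SESSIONEXPIRED:
                    case OPERATIONTIMEOUT:
                      // transient ZooKeeper trouble: retry up to the configured limit
                      if (++attempt > maxRetries) {
                        throw e;
                      }
                      Thread.sleep(1000L * attempt); // simple linear backoff
                      break;
                    default:
                      // e.g. NODEEXISTS / NONODE: another RS won the race, so don't retry
                      throw e;
                  }
                }
              }
            }
          }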

          Chris Trezzo added a comment -

          But the retries in RecoverableZooKeeper are not atomic... if the region server fails in the middle of RecoverableZooKeeper.multi, the queues will not get transferred.

          Chris Trezzo added a comment -

          Also, I don't think your manual test described above hits this corner case. You need at least two region server failures for this to happen. For example, region server "A" fails, region server "B" races and wins the failover of A, and then region server B fails before it finishes copying A's queue into its own queue. Then, when someone picks up B, A's original queue will not get completely replicated.

          Thanks for working on this though! It is a tricky one.

          Chris Trezzo added a comment -

          Himanshu Vashishtha Hmm, I may have misspoken... "atomic" was not the right word choice.

          But the retries in RecoverableZooKeeper are not atomic... if the region server fails in the middle of RecoverableZooKeeper.multi, the queues will not get transferred.

          I see that as long as a multi hasn't succeeded, all region servers will continue to try to fail over the queues. So the problem seems to be more along the lines of: if all region servers exhaust their multi retries, then the queues would get lost.

          Is there ever a case in practice where we would run into this and zookeeper is not down?

          Himanshu Vashishtha added a comment -

          Chris: Thanks for taking a look.

          Is there ever a case in practice where we would run into this and zookeeper is not down?

          Can't think of any.
          Even if that ever happens (let's say all regionservers can't connect to ZK, or whatever), then we need something different (possibly beyond the scope of this jira) so that any newly joining regionserver takes a look at the existing log znodes, etc.

          Re: Testing:
          Yeah, I know. But given that the queue is moved in one transaction, I can't think of how to reproduce this in a testing environment. Therefore, I tested what happens when two regionservers try to copy the queue, and whether this approach scales well with the number of logs.

          Lars Hofhansl added a comment -

          This is definitely an improvement.
          What happens when a region server dies after it copied the queues but before it could finish shipping all the edits?

          Himanshu Vashishtha added a comment -

          Lars: Then a regionserver who does the failover will also process the leftover znodes (just like what happens currently).

          Lars Hofhansl added a comment -

          Cool... So as long as the multi itself does not fail we're good.

          Himanshu Vashishtha added a comment -

          Yes. I would ask, though: in what possible circumstances do you foresee multi() failing?

          Himanshu Vashishtha added a comment -

          Lars Hofhansl: I asked about possible failure scenarios because it would be great if they could be addressed beforehand.

          Lars Hofhansl added a comment -

          Yeah, I don't know.

          But what can happen is that the region server who wins the race to take over the dead region server's queues could die before it even manages to call multi. In that case - since the ephemeral znode is only removed once - we won't ever retry to move that region server's queues again. Right?
          So another part of the puzzle is to have a way to retry the takeover later. Back in the comments here there are various suggestions about how to do that, mostly centering around having all surviving RSs try to move a dead RS's queues.

          Himanshu Vashishtha added a comment -

          But what can happen is that the region server who wins the race to take over the dead region server's queues could die before it even manages to call multi.

          Not following your question. How can a regionserver win a race before calling multi? If regionserver "A" fails, all regionservers will call multi to do the failover, and only one (let's say "B") will succeed. Now, if B also dies meanwhile (after it has succeeded in transferring the queue from ZK's perspective), the regionserver doing the failover for B will also process A's znodes (as they are with B now). Therefore, I don't see that we really need a retry. Did I miss anything?

          Lars Hofhansl added a comment -

          But that is not the case (unless I am misunderstanding completely). All RSs race to get the lock to take over the dead RS's queues. Once there is a winner, that RS will move the queues. So if the winning RS dies after it learns that it is the winner but before it moves the queues, those queues are lost.

          What you describe is one way to solve the problem: All RSs simply try to move the queues. That would work, but would lead to the herding effect (which I think is acceptable).

          Himanshu Vashishtha added a comment -

          Yes, your description is totally correct. So, you okay with the approach, Lars?

          Lars Hofhansl added a comment -

          This change is good (so +1), but it does not fix the whole problem (you're not having all RSs attempt the queue failover).
          Maybe we do your patch in a subtask and leave this issue open.

          Lars Hofhansl added a comment -

          Or did you mean whether I'm OK with all RSs attempting to move the queues? I'm happy with that too. I think Jean-Daniel Cryans voiced some concerns over the incurred herding effect.

          Lars Hofhansl added a comment - - edited

          Specifically, check out ReplicationSourceManager.NodeFailoverWorker.run().
          First all surviving RSs race to obtain the lock:

                if (!zkHelper.lockOtherRS(rsZnode)) {
                  return;
                }
          

          Only one RS will continue to move the failed RS's queues.

          I think what we could do is this:
          If multi is supported, we just have all surviving RSs attempt to move the queues (don't bother with the lock step). If multi is as atomic as advertised, that should work: only one of the RSs will succeed in moving the queues atomically, but all will try.
          It seems like that should work.
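          For illustration, the shape of that change inside NodeFailoverWorker.run() might look roughly like the fragment below. zkHelper, rsZnode, lockOtherRS and copyQueuesFromRSUsingMulti are names already used in this thread; conf and copyQueuesFromRS are assumed here for the sketch, which is not the committed patch:

            // Branch on hbase.zookeeper.useMulti instead of always locking first.
            SortedMap<String, SortedSet<String>> newQueues;
            if (conf.getBoolean("hbase.zookeeper.useMulti", false)) {
              // Every surviving RS simply tries; the ZooKeeper multi transaction
              // guarantees that at most one of them actually moves the queues.
              newQueues = zkHelper.copyQueuesFromRSUsingMulti(rsZnode);
            } else {
              // Legacy path: race for the lock znode first, then copy.
              if (!zkHelper.lockOtherRS(rsZnode)) {
                return;
              }
              newQueues = zkHelper.copyQueuesFromRS(rsZnode);
            }
            if (newQueues == null || newQueues.isEmpty()) {
              return; // another region server claimed the dead RS's queues
            }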

          Himanshu Vashishtha added a comment -

          Lars Hofhansl: Yes, I followed the same approach in the attached patch.

          Lars Hofhansl added a comment -

          Hmm... Yes, you did. Sorry, somehow missed it when I looked at it first.
          Cool then, we came to the same conclusion. Just took me much longer to get to it

          +1 on patch, it should indeed fix this problem.

          Ted Yu added a comment -

          Patch v3 fills javadoc for copyQueuesFromRSUsingMulti().

          Himanshu Vashishtha added a comment -

          Thanks for the review, Lars, and thanks to Ted for updating the patch.

          Ted Yu added a comment -

          The trunk patch depends on HBASE-7382

          @Himanshu:
          Can you run the tests listed @ 28/Jun/12 04:07?

          Lars Hofhansl added a comment -

          Let's make critical so it gets in.

          Ted Yu added a comment -
          p0 2611-upstream-v1.patch
          patching file hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          Hunk #1 succeeded at 25 (offset -1 lines).
          Hunk #2 FAILED at 41.
          Hunk #3 succeeded at 858 (offset 131 lines).
          1 out of 3 hunks FAILED -- saving rejects to file hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java.rej
          patching file hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          Hunk #1 succeeded at 579 (offset 19 lines).
          patching file hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
          Reversed (or previously applied) patch detected!  Assume -R? [n] ^C
          

          @Himanshu:
          Can you update the upstream patch?

          Thanks

          Himanshu Vashishtha added a comment -

          updated trunk patch

          Lars Hofhansl added a comment -

          Himanshu, you are officially my hero now. We've been discussing this for over a year, and it looks like we're finally fixing it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12566463/HBASE-2611-trunk-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4177//console

          This message is automatically generated.

          Ted Yu added a comment -

          Patch v3 fixes the javadoc warning:

          +   * @return map of peer cluster to log queues 
          +   */
          +  public SortedMap<String, SortedSet<String>> copyQueuesFromRSUsingMulti(String znode) {
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12566510/2611-trunk-v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4181//console

          This message is automatically generated.

          Ted Yu added a comment -

          Will integrate patch v3 later today if there are no further review comments.

          Ted Yu added a comment -

          Jean-Daniel Cryans:
          It would be nice if you could take a look at Himanshu's patch.

          Lars Hofhansl added a comment -

          Jean-Daniel Cryans ?

          Lars Hofhansl added a comment -

          Ted Yu: Let's commit this. +1 from me.

          Jean-Daniel Cryans added a comment -

          Some comments:

          LOG.info("Moving " + rsZnode + "'s hlogs to my queue");

          This could be changed to say whether it's going to be done atomically or not.

          LOG.debug(" The multi list is: " + listOfOps + ", size: " + listOfOps.size());

          This is going to print a lot of object references... not sure how useful this is. Maybe just keep the size?

          LOG.info("Atomically moved the dead regionserver logs. ");

          With my first comment this becomes redundant, and another message will already say when the move is done anyway.

          LOG.warn("Got exception in copyQueuesFromRSUsingMulti: " + e);

          Put the "e" in the second paramater instead of appending it to the string.

          Himanshu Vashishtha added a comment -

          Incorporating J-D's comments; I kept the log line that prints after successfully moving the znodes. I think it is helpful, since only one regionserver will print it; the others will fail.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12566875/HBASE-2611-trunk-v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4225//console

          This message is automatically generated.

          Ted Yu added a comment -

          @Himanshu:
          Mind attaching patch for 0.94 ?

          Ted Yu added a comment -

          Patch v4 fixes the javadoc warning about the empty @return tag.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12566899/2611-trunk-v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4228//console

          This message is automatically generated.

          Ted Yu added a comment -

          Patch v4 integrated to trunk.

          Thanks for the patch, Himanshu.

          Thanks for the reviews, Lars and J-D.

          Lars Hofhansl added a comment -

          0.94 patch. Passes TestReplication locally.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12566908/2611-0.94.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4229//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          Going to commit the 0.94 version tomorrow, unless I hear objections.

          Hudson added a comment -

          Integrated in HBase-TRUNK #3820 (See https://builds.apache.org/job/HBase-TRUNK/3820/)
          HBASE-2611 Handle RS that fails while processing the failure of another one (Himanshu) (Revision 1439744)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #382 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/382/)
          HBASE-2611 Handle RS that fails while processing the failure of another one (Himanshu) (Revision 1439744)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          Lars Hofhansl added a comment -

          Committed to 0.94... Yeah!

          Hudson added a comment -

          Integrated in HBase-0.94 #800 (See https://builds.apache.org/job/HBase-0.94/800/)
          HBASE-2611 Handle RS that fails while processing the failure of another one (Himanshu Vashishtha) (Revision 1440054)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          Hudson added a comment -

          Integrated in HBase-0.94-security #102 (See https://builds.apache.org/job/HBase-0.94-security/102/)
          HBASE-2611 Handle RS that fails while processing the failure of another one (Himanshu Vashishtha) (Revision 1440054)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #11 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/11/)
          HBASE-2611 Handle RS that fails while processing the failure of another one (Himanshu Vashishtha) (Revision 1440054)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java

            People

            • Assignee:
              Himanshu Vashishtha
            • Reporter:
              Jean-Daniel Cryans