HBase
  1. HBase
  2. HBASE-6109

Improve RIT performances during assignment on large clusters

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.95.0
    • Component/s: master
    • Labels:
      None

      Description

      The main points in this patch are:

      • lowering the number of copy of the RIT list
      • lowering the number of synchronization
      • synchronizing on a region rather than on everything

      It also contains:

      • some fixes around the RIT notification: the list was sometimes modified without a corresponding 'notify'.
      • some tests flakiness correction, actually unrelated to this patch.
      1. 6109.v7.patch
        73 kB
        Nicolas Liochon
      2. 6109.v24.patch
        61 kB
        Nicolas Liochon
      3. 6109.v23.patch
        62 kB
        Nicolas Liochon
      4. 6109.v21.patch
        59 kB
        Nicolas Liochon
      5. 6109.v19.patch
        59 kB
        Nicolas Liochon

        Issue Links

          Activity

          Hide
          Ted Yu added a comment -

          Looking forward to your patch, N.

          Show
          Ted Yu added a comment - Looking forward to your patch, N.
          Hide
          Nicolas Liochon added a comment -

          Here it is. I haven't merged it with trunk, as I don't know yet the impact of the modules and I expect many commits the next few days .

          Show
          Nicolas Liochon added a comment - Here it is. I haven't merged it with trunk, as I don't know yet the impact of the modules and I expect many commits the next few days .
          Hide
          stack added a comment -

          @nkeywal Trunk should have settled now. Can you redo your patch so its against the hbase root dir?

          +  final private Locker locker = new Locker();

          Is this a generic locker? Should it be named for what its locking?

          NotifiableConcurrentSkipListMap needs class comment. It seems like its for use in a very particular circumstance. It needs explaining.

          Does it need to be public? Only used in master package? Perhaps make it package private then?

          internalList is a bad name for the internal delegate instance. Is 'delegatee' a better name than internalList?

          For sure this is ok?

          +    while (!this.master.isStopped() &&
          +      // no lock concurrent access ok: sequentially consistent
          +      this.regionsInTransition.containsKey(hri.getEncodedName())) {
          +      this.regionsInTransition.waitForListUpdate();
               }
          

          We checked rit contains a name but then in a separate statement we do the waitForListUpdate? What if the region we are looking for is removed between the check and the waitForListUpdate invocation?

          Will this log be annoying?

          +      LOG.info("regionState=" + regionState +
          +        " failoverProcessedRegions.containsKey(encodedRegionName)=" + failoverProcessedRegions.containsKey(encodedRegionName));
          

          This too... '+ LOG.info("et=" + et);'?

          .. and this '+ LOG.info("regionInfo.isMetaTable()=" + regionInfo.isMetaTable());'?

          Add the region removed to the log message here? + LOG.debug("Removed region from reopening regions because it was closed");?

          Sometimes your indents are off. For example:

          -    synchronized (regionsInTransition) {
          +      // We need a lock here as we're going to do a put later and we don't want multiple state
          +      //  creation
          +    Reentran....
          

          There are gratuitious reformattings of code.

          Is this true:

          +      // no lock concurrency ok: there is a write when we update the timestamp but it's ok
          +      //  as its the only one updating this field
          +      RegionState rs = this.regionsInTransition.get(e.getKey());
          

          How is it enforced?

          Check these...

          
          +    }finally {
          
          
          
           or here
          
          
          +      }else {
          
          

          needs space after curly parens. Sometimes you do it and sometimes you don't.

          I reviewed half of the patch.

          It looks great. Nice stuff N.

          Show
          stack added a comment - @nkeywal Trunk should have settled now. Can you redo your patch so its against the hbase root dir? + final private Locker locker = new Locker(); Is this a generic locker? Should it be named for what its locking? NotifiableConcurrentSkipListMap needs class comment. It seems like its for use in a very particular circumstance. It needs explaining. Does it need to be public? Only used in master package? Perhaps make it package private then? internalList is a bad name for the internal delegate instance. Is 'delegatee' a better name than internalList? For sure this is ok? + while (! this .master.isStopped() && + // no lock concurrent access ok: sequentially consistent + this .regionsInTransition.containsKey(hri.getEncodedName())) { + this .regionsInTransition.waitForListUpdate(); } We checked rit contains a name but then in a separate statement we do the waitForListUpdate? What if the region we are looking for is removed between the check and the waitForListUpdate invocation? Will this log be annoying? + LOG.info( "regionState=" + regionState + + " failoverProcessedRegions.containsKey(encodedRegionName)=" + failoverProcessedRegions.containsKey(encodedRegionName)); This too... '+ LOG.info("et=" + et);'? .. and this '+ LOG.info("regionInfo.isMetaTable()=" + regionInfo.isMetaTable());'? Add the region removed to the log message here? + LOG.debug("Removed region from reopening regions because it was closed");? Sometimes your indents are off. For example: - synchronized (regionsInTransition) { + // We need a lock here as we're going to do a put later and we don't want multiple state + // creation + Reentran.... There are gratuitious reformattings of code. Is this true: + // no lock concurrency ok: there is a write when we update the timestamp but it's ok + // as its the only one updating this field + RegionState rs = this .regionsInTransition.get(e.getKey()); How is it enforced? Check these... + } finally { or here + } else { needs space after curly parens. Sometimes you do it and sometimes you don't. I reviewed half of the patch. It looks great. Nice stuff N.
          Hide
          Ted Yu added a comment -

          It would be nice to have a test for NotifiableConcurrentSkipListMap.

          +  public void waitListUpdate(long timeout) throws InterruptedException {
          +    synchronized (internalList){
          

          Since internalList is actually a Map, name the above method waitForUpdate() ?

          +  public void clear() {
          +    if (!internalList.isEmpty()) {
          +      synchronized (internalList) {
          

          Is it possible that internalList becomes empty after entering the synchronized block ?

          For Locker,

          + * An utility class to manage a set of lock. Each lock is identified by a String who serves
          

          the above should read 'A utility class to manage a set of locks. Each lock is identified by a String which serves'

          +public class Locker {
          +  private static final Log LOG = LogFactory.getLog(AssignmentManager.class);
          

          It should be Locker.class

          +  private static final int NB_CONCURRENT_LOCK = 1000;
          

          The constant should be named NB_CONCURRENT_LOCKS.

          +   * Return a lock for the given key. The lock is already lockek.
          

          The last word should be locked.

          +      String message = "Can't release the lock for " + key;
          

          It would be nice to add more about reason.

          -    synchronized (regionsInTransition) {
          -      nodes.removeAll(regionsInTransition.keySet());
          -    }
          +    // no lock concurrent access ok: some threads may be adding/removing items but its java-valid
          +    nodes.removeAll(regionsInTransition.keySet());
          

          Looking at batchRemove() of http://www.docjar.com/html/api/java/util/ArrayList.java.html around line 669, I don't see synchronization.
          Meaning, existence check of elements from nodes in regionsInTransition.keySet() may not be deterministic.

          Show
          Ted Yu added a comment - It would be nice to have a test for NotifiableConcurrentSkipListMap. + public void waitListUpdate( long timeout) throws InterruptedException { + synchronized (internalList){ Since internalList is actually a Map, name the above method waitForUpdate() ? + public void clear() { + if (!internalList.isEmpty()) { + synchronized (internalList) { Is it possible that internalList becomes empty after entering the synchronized block ? For Locker, + * An utility class to manage a set of lock. Each lock is identified by a String who serves the above should read 'A utility class to manage a set of locks. Each lock is identified by a String which serves' + public class Locker { + private static final Log LOG = LogFactory.getLog(AssignmentManager.class); It should be Locker.class + private static final int NB_CONCURRENT_LOCK = 1000; The constant should be named NB_CONCURRENT_LOCKS. + * Return a lock for the given key. The lock is already lockek. The last word should be locked. + String message = "Can't release the lock for " + key; It would be nice to add more about reason. - synchronized (regionsInTransition) { - nodes.removeAll(regionsInTransition.keySet()); - } + // no lock concurrent access ok: some threads may be adding/removing items but its java-valid + nodes.removeAll(regionsInTransition.keySet()); Looking at batchRemove() of http://www.docjar.com/html/api/java/util/ArrayList.java.html around line 669, I don't see synchronization. Meaning, existence check of elements from nodes in regionsInTransition.keySet() may not be deterministic.
          Hide
          Nicolas Liochon added a comment -

          @stack

          Is this a generic locker? Should it be named for what its locking?

          Renamed to LockerByString. If you have a better name...

          NotifiableConcurrentSkipListMap needs class comment. It seems like its for use in a very particular circumstance. It needs explaining.

          done.

          Does it need to be public? Only used in master package? Perhaps make it package private then?

          The issue was:

            public NotifiableConcurrentSkipListMap<String, RegionState> getRegionsInTransition() {
              return regionsInTransition;
            }
          

          But it's used in tests only, so I can actually make both package protected. Done.

          internalList is a bad name for the internal delegate instance. Is 'delegatee' a better name than internalList?

          done.

          We checked rit contains a name but then in a separate statement we do the waitForListUpdate? What if the region we are looking for is removed between the check and the waitForListUpdate invocation?

          Actually yes, it could happen. I added a timeout, so we will now check every 100ms.

          Will this log be annoying?

          Removed. I added them while debugging.

          This one was already there however. I kept it.

            public void removeClosedRegion(HRegionInfo hri) {
              if (regionsToReopen.remove(hri.getEncodedName()) != null) {
                LOG.debug("Removed region from reopening regions because it was closed");
              }
            }
          

          Is this true / How is it enforced?

          Oops, it not enforced (I don't know I could do it), but it's also not true: the update will set it as well. But it's not an issue as it's an atomic long. Comment updated.
          It's btw tempting to:

          • change the implementation of updateTimestampToNow to use a lazySet
          • get the timestamp only once before looping on the region set.

          I didn't do it in my patch, but I think it should be done.

          needs space after curly parens. Sometimes you do it and sometimes you don't.

          Done

          > @ted

          It would be nice to have a test for NotifiableConcurrentSkipListMap.

          Will do for final release.

          Since internalList is actually a Map, name the above method waitForUpdate() ?

          Done.

          the above should read 'A utility class to manage a set of locks. Each lock is identified by a String which serves'

          Done

          It should be Locker.class

          Done

          The constant should be named NB_CONCURRENT_LOCKS.

          Done

          The last word should be locked.

          Done

          It would be nice to add more about reason.

          Done.

          Looking at batchRemove() of http://www.docjar.com/html/api/java/util/ArrayList.java.html around line 669, I don't see synchronization. Meaning, existence check of elements from nodes in regionsInTransition.keySet() may not be deterministic.

          After looking at the java api code, I don't think there is an issue here. The set we're using is documented as: "The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.". So we won't have any java error. Then, if an element is added/removed to/from the RIT while we're doing the removeAll, it may be added/removed or not, but we're not less deterministic that we would be by adding a lock around the removeAll: the add/remove could be as well be done just before/after we take the lock, and we would not know it.

          I'm currently checking how it works with split, then I will update it to the current trunk.

          Show
          Nicolas Liochon added a comment - @stack Is this a generic locker? Should it be named for what its locking? Renamed to LockerByString. If you have a better name... NotifiableConcurrentSkipListMap needs class comment. It seems like its for use in a very particular circumstance. It needs explaining. done. Does it need to be public? Only used in master package? Perhaps make it package private then? The issue was: public NotifiableConcurrentSkipListMap<String, RegionState> getRegionsInTransition() { return regionsInTransition; } But it's used in tests only, so I can actually make both package protected. Done. internalList is a bad name for the internal delegate instance. Is 'delegatee' a better name than internalList? done. We checked rit contains a name but then in a separate statement we do the waitForListUpdate? What if the region we are looking for is removed between the check and the waitForListUpdate invocation? Actually yes, it could happen. I added a timeout, so we will now check every 100ms. Will this log be annoying? Removed. I added them while debugging. This one was already there however. I kept it. public void removeClosedRegion(HRegionInfo hri) { if (regionsToReopen.remove(hri.getEncodedName()) != null) { LOG.debug("Removed region from reopening regions because it was closed"); } } Is this true / How is it enforced? Oops, it not enforced (I don't know I could do it), but it's also not true: the update will set it as well. But it's not an issue as it's an atomic long. Comment updated. It's btw tempting to: change the implementation of updateTimestampToNow to use a lazySet get the timestamp only once before looping on the region set. I didn't do it in my patch, but I think it should be done. needs space after curly parens. Sometimes you do it and sometimes you don't. Done > @ted It would be nice to have a test for NotifiableConcurrentSkipListMap. Will do for final release. Since internalList is actually a Map, name the above method waitForUpdate() ? Done. the above should read 'A utility class to manage a set of locks. Each lock is identified by a String which serves' Done It should be Locker.class Done The constant should be named NB_CONCURRENT_LOCKS. Done The last word should be locked. Done It would be nice to add more about reason. Done. Looking at batchRemove() of http://www.docjar.com/html/api/java/util/ArrayList.java.html around line 669, I don't see synchronization. Meaning, existence check of elements from nodes in regionsInTransition.keySet() may not be deterministic. After looking at the java api code, I don't think there is an issue here. The set we're using is documented as: "The view's iterator is a "weakly consistent" iterator that will never throw ConcurrentModificationException, and guarantees to traverse elements as they existed upon construction of the iterator, and may (but is not guaranteed to) reflect any modifications subsequent to construction.". So we won't have any java error. Then, if an element is added/removed to/from the RIT while we're doing the removeAll, it may be added/removed or not, but we're not less deterministic that we would be by adding a lock around the removeAll: the add/remove could be as well be done just before/after we take the lock, and we would not know it. I'm currently checking how it works with split, then I will update it to the current trunk.
          Hide
          stack added a comment -

          Renamed to LockerByString. If you have a better name...

          regionLocker?

          Show
          stack added a comment - Renamed to LockerByString. If you have a better name... regionLocker?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530195/6109.v19.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2052//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2052//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12530195/6109.v19.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 26 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2052//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2052//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          I think it's ok for a commit.
          From the code I read, we should have the same behavior as before on split. I will write some parallel tests later on, but I would expect the same behavior as today at least. It may take time as I may encounter some flakiness on this path .
          I don't have a test class for NotifiableConcurrentSkipListMap, this class is small so I don't think it's an issue right now. I will push one with the other tests I will write.

          Show
          Nicolas Liochon added a comment - I think it's ok for a commit. From the code I read, we should have the same behavior as before on split. I will write some parallel tests later on, but I would expect the same behavior as today at least. It may take time as I may encounter some flakiness on this path . I don't have a test class for NotifiableConcurrentSkipListMap, this class is small so I don't think it's an issue right now. I will push one with the other tests I will write.
          Hide
          Ted Yu added a comment -

          Patch v3 rebased on trunk.

          Show
          Ted Yu added a comment - Patch v3 rebased on trunk.
          Hide
          Ted Yu added a comment -

          Rename TestLocker class to TestKeyLocker ?

          +    // It has no reason to be a lock shares with the other operations.
          

          'shares with' -> 'shared with'

          Indentation in AssignmentManager.addToRITandCallClose() was off. It would be nice to correct the existing lines.

          +    // No lock concurrency: adding a share synchronized here would not prevent to have two
          +    //  entries as we don't check if the region is already there. This must be ensured by the
          +    //  method callers. todo nli: check
          

          'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end.

          -    Map<String, RegionPlan> plans=new HashMap<String, RegionPlan>();
          +    Map<String, RegionPlan> plans=new HashMap<String, RegionPlan>(regions.size());
          

          Insert spaces around = sign.

          +   * @return True if none of the regions in the set are in transition
          

          'are in' -> 'is in'

          +  public NavigableMap<K, V> copyMap() {
          +    return delegatee.clone();
          

          Why not call the method clone() ?

          +  public void clear() {
          +    if (!delegatee.isEmpty()) {
          +      synchronized (delegatee) {
          

          Suppose delegatee is empty upon entry to the above method, what if an entry is added after the isEmpty() check ?

          Show
          Ted Yu added a comment - Rename TestLocker class to TestKeyLocker ? + // It has no reason to be a lock shares with the other operations. 'shares with' -> 'shared with' Indentation in AssignmentManager.addToRITandCallClose() was off. It would be nice to correct the existing lines. + // No lock concurrency: adding a share synchronized here would not prevent to have two + // entries as we don't check if the region is already there. This must be ensured by the + // method callers. todo nli: check 'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end. - Map< String , RegionPlan> plans= new HashMap< String , RegionPlan>(); + Map< String , RegionPlan> plans= new HashMap< String , RegionPlan>(regions.size()); Insert spaces around = sign. + * @ return True if none of the regions in the set are in transition 'are in' -> 'is in' + public NavigableMap<K, V> copyMap() { + return delegatee.clone(); Why not call the method clone() ? + public void clear() { + if (!delegatee.isEmpty()) { + synchronized (delegatee) { Suppose delegatee is empty upon entry to the above method, what if an entry is added after the isEmpty() check ?
          Hide
          Ted Yu added a comment -
          +  // A number of lock we want to easily support. It's not a maximum.
          

          'A number' -> 'The number'

          +  // We need an atomic counter to manage the number of people using the lock and free it when
          +  //  it's equals to zero.
          

          'number of people' -> 'number of users'
          'it's equals to zero.' -> 'it's equal to zero.'

          +  static class RegionLock<K> extends ReentrantLock {
          

          The outer class is generic. The inner class shouldn't mention Region.

          Show
          Ted Yu added a comment - + // A number of lock we want to easily support. It's not a maximum. 'A number' -> 'The number' + // We need an atomic counter to manage the number of people using the lock and free it when + // it's equals to zero. 'number of people' -> 'number of users' 'it's equals to zero.' -> 'it's equal to zero.' + static class RegionLock<K> extends ReentrantLock { The outer class is generic. The inner class shouldn't mention Region.
          Hide
          Nicolas Liochon added a comment -

          Rename TestLocker class to TestKeyLocker ?

          Done.

          'shares with' -> 'shared with'

          Done.

          Indentation in AssignmentManager.addToRITandCallClose() was off. It would be nice to correct the existing lines.

          Done

          'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end.

          Done

          Insert spaces around = sign.

          Done.

          'are in' -> 'is in'

          Done

          Why not call the method clone() ?

          We don't really want the NotifiableConcurrentSkipListMap to be cloneable: however, some functions want to work on a copy of the data structure, for reporting or test (with all the 'Map' semantic), hence the internal clone.

          Suppose delegatee is empty upon entry to the above method, what if an entry is added after the isEmpty() check ?

          It will be equivalent to adding it just after the clear.

          'A number' -> 'The number'

          Done.

          'number of people' -> 'number of users'

          Done

          'it's equals to zero.' -> 'it's equal to zero.'

          Done

          The outer class is generic. The inner class shouldn't mention Region.

          Done

          Show
          Nicolas Liochon added a comment - Rename TestLocker class to TestKeyLocker ? Done. 'shares with' -> 'shared with' Done. Indentation in AssignmentManager.addToRITandCallClose() was off. It would be nice to correct the existing lines. Done 'share synchronized' -> 'synchronized'. Remove the 'todo nli:' at the end. Done Insert spaces around = sign. Done. 'are in' -> 'is in' Done Why not call the method clone() ? We don't really want the NotifiableConcurrentSkipListMap to be cloneable: however, some functions want to work on a copy of the data structure, for reporting or test (with all the 'Map' semantic), hence the internal clone. Suppose delegatee is empty upon entry to the above method, what if an entry is added after the isEmpty() check ? It will be equivalent to adding it just after the clear. 'A number' -> 'The number' Done. 'number of people' -> 'number of users' Done 'it's equals to zero.' -> 'it's equal to zero.' Done The outer class is generic. The inner class shouldn't mention Region. Done
          Hide
          Ted Yu added a comment -

          @N:
          Thanks for the quick turn-around.

          Show
          Ted Yu added a comment - @N: Thanks for the quick turn-around.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530237/6109.v21.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestDrainingServer
          org.apache.hadoop.hbase.master.TestAssignmentManager
          org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2056//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2056//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12530237/6109.v21.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 26 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestDrainingServer org.apache.hadoop.hbase.master.TestAssignmentManager org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2056//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2056//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          I need to have a look at this one. org.apache.hadoop.hbase.master.TestAssignmentManager.testRegionInOpeningStateOnDeadRSWhileMasterFailover

          Show
          Nicolas Liochon added a comment - I need to have a look at this one. org.apache.hadoop.hbase.master.TestAssignmentManager.testRegionInOpeningStateOnDeadRSWhileMasterFailover
          Hide
          stack added a comment -

          Thanks N.

          Show
          stack added a comment - Thanks N.
          Hide
          Nicolas Liochon added a comment -

          testRegionInOpeningStateOnDeadRSWhileMasterFailover fails at this line:

            public void testRegionInOpeningStateOnDeadRSWhileMasterFailover() throws IOException,
                KeeperException, ServiceException, InterruptedException {
              AssignmentManagerWithExtrasForTesting am = setUpMockedAssignmentManager(this.server,
                  this.serverManager);
              ZKAssign.createNodeOffline(this.watcher, REGIONINFO, SERVERNAME_A);   <============== FAILED HERE: KeeperErrorCode = NodeExists for /hbase/unassigned/5c7fe078551611acb0923a9ca0e1e1f4
          

          So it's more a test error. This node should be deleted in the after() clause of the previous, for whatever reason it was not or was recreated after the delete. Investigating...

          Show
          Nicolas Liochon added a comment - testRegionInOpeningStateOnDeadRSWhileMasterFailover fails at this line: public void testRegionInOpeningStateOnDeadRSWhileMasterFailover() throws IOException, KeeperException, ServiceException, InterruptedException { AssignmentManagerWithExtrasForTesting am = setUpMockedAssignmentManager(this.server, this.serverManager); ZKAssign.createNodeOffline(this.watcher, REGIONINFO, SERVERNAME_A); <============== FAILED HERE: KeeperErrorCode = NodeExists for /hbase/unassigned/5c7fe078551611acb0923a9ca0e1e1f4 So it's more a test error. This node should be deleted in the after() clause of the previous, for whatever reason it was not or was recreated after the delete. Investigating...
          Hide
          Nicolas Liochon added a comment -

          Hopefully we're done
          v23 contains the fix for the issue above (unrelated to my changes) + the merge with the trunk as of now... I've executed locally 100 times TestAssignmentManager without getting any error.

          Show
          Nicolas Liochon added a comment - Hopefully we're done v23 contains the fix for the issue above (unrelated to my changes) + the merge with the trunk as of now... I've executed locally 100 times TestAssignmentManager without getting any error.
          Hide
          ramkrishna.s.vasudevan added a comment -

          @N
          Minor nits.

          /*
                 this.watcher.close();
          +      try {
          +        Thread.sleep(2000);
          +      } catch (InterruptedException e) {
          +      }  */
          

          Some commented line is there. Just had a glance on the latest patch.

          Show
          ramkrishna.s.vasudevan added a comment - @N Minor nits. /* this .watcher.close(); + try { + Thread .sleep(2000); + } catch (InterruptedException e) { + } */ Some commented line is there. Just had a glance on the latest patch.
          Hide
          Nicolas Liochon added a comment -

          @ram You're right, I forgot to remove my flakiness detector before doing the patch. Ok, I'm good for a v24 then. I will do it after looking at the test results for the v23...

          Show
          Nicolas Liochon added a comment - @ram You're right, I forgot to remove my flakiness detector before doing the patch. Ok, I'm good for a v24 then. I will do it after looking at the test results for the v23...
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530368/6109.v23.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
          org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
          org.apache.hadoop.hbase.replication.TestReplication

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2075//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2075//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12530368/6109.v23.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 26 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort org.apache.hadoop.hbase.replication.TestReplication Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2075//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2075//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          Locally everything is ok and these tests are known as flaky, so I think it's ok. v24 is the version with the comments in TestAssignement.

          Show
          Nicolas Liochon added a comment - Locally everything is ok and these tests are known as flaky, so I think it's ok. v24 is the version with the comments in TestAssignement.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530378/6109.v24.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2076//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2076//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12530378/6109.v24.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 26 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2076//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2076//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          ok for commit imho.

          Show
          Nicolas Liochon added a comment - ok for commit imho.
          Hide
          Ted Yu added a comment -

          In ZKUtil.java, the only change was:

          +      LOG.debug("Deleting node "+node);
          

          Can I take this out at time of integration ?

          Show
          Ted Yu added a comment - In ZKUtil.java, the only change was: + LOG.debug( "Deleting node " +node); Can I take this out at time of integration ?
          Hide
          Ted Yu added a comment -

          Integrated to trunk.

          Thanks for the patch, N.

          Thanks for the review, Stack and Ram.

          Show
          Ted Yu added a comment - Integrated to trunk. Thanks for the patch, N. Thanks for the review, Stack and Ram.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2966 (See https://builds.apache.org/job/HBase-TRUNK/2966/)
          HBASE-6109 Improve RIT performances during assignment on large clusters (N Keywal) (Revision 1344802)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/NotifiableConcurrentSkipListMap.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestKeyLocker.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2966 (See https://builds.apache.org/job/HBase-TRUNK/2966/ ) HBASE-6109 Improve RIT performances during assignment on large clusters (N Keywal) (Revision 1344802) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/NotifiableConcurrentSkipListMap.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestKeyLocker.java
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #35 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/35/)
          HBASE-6109 Improve RIT performances during assignment on large clusters (N Keywal) (Revision 1344802)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/NotifiableConcurrentSkipListMap.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestKeyLocker.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #35 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/35/ ) HBASE-6109 Improve RIT performances during assignment on large clusters (N Keywal) (Revision 1344802) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MXBeanImpl.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterDumpServlet.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/NotifiableConcurrentSkipListMap.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/KeyLocker.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/Mocking.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestKeyLocker.java
          Hide
          stack added a comment -

          Marking closed.

          Show
          stack added a comment - Marking closed.

            People

            • Assignee:
              Nicolas Liochon
              Reporter:
              Nicolas Liochon
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development