HBASE-7871

HBase can be stuck when closing regions concurrently

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.98.0, 0.95.1
    • Component/s: master
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The attached test fails ~1% of the time on 0.96. It does not seem to fail on 0.94.5. The test is simple: a table creation and some puts.

      I attach the stack. The logs seem to say nothing.
      The suspicious part is:

      "RS_CLOSE_REGION-localhost,57575,1361197489166-2" prio=10 tid=0x00007fb0c8775800 nid=0x61ac runnable [0x00007fb09f272000]
         java.lang.Thread.State: RUNNABLE
              at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2193)
              at java.util.TreeMap.deleteEntry(TreeMap.java:2151)
              at java.util.TreeMap.remove(TreeMap.java:585)
              at java.util.TreeSet.remove(TreeSet.java:259)
              at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:55)
              at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:86)
              at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:40)
              at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1063)
              at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:969)
              - locked <0x00000006944e2558> (a java.lang.Object)
              at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:146)
              at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:203)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      
      
      1. TestStartStop.java
        2 kB
        Nicolas Liochon
      2. s1.txt
        172 kB
        Nicolas Liochon
      3. 7871.patch
        0.7 kB
        Ted Yu
      4. 7871-v2.patch
        1 kB
        Ted Yu
      5. 7871-v3.txt
        4 kB
        Ted Yu
      6. 7871-v4.txt
        4 kB
        Ted Yu

        Activity

        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #476 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/476/)
        HBASE-7871 HBase can be stuck when closing regions concurrently (Ted Yu) (Revision 1464012)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        • /hbase/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        Hudson added a comment -

        Integrated in hbase-0.95-on-hadoop2 #53 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/53/)
        HBASE-7871 HBase can be stuck when closing regions concurrently (Ted Yu) (Revision 1464013)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/branches/0.95/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        • /hbase/branches/0.95/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #4009 (See https://builds.apache.org/job/HBase-TRUNK/4009/)
        HBASE-7871 HBase can be stuck when closing regions concurrently (Ted Yu) (Revision 1464012)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        • /hbase/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        Hudson added a comment -

        Integrated in hbase-0.95 #121 (See https://builds.apache.org/job/hbase-0.95/121/)
        HBASE-7871 HBase can be stuck when closing regions concurrently (Ted Yu) (Revision 1464013)

        Result = SUCCESS
        tedyu :
        Files :

        • /hbase/branches/0.95/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        • /hbase/branches/0.95/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java
        Ted Yu added a comment -

        Integrated to 0.95 and trunk.

        Thanks for the reviews.

        Nicolas Liochon added a comment -

        Tested, no error on 250 tries => +1 for me.

        Anoop Sam John added a comment -

        Yes Ram. This fix should solve this test failure also.

        ramkrishna.s.vasudevan added a comment -

        Got this error in http://54.241.6.143/job/HBase-TRUNK/org.apache.hbase$hbase-server/74/testReport/junit/org.apache.hadoop.hbase.master/TestTableLockManager/testTableReadLock/

        java.lang.NullPointerException
        	at java.util.TreeMap.rotateRight(TreeMap.java:2057)
        	at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2199)
        	at java.util.TreeMap.deleteEntry(TreeMap.java:2151)
        	at java.util.TreeMap.remove(TreeMap.java:585)
        	at java.util.TreeSet.remove(TreeSet.java:259)
        	at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:55)
        	at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:86)
        	at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:40)
        	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:962)
        	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:863)
        	at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:147)
        

        Hope this patch will fix this issue also.
        I think this was the reason for the TestTableLockManager to hang.

        stack added a comment -

        Nicolas Liochon Any luck w/ the testing? Patch looks good to me. +1 on trunk and 0.95

        Anoop Sam John added a comment -

        Thanks Elliott Clark for clarifying the usage. Yes, I was also of the opinion that a read-write lock is the way to go in such a case.
        Ted Yu, I am +1 on the V4 patch. We will wait for the test from Nicolas Liochon.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12576647/7871-v4.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.master.TestTableLockManager
        org.apache.hadoop.hbase.security.access.TestAccessController

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5101//console

        This message is automatically generated.

        Ted Yu added a comment -

        Patch v4 adds some comments for the lock.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12576633/7871-v3.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5099//console

        This message is automatically generated.

        Elliott Clark added a comment -

        Functionally it looks good though some comments about locking should be added.

        Nicolas Liochon added a comment -

        I've got some tests running, but I should be able to test it tomorrow.

        Ted Yu added a comment -

        Patch v3 addresses Elliott's comments.

        Elliott Clark added a comment -

        getMetrics is called from a thread spawned by the Hadoop metrics system. The Hadoop metrics system calls getMetrics to copy all of the values that a Source has. It's always a thread outside of the control of HBase.

        Anything we do in hadoop1-compat will probably have to be done in hadoop 2.
        https://github.com/apache/hbase/blob/trunk/hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java#L49
        https://github.com/apache/hbase/blob/trunk/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsRegionAggregateSourceImpl.java#L50

        I think we should go with reader writer locks inside of the Aggregate source, where the actual manipulation of the tree map happens.

        • getMetrics takes a reader lock
        • Adding or removing region sources would take a writer lock

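The reader/writer locking proposed in this comment can be sketched roughly as follows. This is only an illustration under assumptions, not the committed patch: the class `GuardedAggregateSource`, its `String` elements, and the `snapshot()` method (standing in for getMetrics) are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the aggregate source; not the actual HBase class.
class GuardedAggregateSource {
    private final TreeSet<String> regionSources = new TreeSet<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Adding a region source mutates the tree, so it takes the writer lock.
    void register(String source) {
        lock.writeLock().lock();
        try {
            regionSources.add(source);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Removing a region source also mutates the tree: writer lock again.
    void deregister(String source) {
        lock.writeLock().lock();
        try {
            regionSources.remove(source);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Stands in for getMetrics: iterates under the reader lock and returns a copy.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(regionSources);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final GuardedAggregateSource agg = new GuardedAggregateSource();
        final int regions = 1000;
        final int closers = 4;
        for (int i = 0; i < regions; i++) {
            agg.register("region-" + i);
        }
        // Several "close region" threads deregister disjoint sources concurrently.
        Thread[] threads = new Thread[closers];
        for (int t = 0; t < closers; t++) {
            final int offset = t;
            threads[t] = new Thread(() -> {
                for (int i = offset; i < regions; i += closers) {
                    agg.deregister("region-" + i);
                    agg.snapshot(); // interleave reads, as a metrics thread might
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("sources left: " + agg.snapshot().size());
    }
}
```

Because every mutation goes through the writer lock, each source is removed exactly once and the set ends up empty, while readers can still run in parallel with each other.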
        Ted Yu added a comment -

        Elliott Clark:
        Your opinion on this issue would be valuable.

        Ted Yu added a comment -

        I used the following command at the root of trunk to see where getMetrics(MetricsBuilder metricsBuilder, boolean all) is called:

        find . -name '*.java' -exec grep 'getMetrics(' {} \; -print | grep -v 'getMetrics()'

        I only saw reference in tests, such as hbase-hadoop1-compat/src/test/java/org/apache/hadoop/hbase/test/MetricsAssertHelperImpl.java, etc.

        Anoop Sam John added a comment -

        Thanks Nicolas Liochon for testing.
        How is MetricsRegionAggregateSourceImpl#getMetrics() being used?
        I have only one doubt: can iterating over the TreeSet concurrently with a register/deregister throw ConcurrentModificationException? Maybe Elliott Clark can tell us about the usage.
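The concern about iterating while the set changes can be shown even on a single thread. The sketch below uses a plain `java.util.TreeSet` of strings; the class and method names are made up for the example, and it only mimics what a concurrent deregister would do in the middle of a getMetrics-style iteration.

```java
import java.util.ConcurrentModificationException;
import java.util.TreeSet;

// Hypothetical demo class; it only exercises java.util.TreeSet.
class IterateWhileRemoving {
    // Returns true if removing an element mid-iteration makes the iterator fail.
    static boolean removalDuringIterationFails() {
        TreeSet<String> sources = new TreeSet<>();
        sources.add("region-a");
        sources.add("region-b");
        sources.add("region-c");
        try {
            for (String source : sources) {
                if (source.equals("region-a")) {
                    // Structural change behind the iterator's back,
                    // as a concurrent deregister() would make.
                    sources.remove("region-b");
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("iterator failed: " + removalDuringIterationFails());
    }
}
```

With real threads the fail-fast check is only best effort, so an unguarded iteration may throw this exception, see stale data, or hit a corrupted tree.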

        Ted Yu added a comment -

        Anoop Sam John:
        Do you have further comment(s) ?

        Thanks

        Nicolas Liochon added a comment -

        It worked 300 times in a row w/o any issue. Let's say the first attempt was an error from me and the patch is fine.

        Nicolas Liochon added a comment -

        Can we change the Title of the issue accordingly?

        Done

        When I tried the patch it got stuck the very first time (without any trace of the monitoring stuff) then worked 150 times. I've just launched another run of tests.

        Anoop Sam John added a comment -

        This can happen not only at shutdown: if there are concurrent region close operations in the HRS, there is also a chance for this. Can we change the Title of the issue accordingly?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12575347/7871-v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestHTableMultiplexer
        org.apache.hadoop.hbase.zookeeper.lock.TestZKInterProcessReadWriteLock

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12575347/7871-v2.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. +1 javadoc . The javadoc tool did not generate any warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 -1 site . The patch appears to cause mvn site goal to fail. -1 core tests . 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestHTableMultiplexer org.apache.hadoop.hbase.zookeeper.lock.TestZKInterProcessReadWriteLock Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4995//console This message is automatically generated.
        Ted Yu added a comment -

        From http://javasourcecode.org/html/open-source/jdk/jdk-6u23/java.util/TreeMap.java.html :

            /** From CLR */
            private void rotateLeft(Entry<K,V> p) {
                if (p != null) {
                    Entry<K,V> r = p.right;
                    p.right = r.left;
                    if (r.left != null)
                        r.left.parent = p;
                    r.parent = p.parent;
                    if (p.parent == null)
                        root = r;
                    else if (p.parent.left == p)
                        p.parent.left = r;
                    else
                        p.parent.right = r;
                    r.left = p;
                    p.parent = r;
                }
            }
        

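        Line 2038, where the NPE is thrown, is:

                    p.right = r.left;
        

        p.right can be set to null at line 2142 in deleteEntry():

                    p.left = p.right = p.parent = null;
        

        If another thread clears p.right this way, r is null in rotateLeft() and r.left throws. This means that concurrent deregister() calls resulted in the NPE.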

        See Anoop's comment @ 22/Mar/13 12:34 above for more background.
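
        The failure mode described above can be replayed deterministically in a single thread. The sketch below is a hypothetical miniature of TreeMap's node links (the class and method names are illustrative, not JDK or HBase code), and it only replays one possible interleaving: the unlink from deleteEntry() happens just before rotateLeft() runs on the same node. It is not a claim about the exact interleaving in the failing run.

```java
// Hypothetical miniature of TreeMap's node links; not the JDK class.
class RotateLeftNpeSketch {
    static final class Entry {
        Entry left, right, parent;
    }

    // Same pointer steps as the quoted rotateLeft, without the root bookkeeping.
    static void rotateLeft(Entry p) {
        Entry r = p.right;
        p.right = r.left;              // throws NullPointerException when r == null
        if (r.left != null) {
            r.left.parent = p;
        }
        r.parent = p.parent;
        if (p.parent != null) {
            if (p.parent.left == p) {
                p.parent.left = r;
            } else {
                p.parent.right = r;
            }
        }
        r.left = p;
        p.parent = r;
    }

    // Rotation on an intact node: r becomes the parent of p.
    static boolean normalRotationWorks() {
        Entry p = new Entry();
        Entry r = new Entry();
        p.right = r;
        r.parent = p;
        rotateLeft(p);
        return r.left == p && p.parent == r && p.right == null;
    }

    // Replays the interleaving: the node is unlinked, then rotated.
    static boolean replayInterleaving() {
        Entry p = new Entry();
        Entry r = new Entry();
        p.right = r;
        r.parent = p;
        // What a concurrent deleteEntry() does to the node it removes.
        p.left = p.right = p.parent = null;
        try {
            rotateLeft(p);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rotation on intact node ok: " + normalRotationWorks());
        System.out.println("NPE after unlink: " + replayInterleaving());
    }
}
```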

        Nicolas Liochon added a comment -

        I'm going to test the patch locally. It may take some time, as it works 99% of the time.

        Ted Yu added a comment -

        If I read https://builds.apache.org/job/PreCommit-HBASE-Build/4992/artifact/trunk/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.util.TestMergeTable-output.txt correctly, the test failure was related to this issue:

        2013-03-25 04:46:08,389 FATAL [RS_CLOSE_REGION-asf002.sp2.ygridcore.net,34472,1364186761997-2] regionserver.HRegionServer(1607): ABORTING region server asf002.sp2.ygridcore.net,34472,1364186761997: Unrecoverable exception while closing region test,row_80001,1364186755279.c9304bb72db64492117f5008cc3901e4., still finishing close
        java.lang.NullPointerException
        	at java.util.TreeMap.rotateLeft(TreeMap.java:2038)
        	at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2211)
        	at java.util.TreeMap.deleteEntry(TreeMap.java:2151)
        	at java.util.TreeMap.remove(TreeMap.java:585)
        	at java.util.TreeSet.remove(TreeSet.java:259)
        	at org.apache.hadoop.hbase.regionserver.MetricsRegionAggregateSourceImpl.deregister(MetricsRegionAggregateSourceImpl.java:55)
        	at org.apache.hadoop.hbase.regionserver.MetricsRegionSourceImpl.close(MetricsRegionSourceImpl.java:86)
        	at org.apache.hadoop.hbase.regionserver.MetricsRegion.close(MetricsRegion.java:40)
        	at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:962)
        	at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:863)
        	at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:147)
        	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        	at java.lang.Thread.run(Thread.java:662)
        
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12575031/7871-v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestHTableMultiplexer

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4968//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12575019/7871.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4967//console

        This message is automatically generated.

        Nicolas Liochon added a comment -

        Nice catch Anoop!
        I agree. Elliott Clark, could you please have a look? I think that synchronizing register & deregister should do it (I can do it if you confirm it's OK, but I lack time). In getMetrics, it's also possible to remove the null check on regionSources.
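
        A minimal sketch of the synchronization proposed above, assuming the aggregate keeps its sources in a plain TreeSet. The class SynchronizedAggregateSketch and its String elements are hypothetical stand-ins; this is not the attached patch and not the real MetricsRegionAggregateSourceImpl.

```java
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the aggregate source; not the HBase class.
class SynchronizedAggregateSketch {
    // TreeSet is not thread-safe, so every access goes through a synchronized method.
    private final TreeSet<String> regionSources = new TreeSet<>();

    synchronized void register(String source) {
        regionSources.add(source);
    }

    synchronized void deregister(String source) {
        regionSources.remove(source);
    }

    synchronized int size() {
        return regionSources.size();
    }

    // Registers `count` sources, then deregisters all of them from `threads` threads.
    // Returns the number of sources left in the set.
    static int closeConcurrently(int count, int threads) {
        SynchronizedAggregateSketch agg = new SynchronizedAggregateSketch();
        for (int i = 0; i < count; i++) {
            agg.register("region-" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < count; i++) {
            final String name = "region-" + i;
            pool.execute(() -> {
                try {
                    start.await();          // release all workers at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                agg.deregister(name);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return agg.size();
    }

    public static void main(String[] args) {
        System.out.println("remaining sources: " + closeConcurrently(10_000, 8));
    }
}
```

        With the three methods synchronized on the same monitor, only one thread restructures the tree at a time, so the concurrent removals cannot corrupt it. Any code that iterates over the set would need to hold the same lock.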

        Anoop Sam John added a comment -

        We might need to sync the add() call to the TreeSet as well. Nicolas, can you run the test in your environment with this patch, please? (With the current code I tried several times but was not able to reproduce the problem.) I will check for any other places where we are missing sync.

        Ted Yu added a comment -

        How about this patch?

        Anoop Sam John added a comment -

        We may hit this issue whenever regions are closed concurrently in the RS. The close can get stuck. IMO we should raise the priority of this issue and resolve it.

        Anoop Sam John added a comment - - edited

        I think I found the issue with region close.
        In the test, multiple regions are closed concurrently.
        All the MetricsRegionSourceImpl objects (one per region) share the same MetricsRegionAggregateSourceImpl instance.
        See the code in MetricsRegionServerSourceFactoryImpl:

            private synchronized MetricsRegionAggregateSourceImpl getAggregate() {
              if (FactoryStorage.INSTANCE.aggImpl == null) {
                FactoryStorage.INSTANCE.aggImpl = new MetricsRegionAggregateSourceImpl();
              }
              return FactoryStorage.INSTANCE.aggImpl;
            }

            @Override
            public MetricsRegionSource createRegion(MetricsRegionWrapper wrapper) {
              return new MetricsRegionSourceImpl(wrapper, getAggregate());
            }
        

        So there can be concurrent calls to MetricsRegionAggregateSourceImpl#deregister().
        Since TreeSet<MetricsRegionSourceImpl> regionSources is not thread-safe, this can cause problems.

        From the javadoc of TreeMap (which backs TreeSet):

        Note that this implementation is not synchronized. If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with an existing key is not a structural modification.)

        Concurrent structural modifications on non-thread-safe maps can cause endless loops. Am I correct here? I have seen issues like this with Maps in the past, with HashMap I think. Please correct me if I am wrong.
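
        To make the sharing concrete, here is a small sketch of the factory pattern quoted above. All names (SharedAggregateSketch, Aggregate, RegionSource) are simplified, hypothetical stand-ins for the HBase classes. It only shows that every region source ends up holding the same aggregate, so the synchronized getAggregate() protects creation of the aggregate but not later changes to its TreeSet.

```java
import java.util.TreeSet;

// Hypothetical, simplified stand-ins for the HBase metrics classes.
class SharedAggregateSketch {
    static final class Aggregate {
        // Shared, unsynchronized set: the part that is at risk.
        final TreeSet<String> regionSources = new TreeSet<>();
    }

    static final class RegionSource {
        final String regionName;
        final Aggregate aggregate;

        RegionSource(String regionName, Aggregate aggregate) {
            this.regionName = regionName;
            this.aggregate = aggregate;
        }
    }

    private static Aggregate aggImpl;

    // Lazily creates the single aggregate; the lock covers creation only.
    static synchronized Aggregate getAggregate() {
        if (aggImpl == null) {
            aggImpl = new Aggregate();
        }
        return aggImpl;
    }

    static RegionSource createRegion(String regionName) {
        return new RegionSource(regionName, getAggregate());
    }

    public static void main(String[] args) {
        RegionSource a = createRegion("region-a");
        RegionSource b = createRegion("region-b");
        System.out.println("same aggregate: " + (a.aggregate == b.aggregate));
    }
}
```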

        Nick Dimiduk added a comment -

        Here's another example from Jenkins: https://builds.apache.org/job/PreCommit-HBASE-Build/4558//consoleFull

          People

          • Assignee: Ted Yu
          • Reporter: Nicolas Liochon
          • Votes: 0
          • Watchers: 11
