HBASE-5673

The OOM problem of IPC client call cause all handle block

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.92.3
    • Component/s: None
    • Labels:
      None
    • Environment:

      0.90.6

    • Hadoop Flags:
      Incompatible change
    • Release Note:
      1
    • Tags:
      1

      Description

      If HBaseClient hits an "unable to create new native thread" error (a java.lang.OutOfMemoryError), the call never completes because it is left stranded in the calls queue.

      1. HBASE-5673-90.patch
        0.5 kB
        xufeng
      2. HBASE-5673-90-V2.patch
        0.7 kB
        xufeng

        Activity

        Jean-Daniel Cryans added a comment -

        Patch doesn't apply anymore, unmarking as available.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12520564/HBASE-5673-90-V2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2874//console

        This message is automatically generated.

        Hudson added a comment -

        Integrated in HBase-0.92-security #104 (See https://builds.apache.org/job/HBase-0.92-security/104/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307516)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307275)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307272)
        HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307246)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Hudson added a comment -

        Integrated in HBase-0.94-security #7 (See https://builds.apache.org/job/HBase-0.94-security/7/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307515)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        xufeng added a comment -

        @Stack @Ted
        I analyzed the problem with my patch.
        This is the result:
        I wrapped every exception in an IOException, and that IOException cannot be handled in CatalogTracker#getCachedConnection (private HRegionInterface getCachedConnection(ServerName sn)),
        so the master aborts and the test cases fail.

        In the future, I will submit patches together with their test results.
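
To illustrate the analysis above: if a caller decides what to do based on an exception's concrete type, wrapping everything in a plain IOException hides that type from it. The sketch below is hypothetical. `handle` is not the real `getCachedConnection` logic (which is not shown in this issue), and `ConnectException` is chosen only as an example of a failure a caller might tolerate.

```java
import java.io.IOException;
import java.net.ConnectException;

public class WrappingHidesType {

    /** Hypothetical caller: tolerates a known failure type, rethrows anything else. */
    public static String handle(IOException e) throws IOException {
        if (e instanceof ConnectException) {
            return "server unavailable, retry later";
        }
        throw e; // unrecognised exception propagates to the caller
    }

    public static void main(String[] args) throws IOException {
        IOException raw = new ConnectException("connection refused");
        System.out.println(handle(raw)); // recognised and tolerated

        IOException wrapped = new IOException(raw); // same failure, type now hidden
        try {
            handle(wrapped);
        } catch (IOException propagated) {
            // The wrapper is not a ConnectException, so it was rethrown.
            System.out.println("propagated, cause: " + propagated.getCause());
        }
    }
}
```

A caller written this way would have to inspect `getCause()` to recover the original type, which existing callers may not do.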

        Hudson added a comment -

        Integrated in HBase-TRUNK-security #155 (See https://builds.apache.org/job/HBase-TRUNK-security/155/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307513)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307277)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307270)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Ted Yu added a comment -

        @Xufeng:
        Use the following command:

        mvn test -P localTests -Dtest=TestMultiVersions
        
        xufeng added a comment -

        @Stack
        I will check why it happened.

        @Ted
        How do I run a single test case with Maven?
        I ran the test on 0.94 with the following command line,
        mvn clean -Dtest=TestMultiVersionstest test
        but I got this result:
        Results :

        Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

        [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test (default-test) on project hbase: No tests were executed! (Set -DfailIfNoTests=false to ignore this error.) -> [Help 1]

        Hudson added a comment -

        Integrated in HBase-TRUNK #2699 (See https://builds.apache.org/job/HBase-TRUNK/2699/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307513)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Hudson added a comment -

        Integrated in HBase-0.92 #347 (See https://builds.apache.org/job/HBase-0.92/347/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307516)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Hudson added a comment -

        Integrated in HBase-0.94 #70 (See https://builds.apache.org/job/HBase-0.94/70/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307515)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        stack added a comment -

        It was imprudent to integrate the patch without going through QA cycle.

        I got a little carried-away... It happens.

        @xufeng Can you figure out why the test fails? The test probably has a silly dependency on the returned exception.

        Also, I was thinking later that you should perhaps check the Throwable. If it's already an IOE, don't wrap it in a new IOE?

        Good stuff.
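
The "don't wrap an IOE in a new IOE" suggestion can be sketched as a small helper. This is only an illustration of the idea, not code from any of the attached patches; the class name `WrapOnce`, the method name `asIOException`, and the message text are made up for the example.

```java
import java.io.IOException;
import java.net.ConnectException;

public class WrapOnce {

    /** Returns t unchanged if it is already an IOException; otherwise wraps it once. */
    public static IOException asIOException(Throwable t) {
        if (t instanceof IOException) {
            return (IOException) t; // keep the original type and stack trace
        }
        return new IOException("Unexpected failure: " + t, t);
    }

    public static void main(String[] args) {
        IOException original = new ConnectException("connection refused");
        // Already an IOException: the very same object comes back.
        System.out.println(asIOException(original) == original);

        Throwable oom = new OutOfMemoryError("unable to create new native thread");
        // Not an IOException: wrapped once, with the Error kept as the cause.
        System.out.println(asIOException(oom).getCause() == oom);
    }
}
```

Returning the original object means callers that match on subclasses such as `ConnectException` continue to see the type they expect.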

        stack added a comment -

        Reverted everywhere.

        stack added a comment -

        Np. Will undo the commit

        Ted Yu added a comment -

        The patch applies cleanly to TRUNK.

        It was imprudent to integrate the patch without going through QA cycle.

        Now all branches are broken.

        Ted Yu added a comment -

        With the patch reverted, the test finished quickly:

        Running org.apache.hadoop.hbase.TestMultiVersions
        Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.566 sec
        
        Results :
        
        Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
        
        [INFO] 
        [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
        [INFO] Tests are skipped.
        [INFO] ------------------------------------------------------------------------
        [INFO] BUILD SUCCESS
        [INFO] ------------------------------------------------------------------------
        [INFO] Total time: 52.152s
        
        Ted Yu added a comment -

        I saw the following in jstack when running TestMultiVersions:

        "main" prio=10 tid=0x0000000040cc7800 nid=0x2b36 waiting on condition [0x00007fd703f95000]
           java.lang.Thread.State: TIMED_WAITING (sleeping)
        	at java.lang.Thread.sleep(Native Method)
        	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:211)
        	at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
        	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:200)
        	at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:80)
        	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:638)
        	at org.apache.hadoop.hbase.TestMultiVersions.testGetRowVersions(TestMultiVersions.java:143)
        
        Hudson added a comment -

        Integrated in HBase-TRUNK #2698 (See https://builds.apache.org/job/HBase-TRUNK/2698/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307277)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307270)
        HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307240)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Hudson added a comment -

        Integrated in HBase-0.94-security #6 (See https://builds.apache.org/job/HBase-0.94-security/6/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307276)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307271)
        HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307242)

        Result = SUCCESS
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        xufeng added a comment -

        The build failed!
        Did my patch cause this?

        Hudson added a comment -

        Integrated in HBase-0.92 #346 (See https://builds.apache.org/job/HBase-0.92/346/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307275)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307272)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Hudson added a comment -

        Integrated in HBase-0.94 #69 (See https://builds.apache.org/job/HBase-0.94/69/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307276)
        HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307271)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        stack added a comment -

        No problem Xufeng.

        I reverted the v1 patch everywhere and then applied your v2 patch everywhere.

        Hudson added a comment -

        Integrated in HBase-0.94 #68 (See https://builds.apache.org/job/HBase-0.94/68/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307242)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Hudson added a comment -

        Integrated in HBase-0.92 #345 (See https://builds.apache.org/job/HBase-0.92/345/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307246)

        Result = FAILURE
        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #154 (See https://builds.apache.org/job/HBase-TRUNK-security/154/)
        HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307240)

        Result = FAILURE
        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
        xufeng added a comment -

        @Stack
        Sorry, I attached the wrong patch.
        I have updated it. Name: HBASE-5673-90-V2.patch

        stack added a comment -

        Committed to 0.90, 0.92, 0.94 and trunk. Thanks for the patch Xufeng

        xufeng added a comment -

        This patch is for 0.90.
        The unit tests are running now.

        I have checked the trunk and 0.92 code, and they have this issue too.

        Please review the 0.90 patch and give me some suggestions. Thanks.

        xufeng added a comment -

        Step 4 is missing some log output:

        java.lang.OutOfMemoryError: unable to create new native thread
        	at java.lang.Thread.start0(Native Method)
        	at java.lang.Thread.start(Thread.java:640)
        	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:351)
        
        xufeng added a comment -

        I found this issue in my cluster.

        1. I found that regionserver calls were failing to report to the master because of socket timeouts.

        [2012-03-26 14:48:09,815] [INFO ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1469] Attempting connect to Master server at DDB03:20000
        [2012-03-26 14:49:09,818] [INFO ] [regionserver20020] [org.apache.hadoop.ipc.HbaseRPC 360] Problem connecting to server: DDB03/192.168.28.53:20000
        [2012-03-26 14:49:09,819] [WARN ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1483] Unable to connect to master. Retrying. Error was:
        java.net.SocketTimeoutException: Call to DDB03/192.168.28.53:20000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.28.53:59520 remote=DDB03/192.168.28.53:20000]
        

        2. From the master's jstack output, I found that one handler is waiting and the others are blocked (in waitForMeta).

        ...
        "IPC Server handler 90 on 20000" daemon prio=10 tid=0x00007f219c540000 nid=0x4c3f in Object.wait() [0x00007f21963a7000]
           java.lang.Thread.State: WAITING (on object monitor)
        	at java.lang.Object.wait(Native Method)
        	at java.lang.Object.wait(Object.java:485)
        	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
        	...
        
        "IPC Server handler 87 on 20000" daemon prio=10 tid=0x00007f219c53a000 nid=0x4c37 waiting for monitor entry [0x00007f21966aa000]
           java.lang.Thread.State: BLOCKED (on object monitor)
        	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:397)
        	- waiting to lock <0x0000000612486960> (a java.util.concurrent.atomic.AtomicBoolean)
        	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:437)
        ...
        

        3. I also confirmed that the waiting handler causes the others to block; the waiting handler is waiting for its call to complete.

        4. But when "unable to create new native thread" occurs, the catch (IOException) clause cannot catch it, because it is an OutOfMemoryError.

        protected synchronized void setupIOstreams() throws IOException {
        // ...
                start();   // Thread.start() can throw OutOfMemoryError here
              } catch (IOException e) {   // does not match OutOfMemoryError
                markClosed(e);
                close();

                throw e;
              }
        // ...
        

        5. Thus the call is lost in the call queue and never completes.

        public Writable call(......)
        {
        ......
            synchronized (call) {
              while (!call.done) {
                try {
                  call.wait();                           // wait for the result
                } catch (InterruptedException ignored) {
                  // save the fact that we were interrupted
                  interrupted = true;
                }
              }
        ......
        }
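
Steps 4 and 5 can be reproduced in isolation. The sketch below is a simplified, hypothetical model, not the HBaseClient code: `Call`, `fail`, `startReaderThread`, `setupNarrow` and `setupBroad` are names invented for the example, and the OutOfMemoryError is thrown by hand rather than by a real `Thread.start()`. It shows that a `catch (IOException)` lets the Error escape without ever marking the pending call as done, while a broader catch fails the call so a waiter could be woken. Other comments in this issue report that wrapping every exception changed behaviour for callers, so the broad variant is illustrative only.

```java
import java.io.IOException;

public class LostCallSketch {

    /** Hypothetical stand-in for a pending RPC call that a handler thread waits on. */
    public static class Call {
        private boolean done;
        private IOException error;

        /** Marks the call as failed and wakes any thread blocked in wait(). */
        public synchronized void fail(IOException e) {
            error = e;
            done = true;
            notifyAll();
        }

        public synchronized boolean isDone() { return done; }
        public synchronized IOException getError() { return error; }
    }

    /** Simulates thread creation failing as in the stack trace reported in this issue. */
    static void startReaderThread() throws IOException {
        throw new OutOfMemoryError("unable to create new native thread");
    }

    /** Same shape as the quoted snippet: only IOException is handled. */
    public static void setupNarrow(Call pending) throws IOException {
        try {
            startReaderThread();
        } catch (IOException e) {
            pending.fail(e);
            throw e;
        }
    }

    /** Assumed alternative: any Throwable fails the pending call before propagating. */
    public static void setupBroad(Call pending) throws IOException {
        try {
            startReaderThread();
        } catch (IOException e) {
            pending.fail(e);
            throw e;
        } catch (Throwable t) {
            IOException wrapped = new IOException("connection setup failed", t);
            pending.fail(wrapped);
            throw wrapped;
        }
    }

    public static void main(String[] args) {
        Call a = new Call();
        try { setupNarrow(a); } catch (Throwable escaped) { /* the Error escapes */ }
        System.out.println("narrow catch, call done: " + a.isDone());

        Call b = new Call();
        try { setupBroad(b); } catch (IOException expected) { /* call was failed first */ }
        System.out.println("broad catch, call done: " + b.isDone());
    }
}
```

With the narrow catch the call stays not-done, which corresponds to the handler parked forever in `call.wait()`; with the broad catch the call is completed with an error before the exception propagates.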
        
        

          People

          • Assignee:
            xufeng
            Reporter:
            xufeng
          • Votes:
            0
            Watchers:
            5