  1. HBase
  2. HBASE-5673

The OOM problem of IPC client call cause all handle block

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Abandoned
    • None
    • 0.92.3
    • None
    • None
    • 0.90.6

    • 1
    • 1

    Description

      If HBaseClient hits an "unable to create new native thread" exception, the call never completes because it is lost in the calls queue.

      Attachments

        1. HBASE-5673-90.patch
          0.5 kB
          xufeng
        2. HBASE-5673-90-V2.patch
          0.7 kB
          xufeng

        Activity

          jdcryans Jean-Daniel Cryans added a comment - Patch doesn't apply anymore, unmarking as available.
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12520564/HBASE-5673-90-V2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2874//console

          This message is automatically generated.

          hudson Hudson added a comment -

          Integrated in HBase-0.92-security #104 (See https://builds.apache.org/job/HBase-0.92-security/104/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307516)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307275)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307272)
          HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307246)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          hudson Hudson added a comment -

          Integrated in HBase-0.94-security #7 (See https://builds.apache.org/job/HBase-0.94-security/7/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307515)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          xufeng xufeng added a comment -

          @Stack @Ted
          I analyzed the problem with my patch.
          Here is the result:
          I wrapped every exception in an IOException, and that IOException cannot be handled in the private method CatalogTracker#getCachedConnection(ServerName sn),
          so the master aborts and the test cases fail.

          In the future, I will submit patches together with their test results.
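
The effect described above can be illustrated with a small sketch. It is not the CatalogTracker code: the class `WrappedExceptionSketch`, the methods `classify` and `wrapAlways`, and the choice of `ConnectException` as the tolerated subtype are all assumptions made for illustration. It only shows how a caller that inspects the exception type can stop recognizing a failure once it is wrapped in a plain IOException.

```java
import java.io.IOException;
import java.net.ConnectException;

class WrappedExceptionSketch {
    /**
     * Hypothetical caller: tolerates a connection failure it recognizes by
     * type and treats every other IOException as fatal.
     */
    static String classify(IOException e) {
        if (e instanceof ConnectException) {
            return "retry";
        }
        return "abort";
    }

    /** Wraps unconditionally, hiding the original type from instanceof checks. */
    static IOException wrapAlways(Throwable t) {
        return new IOException(t);
    }

    public static void main(String[] args) {
        IOException direct = new ConnectException("connection refused");
        System.out.println("direct:  " + classify(direct));
        System.out.println("wrapped: " + classify(wrapAlways(direct)));
    }
}
```

Under these assumptions the direct exception is classified as retryable while the wrapped one is not, even though the original is still reachable through getCause().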

          hudson Hudson added a comment -

          Integrated in HBase-TRUNK-security #155 (See https://builds.apache.org/job/HBase-TRUNK-security/155/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307513)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307277)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307270)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          zhihyu@ebaysf.com Ted Yu added a comment -

          @Xufeng:
          Use the following command:

          mvn test -P localTests -Dtest=TestMultiVersions
          
          xufeng xufeng added a comment -

          @Stack
          I will check why it happened.

          @Ted
          How do I run a single test case with Maven?
          I ran the test on 0.94 with the following command line:
          mvn clean -Dtest=TestMultiVersionstest test
          but I got this result:
          Results :

          Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test (default-test) on project hbase: No tests were executed! (Set -DfailIfNoTests=false to ignore this error.) -> [Help 1]

          hudson Hudson added a comment -

          Integrated in HBase-TRUNK #2699 (See https://builds.apache.org/job/HBase-TRUNK/2699/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307513)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          hudson Hudson added a comment -

          Integrated in HBase-0.92 #347 (See https://builds.apache.org/job/HBase-0.92/347/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307516)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          hudson Hudson added a comment -

          Integrated in HBase-0.94 #70 (See https://builds.apache.org/job/HBase-0.94/70/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307515)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          stack Michael Stack added a comment -

          It was imprudent to integrate the patch without going through QA cycle.

          I got a little carried-away... It happens.

          @xufeng Can you figure out why the test fails? The test probably has a silly dependency on the returned exception.

          Also, I was thinking later that perhaps you should check the Throwable: if it's already an IOE, don't wrap it in a new IOE.

          Good stuff.
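
A minimal sketch of the check suggested above, assuming a hypothetical helper named `toIOException` in a hypothetical class `WrapThrowableSketch` (neither is part of HBaseClient or of the attached patches): an IOException is passed through unchanged, and anything else, such as an OutOfMemoryError, is wrapped with the original kept as the cause.

```java
import java.io.IOException;

class WrapThrowableSketch {
    /**
     * Returns t itself when it is already an IOException, so callers that
     * check for specific subtypes keep working; otherwise wraps it.
     */
    static IOException toIOException(Throwable t) {
        if (t instanceof IOException) {
            return (IOException) t;
        }
        return new IOException("Unexpected throwable: " + t, t);
    }

    public static void main(String[] args) {
        Throwable oom = new OutOfMemoryError("unable to create new native thread");
        IOException wrapped = toIOException(oom);
        System.out.println(wrapped.getMessage());
        System.out.println("cause preserved: " + (wrapped.getCause() == oom));
    }
}
```

Keeping the original instance when it is already an IOException avoids changing the type that downstream handlers see.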

          stack Michael Stack added a comment -

          Reverted everywhere.

          stack Michael Stack added a comment -

          Np. Will undo the commit

          zhihyu@ebaysf.com Ted Yu added a comment -

          The patch applies cleanly to TRUNK.

          It was imprudent to integrate the patch without going through QA cycle.

          Now all branches are broken.

          zhihyu@ebaysf.com Ted Yu added a comment -

          With the patch reverted, the test finished quickly:

          Running org.apache.hadoop.hbase.TestMultiVersions
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.566 sec
          
          Results :
          
          Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
          
          [INFO] 
          [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
          [INFO] Tests are skipped.
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 52.152s
          
          zhihyu@ebaysf.com Ted Yu added a comment -

          I saw the following in jstack when running TestMultiVersions:

          "main" prio=10 tid=0x0000000040cc7800 nid=0x2b36 waiting on condition [0x00007fd703f95000]
             java.lang.Thread.State: TIMED_WAITING (sleeping)
          	at java.lang.Thread.sleep(Native Method)
          	at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:211)
          	at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
          	at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:200)
          	at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:80)
          	at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:638)
          	at org.apache.hadoop.hbase.TestMultiVersions.testGetRowVersions(TestMultiVersions.java:143)
          
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK #2698 (See https://builds.apache.org/job/HBase-TRUNK/2698/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307277)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307270)
          HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307240)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          hudson Hudson added a comment -

          Integrated in HBase-0.94-security #6 (See https://builds.apache.org/job/HBase-0.94-security/6/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307276)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307271)
          HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307242)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          xufeng xufeng added a comment -

          The build failed!
          Did my patch cause it?

          hudson Hudson added a comment -

          Integrated in HBase-0.92 #346 (See https://builds.apache.org/job/HBase-0.92/346/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307275)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307272)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          hudson Hudson added a comment -

          Integrated in HBase-0.94 #69 (See https://builds.apache.org/job/HBase-0.94/69/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307276)
          HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307271)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java

          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          stack Michael Stack added a comment -

          No problem Xufeng.

          I reverted the v1 patch from all places and then applied everywhere your v2 patch.

          hudson Hudson added a comment -

          Integrated in HBase-0.94 #68 (See https://builds.apache.org/job/HBase-0.94/68/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307242)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          hudson Hudson added a comment -

          Integrated in HBase-0.92 #345 (See https://builds.apache.org/job/HBase-0.92/345/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307246)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK-security #154 (See https://builds.apache.org/job/HBase-TRUNK-security/154/)
          HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307240)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          xufeng xufeng added a comment -

          @Stack
          Sorry, I attached the wrong patch.
          I have uploaded the corrected one: HBASE-5673-90-V2.patch

          stack Michael Stack added a comment -

          Committed to 0.90, 0.92, 0.94 and trunk. Thanks for the patch Xufeng

          xufeng xufeng added a comment -

          This patch is for 0.90.
          The unit tests are running now.

          I have checked the trunk and 0.92 code, and they also have this issue.

          Please review the 0.90 patch and give me some suggestions. Thanks.

          xufeng xufeng added a comment -

          Step 4 was missing some log output:

          java.lang.OutOfMemoryError: unable to create new native thread
          	at java.lang.Thread.start0(Native Method)
          	at java.lang.Thread.start(Thread.java:640)
          	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:351)
          
          xufeng xufeng added a comment -

          I found this issue in my cluster.

          1. I found that the region servers' calls to report to the master were failing with socket timeouts.

          [2012-03-26 14:48:09,815] [INFO ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1469] Attempting connect to Master server at DDB03:20000
          [2012-03-26 14:49:09,818] [INFO ] [regionserver20020] [org.apache.hadoop.ipc.HbaseRPC 360] Problem connecting to server: DDB03/192.168.28.53:20000
          [2012-03-26 14:49:09,819] [WARN ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1483] Unable to connect to master. Retrying. Error was:
          java.net.SocketTimeoutException: Call to DDB03/192.168.28.53:20000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.28.53:59520 remote=DDB03/192.168.28.53:20000]
          

          2. In the master's jstack output, I found that one handler is waiting and the others are blocked (in waitForMeta).

          ...
          "IPC Server handler 90 on 20000" daemon prio=10 tid=0x00007f219c540000 nid=0x4c3f in Object.wait() [0x00007f21963a7000]
             java.lang.Thread.State: WAITING (on object monitor)
          	at java.lang.Object.wait(Native Method)
          	at java.lang.Object.wait(Object.java:485)
          	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
          	...
          
          "IPC Server handler 87 on 20000" daemon prio=10 tid=0x00007f219c53a000 nid=0x4c37 waiting for monitor entry [0x00007f21966aa000]
             java.lang.Thread.State: BLOCKED (on object monitor)
          	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:397)
          	- waiting to lock <0x0000000612486960> (a java.util.concurrent.atomic.AtomicBoolean)
          	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:437)
          ...
          

          3. I also confirmed that the waiting handler is what blocks the others; the waiting handler is itself waiting for its call to complete.

          4. But when "unable to create new native thread" occurs, the catch (IOException e) clause does not catch it, because it is thrown as an OutOfMemoryError.

          protected synchronized void setupIOstreams() throws IOException {
          ...
                  start();
                } catch (IOException e) {
                  markClosed(e);
                  close();
          
                  throw e;
                }
          ...
          

          5. Thus the call is lost in the calls queue and never completes.

          public Writable call(......)
          {
          ......
              synchronized (call) {
                while (!call.done) {
                  try {
                    call.wait();                           // wait for the result
                  } catch (InterruptedException ignored) {
                    // save the fact that we were interrupted
                    interrupted = true;
                  }
                }
          ......
          }
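
Steps 4 and 5 can be reproduced in isolation with a small sketch. Everything here is a stand-in written for illustration: `LostCallSketch`, its `Call` class, `startReaderThread`, and the two `setup...` methods are hypothetical names, the OutOfMemoryError is thrown by hand rather than by a real Thread.start(), and the second handler is only one possible shape for a fix, not the content of the attached patches.

```java
import java.io.IOException;

class LostCallSketch {
    /** Minimal stand-in for a pending RPC call; not HBase's own class. */
    static class Call {
        boolean done;
        IOException error;

        synchronized void fail(IOException e) {
            error = e;
            done = true;
            notifyAll(); // wake any thread blocked in awaitDone()
        }

        /** Waits up to timeoutMs and reports whether the call completed. */
        synchronized boolean awaitDone(long timeoutMs) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!done) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return false;
                }
                try {
                    wait(left);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return done;
                }
            }
            return true;
        }
    }

    /** Simulates thread creation failing the way the log in this issue shows. */
    static void startReaderThread() throws IOException {
        throw new OutOfMemoryError("unable to create new native thread");
    }

    /** Same shape as the quoted handler: only an IOException fails the call. */
    static void setupCatchingIOException(Call call) throws IOException {
        try {
            startReaderThread();
        } catch (IOException e) {
            call.fail(e);
            throw e;
        }
    }

    /** Assumed variant: any other Throwable also fails the call. */
    static void setupCatchingThrowable(Call call) throws IOException {
        try {
            startReaderThread();
        } catch (IOException e) {
            call.fail(e);
            throw e;
        } catch (Throwable t) {
            IOException ioe = new IOException("Unexpected failure during connection setup", t);
            call.fail(ioe);
            throw ioe;
        }
    }

    public static void main(String[] args) {
        Call lost = new Call();
        try {
            setupCatchingIOException(lost);
        } catch (Throwable t) {
            System.out.println("setup failed with " + t.getClass().getSimpleName());
        }
        System.out.println("completed with IOException-only handler: " + lost.awaitDone(200));

        Call failed = new Call();
        try {
            setupCatchingThrowable(failed);
        } catch (IOException e) {
            System.out.println("setup failed with " + e.getClass().getSimpleName());
        }
        System.out.println("completed with Throwable handler: " + failed.awaitDone(200));
    }
}
```

The sketch uses a bounded wait only so that it terminates; the call() loop quoted above waits without a timeout, which is why the handler in the thread dump never wakes up.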
          
          

          People

            Unassigned Unassigned
            xufeng xufeng
            Votes: 0
            Watchers: 4
