Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Abandoned
-
None
-
None
-
None
-
0.90.6
-
1
-
1
Description
If HBaseClient hits an "unable to create new native thread" exception, the call will never complete because it is lost in the calls queue.
Attachments
- HBASE-5673-90.patch
- 0.5 kB
- xufeng
- HBASE-5673-90-V2.patch
- 0.7 kB
- xufeng
Activity
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12520564/HBASE-5673-90-V2.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2874//console
This message is automatically generated.
Integrated in HBase-0.92-security #104 (See https://builds.apache.org/job/HBase-0.92-security/104/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307516)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307275)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307272)
HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307246)
Result = FAILURE
stack :
Files :
- /hbase/branches/0.92/CHANGES.txt
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/branches/0.92/CHANGES.txt
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-0.94-security #7 (See https://builds.apache.org/job/HBase-0.94-security/7/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307515)
Result = SUCCESS
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
@Stack @Ted
I analyzed the problem with my patch.
Here is the result:
I wrapped every exception in an IOException, and this IOException cannot be handled in the private method HRegionInterface getCachedConnection(ServerName sn) of CatalogTracker,
so the master aborts and the test cases fail.
In the future, I will submit patches together with the test results.
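The effect described in the comment above can be illustrated with a small sketch. This is not the CatalogTracker code: the `probe` method and the choice of `ConnectException` are assumptions made only for illustration. It shows how a caller that handles one specific IOException subclass stops recognizing it once it has been wrapped in a plain IOException.

```java
import java.io.IOException;
import java.net.ConnectException;

class WrappedExceptionDemo {
    // Hypothetical caller: handles a refused connection and lets every
    // other IOException propagate to its own caller.
    static String probe(IOException failure) {
        try {
            throw failure;
        } catch (ConnectException e) {
            return "handled";
        } catch (IOException e) {
            return "propagated";
        }
    }

    public static void main(String[] args) {
        IOException original = new ConnectException("Connection refused");
        // Wrapping keeps the cause but changes the runtime type seen by catch clauses.
        IOException wrapped = new IOException(original);
        System.out.println(probe(original)); // handled
        System.out.println(probe(wrapped));  // propagated
    }
}
```

A caller written this way would need to inspect `getCause()` to recover the original type, which is why blanket wrapping can change behavior far from the patched code.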
Integrated in HBase-TRUNK-security #155 (See https://builds.apache.org/job/HBase-TRUNK-security/155/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307513)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307277)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307270)
Result = SUCCESS
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
@Xufeng:
Use the following command:
mvn test -P localTests -Dtest=TestMultiVersions
@Stack
I will check why it happened.
@Ted
How do I run a single test case with Maven?
I ran the test on 0.94 with the following command line:
mvn clean -Dtest=TestMultiVersionstest test
but I got this result:
Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12-TRUNK-HBASE-2:test (default-test) on project hbase: No tests were executed! (Set -DfailIfNoTests=false to ignore this error.) -> [Help 1]
Integrated in HBase-TRUNK #2699 (See https://builds.apache.org/job/HBase-TRUNK/2699/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307513)
Result = FAILURE
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-0.92 #347 (See https://builds.apache.org/job/HBase-0.92/347/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307516)
Result = SUCCESS
stack :
Files :
- /hbase/branches/0.92/CHANGES.txt
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-0.94 #70 (See https://builds.apache.org/job/HBase-0.94/70/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT OF V2, A SECOND REVERT (Revision 1307515)
Result = SUCCESS
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
It was imprudent to integrate the patch without going through QA cycle.
I got a little carried-away... It happens.
@xufeng Can you figure out why the test fails? The test probably has a silly dependency on the returned exception.
Also, I was thinking later that perhaps you should check the Throwable. If it's already an IOE, don't wrap it in a new IOE?
Good stuff.
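The suggestion about checking the Throwable before wrapping could look roughly like the following. This is a minimal sketch, not the attached patch; the class and method names (`WrapIfNeeded`, `toIOException`) and the message text are hypothetical.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

class WrapIfNeeded {
    // Returns t unchanged when it is already an IOException, so callers that
    // depend on the concrete subclass still see it; wraps anything else.
    static IOException toIOException(Throwable t) {
        if (t instanceof IOException) {
            return (IOException) t;
        }
        return new IOException("Unexpected failure: " + t, t);
    }

    public static void main(String[] args) {
        IOException timeout = new SocketTimeoutException("60000 millis timeout");
        Error oom = new OutOfMemoryError("unable to create new native thread");

        System.out.println(toIOException(timeout) == timeout);   // true: not re-wrapped
        System.out.println(toIOException(oom).getCause() == oom); // true: wrapped, cause kept
    }
}
```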
The patch applies cleanly to TRUNK.
It was imprudent to integrate the patch without going through QA cycle.
Now all branches are broken.
With the patch reverted, the test finished quickly:
Running org.apache.hadoop.hbase.TestMultiVersions
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.566 sec
Results :
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
[INFO] Tests are skipped.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 52.152s
I saw the following in jstack when running TestMultiVersions:
"main" prio=10 tid=0x0000000040cc7800 nid=0x2b36 waiting on condition [0x00007fd703f95000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:211)
        at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
        at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:200)
        at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:80)
        at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:638)
        at org.apache.hadoop.hbase.TestMultiVersions.testGetRowVersions(TestMultiVersions.java:143)
Integrated in HBase-TRUNK #2698 (See https://builds.apache.org/job/HBase-TRUNK/2698/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307277)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307270)
HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307240)
Result = FAILURE
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-0.94-security #6 (See https://builds.apache.org/job/HBase-0.94-security/6/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307276)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307271)
HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307242)
Result = SUCCESS
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-0.92 #346 (See https://builds.apache.org/job/HBase-0.92/346/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307275)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307272)
Result = FAILURE
stack :
Files :
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-0.94 #69 (See https://builds.apache.org/job/HBase-0.94/69/)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REAPPLY; V2 OF PATCH (Revision 1307276)
HBASE-5673 The OOM problem of IPC client call cause all handle block; REVERT – APPLIED WRONG PATCH (Revision 1307271)
Result = FAILURE
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
No problem Xufeng.
I reverted the v1 patch from all places and then applied everywhere your v2 patch.
Integrated in HBase-0.94 #68 (See https://builds.apache.org/job/HBase-0.94/68/)
HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307242)
Result = FAILURE
stack :
Files :
- /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-0.92 #345 (See https://builds.apache.org/job/HBase-0.92/345/)
HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307246)
Result = FAILURE
stack :
Files :
- /hbase/branches/0.92/CHANGES.txt
- /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
Integrated in HBase-TRUNK-security #154 (See https://builds.apache.org/job/HBase-TRUNK-security/154/)
HBASE-5673 The OOM problem of IPC client call cause all handle block (Revision 1307240)
Result = FAILURE
stack :
Files :
- /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
@Stack
Sorry, I attached the wrong patch.
I have uploaded the correct one, named HBASE-5673-90-V2.patch.
Committed to 0.90, 0.92, 0.94 and trunk. Thanks for the patch Xufeng
This patch is for 0.90.
The unit tests are running now.
I have checked the trunk and 0.92 code, and they have this issue as well.
Please review the 0.90 patch and give me some suggestions, thanks.
Step 4 is missing some log output:
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:640)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:351)
I found this issue in my cluster.
1. I found that the regionservers could not report to the master because of socket timeouts.
[2012-03-26 14:48:09,815] [INFO ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1469] Attempting connect to Master server at DDB03:20000
[2012-03-26 14:49:09,818] [INFO ] [regionserver20020] [org.apache.hadoop.ipc.HbaseRPC 360] Problem connecting to server: DDB03/192.168.28.53:20000
[2012-03-26 14:49:09,819] [WARN ] [regionserver20020] [org.apache.hadoop.hbase.regionserver.HRegionServer 1483] Unable to connect to master. Retrying. Error was: java.net.SocketTimeoutException: Call to DDB03/192.168.28.53:20000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.28.53:59520 remote=DDB03/192.168.28.53:20000]
2. From the jstack log of the master, I found that one handler is waiting and the others are blocked (in waitForMeta).
......
"IPC Server handler 90 on 20000" daemon prio=10 tid=0x00007f219c540000 nid=0x4c3f in Object.wait() [0x00007f21963a7000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
......
"IPC Server handler 87 on 20000" daemon prio=10 tid=0x00007f219c53a000 nid=0x4c37 waiting for monitor entry [0x00007f21966aa000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:397)
        - waiting to lock <0x0000000612486960> (a java.util.concurrent.atomic.AtomicBoolean)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:437)
......
3. I also confirmed that the waiting handler causes the others to block; the waiting handler is waiting for its call to complete.
4. But when "unable to create new native thread" happened, the catch block for IOException could not catch it, because it is an OutOfMemoryError rather than an IOException.
protected synchronized void setupIOstreams() throws IOException {
  ......
    start();
  } catch (IOException e) {
    markClosed(e);
    close();
    throw e;
  }
  ......
5. Thus the call is lost in the calls queue and never completes.
public Writable call(......) {
  ......
  synchronized (call) {
    while (!call.done) {
      try {
        call.wait(); // wait for the result
      } catch (InterruptedException ignored) {
        // save the fact that we were interrupted
        interrupted = true;
      }
    }
    ......
  }
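Steps 4 and 5 can be reproduced in isolation with the sketch below. It is a simplified model, not HBaseClient code and not the attached patch: `Call`, `startReader`, and both `setup...` methods are hypothetical stand-ins, and the OutOfMemoryError is thrown by hand to simulate a failed Thread.start(). It compares a catch block for IOException, which lets the Error escape and leaves the call pending, with a catch block for Throwable that marks the call as failed so that a waiter would be woken up.

```java
import java.io.IOException;

class LostCallSketch {
    // Simplified stand-in for a pending RPC call that a handler waits on.
    static final class Call {
        boolean done;
        IOException error;

        synchronized void fail(IOException e) {
            error = e;
            done = true;
            notifyAll(); // wake any thread blocked in call.wait()
        }
    }

    // Simulates starting the connection thread when no native thread can be created.
    static void startReader() throws IOException {
        throw new OutOfMemoryError("unable to create new native thread");
    }

    // Same shape as the snippet in step 4: an Error is not an IOException,
    // so the catch block is skipped and the call is never marked done.
    static void setupCatchingIOException(Call call) throws IOException {
        try {
            startReader();
        } catch (IOException e) {
            call.fail(e);
            throw e;
        }
    }

    // One possible alternative: catch Throwable, keep an existing IOException
    // as-is, wrap anything else, and fail the pending call before rethrowing.
    static void setupCatchingThrowable(Call call) throws IOException {
        try {
            startReader();
        } catch (Throwable t) {
            IOException e = (t instanceof IOException)
                    ? (IOException) t
                    : new IOException("Could not set up connection", t);
            call.fail(e);
            throw e;
        }
    }

    // Runs one variant and reports whether the pending call was completed.
    static boolean callCompleted(boolean catchThrowable) {
        Call call = new Call();
        try {
            if (catchThrowable) {
                setupCatchingThrowable(call);
            } else {
                setupCatchingIOException(call);
            }
        } catch (Throwable ignored) {
            // Both variants rethrow; only the state of the call matters here.
        }
        synchronized (call) {
            return call.done;
        }
    }

    public static void main(String[] args) {
        System.out.println(callCompleted(false)); // false: a waiter would block forever
        System.out.println(callCompleted(true));  // true: a waiter would be released
    }
}
```

Whether catching an OutOfMemoryError is acceptable at all is a design decision; the sketch only shows why the pending call stays in `while (!call.done)` when the Error escapes.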
Patch doesn't apply anymore, unmarking as available.