Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.99.0
-
None
-
Reviewed
-
HideIf hbase.ipc.client.allowsInterrupt is set to true (default being false), the writes are performed in a different thread. This workarounds a Java limitation with interruptions and i/o; and limits the impact of interrupting a client call. It's strongly recommended to activate this parameter when using tables with multiple replicas.ShowIf hbase.ipc.client.allowsInterrupt is set to true (default being false), the writes are performed in a different thread. This workarounds a Java limitation with interruptions and i/o; and limits the impact of interrupting a client call. It's strongly recommended to activate this parameter when using tables with multiple replicas.
Description
This is an issue in the HBASE-10070 context, but as well more generally if you want to interrupt an operation with a limited cost.
I will attach a doc with a more detailed explanation.
This adds a thread per region server; so it's otional. The first patch activates it by default to see how it behaves on a full hadoop-qa run. The target is to be unset by default.
Attachments
Attachments
- 10525.v1.patch
- Delete this attachment
- 9 kB
- Nicolas Liochon
- 10525.v2.patch
- Delete this attachment
- 17 kB
- Nicolas Liochon
- 10525.v3.patch
- Delete this attachment
- 17 kB
- Nicolas Liochon
- 10525.v4.patch
- Delete this attachment
- 29 kB
- Nicolas Liochon
- 10525.v5.patch
- Delete this attachment
- 28 kB
- Nicolas Liochon
- 10525.v6.patch
- Delete this attachment
- 37 kB
- Nicolas Liochon
- 10525.v7.patch
- Delete this attachment
- 40 kB
- Nicolas Liochon
- HBaseclient-EventualConsistency.pdf
- Delete this attachment
- 160 kB
- Nicolas Liochon
Issue Links
- is related to
-
HBASE-10355 Failover RPC's from client using region replicas
- Closed
- is required by
-
HBASE-10070 HBase read high-availability using timeline-consistent region replicas
- Closed
Activity
DEATH_PILL smile
Something wrong here N:
+ * We can either write the call directly on the socket, either delegate this to
+ * a different threads. Ultimately, this will allow to have a set of writer & reader
+ * for the whole cluster.
+ * Using a different thread allows the client thread to be interrupted w/o any impact. If
+ * the client threads writes on the sockets, an interruption will close this socket.
I dont follow the above exactly? On the face of it, yet another thread in client seems crazy! But your reasoning above seems good
ThreadCallSender should be CallRunner or Call Executor? Or CallNanny Can it be a static class?
Imlement Closeable since it has a close? Not necessary at all but...
No spaces in here: + setName(name + " - writer");
What will 'name' be?
+ callsToWrite.remove(cts); Do you have to cancel the future itself too?
What happens to outstanding 'calls' when cancel called? How they stopped?
Can you write up a big class comment on the mechanism you are instituting here so it is clear what is going on both for reviewers and for the folks who come along afterward trying to make sense of it all. Good on you Nicolas Liochon
I dont follow the above exactly?
It's basically a Java issue. Interrupting a process that is doing some i/o is complex, and difficult to do in the same way on all platforms. So the JVM guys went for the solution that was implementable on all platforms: close the socket when there is an interrupt.
What will 'name' be?
The same name as the reader, postfixed with 'writer').
yet another thread in client seems crazy
Yeah. But I think that we're not far from being able to put a single pool for all the region servers, instead of 1 thread per region server as today (and with this new option on). There is this nasty rpcTimeout, and then we're clean enough to do the change I think. It's something I would like to do with a direct buffer write. Ir's more or less mandatory to scale on large sized cluster imho.
ThreadCallSender should be CallRunner or Call Executor
I don't know. It does not execute much, it just writes on the socket. It uses the output stream of the connection, so it has to be an instance if I want to limit the impact on the existing code.
+ callsToWrite.remove(cts); Do you have to cancel the future itself too?
setting call.done allows the reading thread to skip this part
removing it from the callsToWrite list allows us to not send the call to the remote server at all.
So we do both.
Now if the call was sent to the server, it's not stopped (at least with this patch , just that we will skip the answer. Canceling on the server is interesting, but on the other hand it's another rpc call.
Imlement Closeable since it has a close? Not necessary at all but...
agreed. Will do.
Can you write up a big class comment on the mechanism you are instituting here so it is clear what is going on both for reviewers and for the folks who come along afterward trying to make sense of it all
Sure. I'm polishing a doc as well that I will put in this jira.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12628775/10525.v2.patch
against trunk revision .
ATTACHMENT ID: 12628775
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
-1 site. The patch appears to cause mvn site goal to fail.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.trace.TestHTraceHooks
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8684//console
This message is automatically generated.
v2
comments above taken into account
htace hooks added (is that the right way to do that, Elliott Neil Clark ?)
optimization / race condition fixed in cleanupCalls
simplification in waitForWork
I will push a v3 with the option set to false to get an hadoop qa run with the two settings
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629577/10525.v2.patch
against trunk revision .
ATTACHMENT ID: 12629577
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.util.TestHBaseFsck
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8735//console
This message is automatically generated.
same as v2 excepted that the new behavior is disabled by default.
It's what I would like to commit.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12629790/10525.v3.patch
against trunk revision .
ATTACHMENT ID: 12629790
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.util.TestHBaseFsck
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8747//console
This message is automatically generated.
I was testing this, and it seems that there is an issue when the connection is closed (RS killed). it maybe that the CallSender can still accept more calls even after Connection is closed and shouldCloseConnection is set. In case CallSender.cleanUp() runs first, and clears the queue, any call added to the CallSeender queue will not get notification, thus will hang. I think we may need to sync on CallSender accepting Calls with with the Connection.out. Other ways might also be possible.
Here is a dump where the RPC callee threads are just hanging forever:
"htable-pool15-t2" daemon prio=10 tid=0x0000000001f2e800 nid=0x1bc3 in Object.wait() [0x00007fca48f72000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435) - locked <0x00000000bd6a0230> (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1655) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1713) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:29300) at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:125) at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:186) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:658) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) "htable-pool23-t2" daemon prio=10 tid=0x00007fca4c5db800 nid=0x1bbb in Object.wait() [0x00007fca49779000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435) - locked <0x00000000bd6a0578> (a org.apache.hadoop.hbase.ipc.RpcClient$Call) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1655) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1713) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:29300) at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:125) at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:186) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:658) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.sendMultiAction(AsyncProcess.java:850) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.groupAndSendMultiAction(AsyncProcess.java:824) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.logAndResubmit(AsyncProcess.java:998) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.receiveGlobalFailure(AsyncProcess.java:952) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl.access$1000(AsyncProcess.java:546) at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:662) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) "HBaseWriterThread_13" prio=10 tid=0x00007fca5cf68000 nid=0x1b95 in Object.wait() [0x00007fca4bda0000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1383) - locked <0x00000000cc21a7e0> (a java.util.concurrent.atomic.AtomicLong) at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1412) at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1027) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1298) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:959) at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.insert(MultiThreadedWriter.java:143) at org.apache.hadoop.hbase.util.MultiThreadedWriter$HBaseWriterThread.run(MultiThreadedWriter.java:108)
Thanks a lot for the detailed analysis, Enis. v4 fixes it and adds a test.
This scenario was ok already:
(connection marked as close)
reader close - cleanup expected reads
writer close - cleanup calls to writes
There is no possible race condition between the two, because we check the connection status in writeRequest. If you did the cleanup, it means that connectionClosed was true, so the writeRequest will fail.
another scenario was NOT ok:
new client get connection - connection is opened
(connection marked as closed)
reader close - cleanup expected reads
writer close - cleanup calls to writes
client thread add call to calls to write
The solution is to add a check after we add the call, exactly as in the first scenario. This does not require another synchronization point.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12630060/10525.v4.patch
against trunk revision .
ATTACHMENT ID: 12630060
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestHCM
org.apache.hadoop.hbase.util.TestHBaseFsck
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8756//console
This message is automatically generated.
conflict on the name for testHCM. Will fix.
For Fsck, seems flaky these days. Message is: java.lang.AssertionError: expected: EXPIRED_TABLE_LOCK but was:UNKNOWN, NO_META_REGION, RS_CONNECT_FAILURE, RS_CONNECT_FAILURE, RS_CONNECT_FAILURE, EXPIRED_TABLE_LOCK at org.apache.hadoop.hbase.util.TestHBaseFsck.testCheckTableLocks (TestHBaseFsck.java:2072)
And the issue with TestHBaseFsck comes from the EnvironmentEdgeManager; the test changes the time, it impacts the RpcClient....
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12630090/10525.v5.patch
against trunk revision .
ATTACHMENT ID: 12630090
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.util.TestHBaseFsck
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8757//console
This message is automatically generated.
Thanks Nicolas.
The solution is to add a check after we add the call, exactly as in the first scenario. This does not require another synchronization point.
Agreed. The patch seems to be fixing this.
Is this sleep(1) intended, or left from debugging?
- connection.writeRequest(call, priority);
+ Thread.sleep(1);
This change might have some consequences, because the connection threads will be kept for much longer. Can we instead do this change only on the test?
- conf.getInt("hbase.ipc.client.connection.minIdleTimeBeforeClose", 120000); // 2 minutes + conf.getInt("hbase.ipc.client.connection.minIdleTimeBeforeClose", 1200000); // 20 minutes
Is this sleep(1) intended, or left from debugging?
Oops. left from debugging.
minIdleTimeBeforeClose", 120000
Yes, I've changed it to see if it was the issue with TestHBaseFsck: if the connection becomes idle, the TestHBaseFsck fails. I will write a test on the RpcClient alone: there is no test for this feature. I can update it for the test itself, but I also need to check what it means for Fsck itself: may be it relies on having the same tcp connection. Lastly, the control of the idle connection is new (previously we were mixing bugs and client pings to keep the connection alive), I don't know what is the best value (but 2 minutes seems much better than 20 for sure)
I've tested v5 (minus the sleep(1)) with
hbase org.apache.hadoop.hbase.IntegrationTestIngest -Dhbase.ipc.client.allowsInterrupt=true --monkey unbalance
which passes.
Ok the bad scenario for TestFsck was:
starts
idle time reached but connection not yet reclaimed
fsck does a new call
at this moment the connection is closed as idle
this should lead to a retry, and we would be done, but it's not enough for TestFsck
I solved this by managing this specific case explicitly: if the connection was idle but not yet closed and we receive a call, we don't close the connection
Test added as well.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12630287/10525.v6.patch
against trunk revision .
ATTACHMENT ID: 12630287
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8767//console
This message is automatically generated.
v7 removes the findbugs warning by adding it to the exclusion list. It's useless in this class.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12630316/10525.v7.patch
against trunk revision .
ATTACHMENT ID: 12630316
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
+1 site. The mvn site goal succeeds with this patch.
+1 core tests. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8768//console
This message is automatically generated.
A green build. How cute.
Any +1 around ((I will commit with the option NOT activated)?
One quick question - when the connection.interrupt() is invoked the reader thread gets it.. What happens to the writer thread if it was waiting in the callsToWrite.take() call.
when the connection.interrupt() is invoked the reader thread gets it.. What happens to the writer thread if it was waiting in the callsToWrite.take() call.
In #markClosed, we put a Call to the queue (the DEATH_PILL), this way the writers exits the 'take' method. The reader thread calls #markClosed on any exception, interruptions included.
Committed to trunk, thanks for the reviews, all. (Devaraj, I understood your comment as 'ok if', I can obviously revert / amend if you want more time for the review.). Same goes for everyone who wants to chime in: this code is obviously critical & complex.
FAILURE: Integrated in HBase-TRUNK #4947 (See https://builds.apache.org/job/HBase-TRUNK/4947/)
HBASE-10525 Allow the client to use a different thread for writing to ease interrupt (nkeywal: rev 1571210)
- /hbase/trunk/dev-support/findbugs-exclude.xml
- /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
- /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
- /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
- /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
Yes, Nicolas Liochon, i just had that question. Thanks for the clarification.
What is idea of this:
-Dhbase.ipc.client.allowsInterrupt=true
Why would we not have interrupt on by default given you have done all this work to make it work?
Did a quick scan... looks good.
FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #98 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/98/)
HBASE-10525 Allow the client to use a different thread for writing to ease interrupt (nkeywal: rev 1571210)
- /hbase/trunk/dev-support/findbugs-exclude.xml
- /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
- /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
- /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
- /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
Why would we not have interrupt on by default given you have done all this work to make it work?
The name may not be perfect: when the option is not activated, an interrupt may cut the tcp connection. activated, the write will be in a different thread, so it won't be cut. However, it adds a Thread per region server, so I didn't activated it by default. In interruption will still be possible, but if they cut the tcp connection, the calls in progress will have to be retried.
Now that the rpc timeout issue is fixed, I hope to be able to have a single thread pool, so then the option will become useless, but there are a few days of work ahead before removing this... The issue, as usual, is to have something that works well under failures...
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12628768/10525.v1.patch
against trunk revision .
ATTACHMENT ID: 12628768
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.
+1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 lineLengths. The patch does not introduce lines longer than 100
-1 site. The patch appears to cause mvn site goal to fail.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.ipc.TestIPC
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8682//console
This message is automatically generated.