Details
- Type: Sub-task
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
The RPC layer should handle various communication abnormalities (e.g. connection timeout, server-aborted connection, and so on). Ideally, the corresponding exceptions should be raised and propagated through the handlers of the client pipeline.
Attachments
- HBASE-18078.000.patch (12 kB) - Xiaobing Zhou
- HBASE-18078.001.patch (16 kB) - Xiaobing Zhou
- HBASE-18078.002.patch (14 kB) - Xiaobing Zhou
- HBASE-18078.003.patch (32 kB) - Xiaobing Zhou
- HBASE-18078.004.patch (14 kB) - Xiaobing Zhou
- HBASE-18078.005.patch (27 kB) - Xiaobing Zhou
- HBASE-18078.006.patch (10 kB) - Xiaobing Zhou
- HBASE-18078.007.patch (12 kB) - Xiaobing Zhou
- HBASE-18078.008.patch (21 kB) - Xiaobing Zhou
Issue Links
- relates to: HBASE-18245 Handle failed server in RpcClient (Closed)
Activity
Posted the v0 patch to handle connection-related issues, such as timeouts.
The idea is to catch connection errors and propagate them up as ConnectionException: at the bottom of the stack, whatever goes wrong while establishing the connection (i.e. the Wangle pipeline) surfaces as a folly::AsyncSocketException, which is wrapped inside a ConnectionException.
Note that there is no way to set socket options on Wangle's ClientBootstrap, but fortunately ClientBootstrap::connect accepts a timeout of our own, which is enough for now.
See also ConnectionFactory::Connect:
```cpp
try {
  /* any connection error (e.g. timeout) will be folly::AsyncSocketException */
  auto pipeline = client->connect(
      SocketAddress(hostname, port, true),
      std::chrono::duration_cast<milliseconds>(connect_timeout_)).get();
  auto dispatcher = std::make_shared<ClientDispatcher>();
  dispatcher->setPipeline(pipeline);
  promise.setValue(dispatcher);
} catch (const folly::AsyncSocketException &e) {
  promise.setException(
      folly::make_exception_wrapper<hbase::ConnectionException>(
          folly::make_exception_wrapper<folly::AsyncSocketException>(e)));
}
```
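On the caller side, the future tied to that promise then completes with the wrapped error. A minimal sketch of how it can be observed (the variable names are illustrative, not from the patch):

```cpp
try {
  // dispatcher_future: the future obtained from the promise that Connect fulfills
  auto dispatcher = dispatcher_future.get();
  // ... use the dispatcher ...
} catch (const hbase::ConnectionException& e) {
  // connection-level failure: timeout, connection refused, reset, ...
}
```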
Coming next:
- apply the API changes (e.g. ConnectionFactory and ConnectionPool) in all related places.
- build a TestRpcServer to simulate various abnormal behaviors; it can be started from the samples in BootstrapTest.
- handle the other abnormal cases defined in folly::AsyncSocketException (see the sketch after this list):
```cpp
class AsyncSocketException : public std::runtime_error {
 public:
  enum AsyncSocketExceptionType {
    UNKNOWN = 0,
    NOT_OPEN = 1,
    ALREADY_OPEN = 2,
    TIMED_OUT = 3,
    END_OF_FILE = 4,
    INTERRUPTED = 5,
    BAD_ARGS = 6,
    CORRUPTED_DATA = 7,
    INTERNAL_ERROR = 8,
    NOT_SUPPORTED = 9,
    INVALID_STATE = 10,
    SSL_ERROR = 12,
    COULD_NOT_BIND = 13,
    SASL_HANDSHAKE_TIMEOUT = 14,
    NETWORK_ERROR = 15
  };
```
- add unit tests to cover them.
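A hedged sketch of how those exception types could be classified when deciding what to wrap as ConnectionException; the helper name and the chosen set of connection-level types are assumptions, not part of the patch:

```cpp
#include <folly/io/async/AsyncSocketException.h>

// Illustrative only: treat "peer gone / unreachable" errors as connection-level,
// everything else (bad arguments, internal errors) as client-side bugs.
bool IsConnectionLevelError(const folly::AsyncSocketException& e) {
  switch (e.getType()) {
    case folly::AsyncSocketException::TIMED_OUT:
    case folly::AsyncSocketException::NOT_OPEN:
    case folly::AsyncSocketException::END_OF_FILE:
    case folly::AsyncSocketException::NETWORK_ERROR:
      return true;
    default:
      return false;
  }
}
```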
- You are still calling Future::get() here in the ConnectionFactory::AsyncConnect, no?
```diff
+  auto pipeline = client->connect(
+      SocketAddress(hostname, port, true),
+      std::chrono::duration_cast<milliseconds>(connect_timeout_)).get();
```
- If this patch is not changing the blocking nature of our TCP connection establishment, maybe we should not introduce these methods, especially ConnectionPool::AsyncGetNewConnection which seems to be a copy of the other method.
Posted v3.
- fixed rebase conflicts
- added RPC test server skeleton
Enis Soztutar, you are right, we are not going to change the behavior of connection establishment. The initial attempt at AsyncConnect was to satisfy the connection pool and pass the existing unit tests. I will get back to removing the async variant later on. Thanks.
This ticket will not contain anything related to TestRpcServer, it will be in HBASE-18338.
Posted v5:
- added implementations of two functions (i.e. ping and addr) in test_rpc_service.proto
- rebased
- did some refactoring related to HBASE-18338
v6 will come with error implementation that triggers ConnectionException.
Posted v6:
- Catch AsyncSocketException in ConnectionFactory::Connect and rethrow it as ConnectionException, which propagates up through ConnectionPool::GetConnection and RpcClient::SendRequest; the caller of RpcClient ultimately sees the ConnectionException.
- While writing data down the pipeline, RpcConnection::SendRequest may also encounter an AsyncSocketException; similarly, it is propagated to the caller of RpcClient.
The patch considers two cases:
- AsyncSocketException raised during the initial connection establishment.
- ConnectionException (with AsyncSocketException as the cause) raised by the async request/response call after the corresponding connection is established.
v7:
- added RpcClient::GetFutureWithException
- fixed tests broken by v6
Extra tests come with next patch.
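A plausible shape for GetFutureWithException, assuming its only job is to hand the caller an already-failed future instead of throwing; the real signature is in the v7/v8 patches and may differ:

```cpp
folly::Future<std::unique_ptr<Response>> RpcClient::GetFutureWithException(
    const folly::exception_wrapper& ew) {
  folly::Promise<std::unique_ptr<Response>> promise;
  auto future = promise.getFuture();
  promise.setException(ew);  // the caller's .get()/.onError() sees the error
  return future;
}
```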
-1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | |

| Subsystem | Report/Notes |
|---|---|
| JIRA Issue | |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880339/HBASE-18078.007.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7917/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.
Thanks Xiaobing Zhou for the updated patch.
As we discussed offline, we are not going to do ConnectionRetryPolicy. Upper-level retrying from the Rpc Retrying Callers will work just fine for re-establishing the connection.
```diff
+ * TODO:
+ * This function should plug in ConnectionRetryPolicy to handle:
+
```
- We can do this as VLOG(1):
```diff
+  VLOG(3) << folly::sformat("RpcClient Exception: {}", ew.what());
```
- Do we need to catch the AsyncSocketException at this level as well:
```diff
+  return GetConnection(remote_id)
+      ->SendRequest(std::move(req))
+      .onError([&, this](const folly::exception_wrapper& ew) {
```
or do the lower layers already handle it in every case and rethrow it as ConnectionException?
Other than these, patch looks good. We need the unit tests for committing it.
Thanks for review. v8 is posted.
- removed ConnectionRetryPolicy comment
- used VLOG(3) for RpcClient Exception
In addition:
- added socketNotOpen function in test_rpc_service.proto
- added unit tests for socketNotOpen
In RpcClient::SendRequest, we need to handle two cases for ConnectionException:
- The initial connection establishment, i.e. GetConnection(remote_id), with AsyncSocketException as the cause.
- Writing the request down the pipeline, i.e. RpcConnection::SendRequest, again with AsyncSocketException as the cause.
The socketNotOpen test covers the GetConnection(remote_id) case. Another test is needed to cover RpcConnection::SendRequest.
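A hedged sketch of how SendRequest can cover both cases; the signature is simplified and the body is illustrative only, with GetConnection, RpcConnection::SendRequest, and GetFutureWithException being the names used in this issue:

```cpp
folly::Future<std::unique_ptr<Response>> RpcClient::SendRequest(
    std::shared_ptr<ConnectionId> remote_id, std::unique_ptr<Request> req) {
  try {
    // Case 1: establishing the connection may throw ConnectionException
    // (wrapping the AsyncSocketException from ConnectionFactory::Connect).
    return GetConnection(remote_id)
        ->SendRequest(std::move(req))
        // Case 2: writing down the pipeline may fail later; turn the error
        // into a failed future for the caller.
        .onError([this](const folly::exception_wrapper& ew) {
          VLOG(3) << folly::sformat("RpcClient Exception: {}", ew.what());
          return GetFutureWithException(ew);
        });
  } catch (const hbase::ConnectionException& e) {
    return GetFutureWithException(
        folly::make_exception_wrapper<hbase::ConnectionException>(e));
  }
}
```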
-1 overall

| Vote | Subsystem | Runtime | Comment |
|---|---|---|---|
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 1m 41s | |

| Subsystem | Report/Notes |
|---|---|
| JIRA Issue | |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880452/HBASE-18078.008.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7933/console |
| Powered by | Apache Yetus 0.4.0 http://yetus.apache.org |

This message was automatically generated.
Posted v9:
- added fault-injection infrastructure; see also RpcFaultInjector and RpcClientFaultInjector
- injected a fault that closes the pipeline in ClientDispatcher::operator() to simulate getting an AsyncSocketException after the connection is created and before the request is sent (a minimal sketch of such a hook follows this list)
- added a unit test
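A minimal sketch of such a hook; the actual RpcFaultInjector/RpcClientFaultInjector classes are in the v9 patch and may differ. Production code calls a no-op hook, and a test installs one that closes the pipeline at the chosen point:

```cpp
#include <functional>

// Illustrative only: a process-wide fault-injection hook.
class RpcFaultInjector {
 public:
  using Hook = std::function<void()>;

  static RpcFaultInjector& Instance() {
    static RpcFaultInjector injector;
    return injector;
  }

  // Called by ClientDispatcher::operator() right before writing a request;
  // a no-op unless a test has installed a hook.
  void MaybeInjectFault() {
    if (hook_) hook_();
  }

  void SetHook(Hook hook) { hook_ = std::move(hook); }

 private:
  Hook hook_;
};

// A test simulating "socket closed after connect, before send" would do
// something like:
//   RpcFaultInjector::Instance().SetHook([pipeline] { pipeline->close(); });
```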
An AsyncSocketException is expected; however, "Broken promise for type unique_ptr<Response>" is returned instead. I will debug into this.
Since closing the socket after getting the connection and before sending the request behaves unexpectedly (i.e. "Broken promise for type unique_ptr<Response>"), I've decided to move this case to a separate ticket.
v8 is the work to be committed, v9 is dropped.
I have committed the v8 version which already addresses the review comments. HBASE-18204 builds on top of this. Thanks Xiaobing Zhou for the patch.
Xiaobing Zhou, rpc-test.cc is failing with a broken promise after this. Do you mind taking a look? Thanks.
Great. I think we can detect the server going away via TCP connection closure. However, we should also have the RPC timeout as implemented in the Java client. I think wangle might already have something to build the timeouts on.
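For illustration only (not something this patch implements): folly futures already provide a timeout combinator that a client-side RPC timeout could be layered on. WithRpcTimeout and its parameters are hypothetical names:

```cpp
#include <chrono>
#include <folly/futures/Future.h>

// Hypothetical helper: fail the response future with folly::FutureTimeout
// if it does not complete within rpc_timeout.
template <typename Response>
folly::Future<Response> WithRpcTimeout(folly::Future<Response> response,
                                       std::chrono::milliseconds rpc_timeout) {
  return std::move(response).within(rpc_timeout);
}
```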