Details

    • Sub-task
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • HBASE-14850
    • None
    • None

    Description

      RPC layer should handle various communication abnormalities (e.g. connection timeout, server aborted connection, and so on). Ideally, the corresponding exceptions should be raised and propagated through handlers of pipeline in client.

      Attachments

        1. HBASE-18078.000.patch
          12 kB
          Xiaobing Zhou
        2. HBASE-18078.001.patch
          16 kB
          Xiaobing Zhou
        3. HBASE-18078.002.patch
          14 kB
          Xiaobing Zhou
        4. HBASE-18078.003.patch
          32 kB
          Xiaobing Zhou
        5. HBASE-18078.004.patch
          14 kB
          Xiaobing Zhou
        6. HBASE-18078.005.patch
          27 kB
          Xiaobing Zhou
        7. HBASE-18078.006.patch
          10 kB
          Xiaobing Zhou
        8. HBASE-18078.007.patch
          12 kB
          Xiaobing Zhou
        9. HBASE-18078.008.patch
          21 kB
          Xiaobing Zhou

        Issue Links

        Activity

          enis Enis Soztutar added a comment -

          Great. I think we can detect server going away due to TPC connection closure. However, we should have the RPC timeout as implemented in the Java client. I think wangle might already have something to build the timeouts on.

          enis Enis Soztutar added a comment - Great. I think we can detect server going away due to TPC connection closure. However, we should have the RPC timeout as implemented in the Java client. I think wangle might already have something to build the timeouts on.
          xiaobingo Xiaobing Zhou added a comment -

          Posted v0 patch to handle connection related issues, such as timeout.

          The idea is to catch and propagate ConnectionException up. Therefore, from bottom of stack, whatever issues from establishing connection (i.e. Wangle pipeline) will be folly::AsyncSocketException wrapped within ConnectionException.

          Note that there's no way to set socket options in Wangle ClientBootstrap, but fortunately, ClientBootstrap::connect supports setting our own timeout which is enough for now.

          See also ConnectionFactory::Connect

          try {
          	  /* any connection error (e.g. timeout) will be folly::AsyncSocketException */
              auto pipeline = client->connect(
                  SocketAddress(hostname, port, true),
                  std::chrono::duration_cast<milliseconds>(connect_timeout_)).get();
              auto dispatcher = std::make_shared<ClientDispatcher>();
              dispatcher->setPipeline(pipeline);
              promise.setValue(dispatcher);
          	} catch(const folly::AsyncSocketException &e) {
          	  promise.setException(
          	      folly::make_exception_wrapper<hbase::ConnectionException>(
          	          folly::make_exception_wrapper<folly::AsyncSocketException>(e)));
          	}
          
          xiaobingo Xiaobing Zhou added a comment - Posted v0 patch to handle connection related issues, such as timeout. The idea is to catch and propagate ConnectionException up. Therefore, from bottom of stack, whatever issues from establishing connection (i.e. Wangle pipeline) will be folly::AsyncSocketException wrapped within ConnectionException. Note that there's no way to set socket options in Wangle ClientBootstrap, but fortunately, ClientBootstrap::connect supports setting our own timeout which is enough for now. See also ConnectionFactory::Connect try { /* any connection error (e.g. timeout) will be folly::AsyncSocketException */ auto pipeline = client->connect( SocketAddress(hostname, port, true ), std::chrono::duration_cast<milliseconds>(connect_timeout_)).get(); auto dispatcher = std::make_shared<ClientDispatcher>(); dispatcher->setPipeline(pipeline); promise.setValue(dispatcher); } catch ( const folly::AsyncSocketException &e) { promise.setException( folly::make_exception_wrapper<hbase::ConnectionException>( folly::make_exception_wrapper<folly::AsyncSocketException>(e))); }
          xiaobingo Xiaobing Zhou added a comment -

          Coming next are:

          1. apply the APIs (e.g. ConnectionFactory and ConnectionPool) changes to all related places.
          2. In order to simulate various abnormal behaviors, it necessitates building a TestRpcServer, which can be started from samples in BootstrapTest
          3. handle other abnormal cases such as defined in folly::AsyncSocketException
            class AsyncSocketException : public std::runtime_error {
             public:
              enum AsyncSocketExceptionType {
                UNKNOWN = 0,
                NOT_OPEN = 1,
                ALREADY_OPEN = 2,
                TIMED_OUT = 3,
                END_OF_FILE = 4,
                INTERRUPTED = 5,
                BAD_ARGS = 6,
                CORRUPTED_DATA = 7,
                INTERNAL_ERROR = 8,
                NOT_SUPPORTED = 9,
                INVALID_STATE = 10,
                SSL_ERROR = 12,
                COULD_NOT_BIND = 13,
                SASL_HANDSHAKE_TIMEOUT = 14,
                NETWORK_ERROR = 15
              };
            
          4. add unit tests to cover them.
          xiaobingo Xiaobing Zhou added a comment - Coming next are: apply the APIs (e.g. ConnectionFactory and ConnectionPool) changes to all related places. In order to simulate various abnormal behaviors, it necessitates building a TestRpcServer, which can be started from samples in BootstrapTest handle other abnormal cases such as defined in folly::AsyncSocketException class AsyncSocketException : public std::runtime_error { public : enum AsyncSocketExceptionType { UNKNOWN = 0, NOT_OPEN = 1, ALREADY_OPEN = 2, TIMED_OUT = 3, END_OF_FILE = 4, INTERRUPTED = 5, BAD_ARGS = 6, CORRUPTED_DATA = 7, INTERNAL_ERROR = 8, NOT_SUPPORTED = 9, INVALID_STATE = 10, SSL_ERROR = 12, COULD_NOT_BIND = 13, SASL_HANDSHAKE_TIMEOUT = 14, NETWORK_ERROR = 15 }; add unit tests to cover them.
          xiaobingo Xiaobing Zhou added a comment -

          v1 resolves #1.

          xiaobingo Xiaobing Zhou added a comment - v1 resolves #1.
          xiaobingo Xiaobing Zhou added a comment -

          v2 fixed connection pool test failure.

          xiaobingo Xiaobing Zhou added a comment - v2 fixed connection pool test failure.
          enis Enis Soztutar added a comment -
          • You are still calling Future::get() here in the ConnectionFactory::AsyncConnect, no?
            +    auto pipeline = client->connect(
            +        SocketAddress(hostname, port, true),
            +        std::chrono::duration_cast<milliseconds>(connect_timeout_)).get();
            
          • If this patch is not changing the blocking nature of our TCP connection establishment, maybe we should not introduce these methods, especially ConnectionPool::AsyncGetNewConnection which seems to be a copy of the other method.
          enis Enis Soztutar added a comment - You are still calling Future::get() here in the ConnectionFactory::AsyncConnect , no? + auto pipeline = client->connect( + SocketAddress(hostname, port, true ), + std::chrono::duration_cast<milliseconds>(connect_timeout_)).get(); If this patch is not changing the blocking nature of our TCP connection establishment, maybe we should not introduce these methods, especially ConnectionPool::AsyncGetNewConnection which seems to be a copy of the other method.
          xiaobingo Xiaobing Zhou added a comment - - edited

          Posted v3.

          1. fixed rebase conflicts
          2. added RPC test server skeleton

          Enis Soztutar you are right, we are not going change behaviors of connection establishment. The initial attempt of AsyncConnect is to satisfy connection pool or pass the existing unit tests. I will get back by removing the async later on. Thanks.

          xiaobingo Xiaobing Zhou added a comment - - edited Posted v3. fixed rebase conflicts added RPC test server skeleton Enis Soztutar you are right, we are not going change behaviors of connection establishment. The initial attempt of AsyncConnect is to satisfy connection pool or pass the existing unit tests. I will get back by removing the async later on. Thanks.
          xiaobingo Xiaobing Zhou added a comment -

          This ticket will not contain anything related to TestRpcServer, it will be in HBASE-18338.

          xiaobingo Xiaobing Zhou added a comment - This ticket will not contain anything related to TestRpcServer, it will be in HBASE-18338 .
          xiaobingo Xiaobing Zhou added a comment -

          Posted v4:

          1. removed RPC test related pieces.
          2. did some clean work.
          xiaobingo Xiaobing Zhou added a comment - Posted v4: removed RPC test related pieces. did some clean work.
          xiaobingo Xiaobing Zhou added a comment -

          Posted v5:

          1. added two functions (i.e. ping and addr) implementation in test_rpc_service.proto
          2. rebased HBASE-18338
          3. did some refactoring

          v6 will come with error implementation that triggers ConnectionException.

          xiaobingo Xiaobing Zhou added a comment - Posted v5: added two functions (i.e. ping and addr) implementation in test_rpc_service.proto rebased HBASE-18338 did some refactoring v6 will come with error implementation that triggers ConnectionException.
          xiaobingo Xiaobing Zhou added a comment -

          Posted v6:

          1. Catch AsyncSocketException in ConnectionFactory::Connect, and throw it as ConnectionException, which will be propagated up to ConnectionPool::GetConnection and RpcClient::SendRequest, finally caller of RpcClient will get ConnectionException.
          2. While writing data down the pipeline, RpcConnection::SendRequest will also encounter AsyncSocketException, similarly, it's propagated to caller of RpcClient.

          The patch considered two cases:

          1. AsyncSocketException as a result of the 1st time connection establishment.
          2. ConnectionException (i.e. AsyncSocketException as a cause) for Request/Response async call after the corresponding connection is established.
          xiaobingo Xiaobing Zhou added a comment - Posted v6: Catch AsyncSocketException in ConnectionFactory::Connect, and throw it as ConnectionException, which will be propagated up to ConnectionPool::GetConnection and RpcClient::SendRequest, finally caller of RpcClient will get ConnectionException. While writing data down the pipeline, RpcConnection::SendRequest will also encounter AsyncSocketException, similarly, it's propagated to caller of RpcClient. The patch considered two cases: AsyncSocketException as a result of the 1st time connection establishment. ConnectionException (i.e. AsyncSocketException as a cause) for Request/Response async call after the corresponding connection is established.
          xiaobingo Xiaobing Zhou added a comment -

          v7:

          1. added RpcClient::GetFutureWithException
          2. fixe broken tests due to v6

          Extra tests come with next patch.

          xiaobingo Xiaobing Zhou added a comment - v7: added RpcClient::GetFutureWithException fixe broken tests due to v6 Extra tests come with next patch.
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 0m 5s HBASE-18078 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help.



          This message was automatically generated.

          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 0m 5s HBASE-18078 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. Subsystem Report/Notes JIRA Issue HBASE-18078 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12880339/HBASE-18078.007.patch Console output https://builds.apache.org/job/PreCommit-HBASE-Build/7917/console Powered by Apache Yetus 0.4.0 http://yetus.apache.org This message was automatically generated.
          enis Enis Soztutar added a comment -

          Thanks Xiaobing Zhou for the updated patch.
          As we were talking offline, we are not gonna do ConnectionRetryPolicy. Upper level retrying from Rpc Retrying Callers will work just fine for re-establishing the connection.

          + * TODO:
          + * This function should plug in ConnectionRetryPolicy to handle:
          +
          
          • We can do this as VLOG(1):
            +          VLOG(3) << folly::sformat("RpcClient Exception: {}", ew.what());
            
          • Do we need to catch the AsyncSocketException at this level as well:
            +    return GetConnection(remote_id)
            +        ->SendRequest(std::move(req))
            +        .onError([&, this](const folly::exception_wrapper& ew) {
            

            or below layers already handle it in every case and rethrow it as ConnectionException.

          Other than these, patch looks good. We need the unit tests for committing it.

          enis Enis Soztutar added a comment - Thanks Xiaobing Zhou for the updated patch. As we were talking offline, we are not gonna do ConnectionRetryPolicy. Upper level retrying from Rpc Retrying Callers will work just fine for re-establishing the connection. + * TODO: + * This function should plug in ConnectionRetryPolicy to handle: + We can do this as VLOG(1): + VLOG(3) << folly::sformat( "RpcClient Exception: {}" , ew.what()); Do we need to catch the AsyncSocketException at this level as well: + return GetConnection(remote_id) + ->SendRequest(std::move(req)) + .onError([&, this ]( const folly::exception_wrapper& ew) { or below layers already handle it in every case and rethrow it as ConnectionException. Other than these, patch looks good. We need the unit tests for committing it.
          xiaobingo Xiaobing Zhou added a comment -

          Thanks for review. v8 is posted.

          1. removed ConnectionRetryPolicy comment
          2. used VLOG(3) for RpcClient Exception

          In addition:

          1. added socketNotOpen function in test_rpc_service.proto
          2. added unit tests for socketNotOpen

          In RpcClient::SendRequest, we need to handle two cases for ConnectionException:

          1. The first time connection establishment, i.e. GetConnection(remote_id), AsyncSocketException being a cause.
          2. Writing request down the pipeline, i.e. RpcConnection::SendRequest, AsyncSocketException being a cause as well.

          The socketNotOpen covers the case of GetConnection(remote_id). Another test is needed to cover RpcConnection::SendRequest.

          xiaobingo Xiaobing Zhou added a comment - Thanks for review. v8 is posted. removed ConnectionRetryPolicy comment used VLOG(3) for RpcClient Exception In addition: added socketNotOpen function in test_rpc_service.proto added unit tests for socketNotOpen In RpcClient::SendRequest, we need to handle two cases for ConnectionException: The first time connection establishment, i.e. GetConnection(remote_id), AsyncSocketException being a cause. Writing request down the pipeline, i.e. RpcConnection::SendRequest, AsyncSocketException being a cause as well. The socketNotOpen covers the case of GetConnection(remote_id). Another test is needed to cover RpcConnection::SendRequest.
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          -1 patch 1m 41s HBASE-18078 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help.



          This message was automatically generated.

          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. -1 patch 1m 41s HBASE-18078 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.4.0/precommit-patchnames for help. Subsystem Report/Notes JIRA Issue HBASE-18078 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12880452/HBASE-18078.008.patch Console output https://builds.apache.org/job/PreCommit-HBASE-Build/7933/console Powered by Apache Yetus 0.4.0 http://yetus.apache.org This message was automatically generated.
          xiaobingo Xiaobing Zhou added a comment -

          Posted v9:

          1. added fault injection infra, see also RpcFaultInjector and RpcClientFaultInjector
          2. inject fault to close pipeline in ClientDispatcher::operator() to simulate the scenario to get AsyncSocketException after creating connection and before sending request
          3. added unit test

          AsyncSocketException is expected, however, Broken promise for type unique_ptr<Response> is returned, will debug into this.

          xiaobingo Xiaobing Zhou added a comment - Posted v9: added fault injection infra, see also RpcFaultInjector and RpcClientFaultInjector inject fault to close pipeline in ClientDispatcher::operator() to simulate the scenario to get AsyncSocketException after creating connection and before sending request added unit test AsyncSocketException is expected, however, Broken promise for type unique_ptr<Response> is returned, will debug into this.
          xiaobingo Xiaobing Zhou added a comment -

          Since the behavior of closing socket before sending request and after getting connection is beyond expectation (i.e.Broken promise for type unique_ptr<Response>), I've decided to move this case to separate ticket.

          v8 is the work to be committed, v9 is dropped.

          xiaobingo Xiaobing Zhou added a comment - Since the behavior of closing socket before sending request and after getting connection is beyond expectation (i.e.Broken promise for type unique_ptr<Response>), I've decided to move this case to separate ticket. v8 is the work to be committed, v9 is dropped.
          enis Enis Soztutar added a comment -

          I have committed the v8 version which already addresses the review comments. HBASE-18204 builds on top of this. Thanks Xiaobing Zhou for the patch.

          enis Enis Soztutar added a comment - I have committed the v8 version which already addresses the review comments. HBASE-18204 builds on top of this. Thanks Xiaobing Zhou for the patch.
          enis Enis Soztutar added a comment -

          Xiaobing Zhou rpc-test.cc is failing with broken promise after this. Do you mind to take a look. Thanks.

          enis Enis Soztutar added a comment - Xiaobing Zhou rpc-test.cc is failing with broken promise after this. Do you mind to take a look. Thanks.

          People

            xiaobingo Xiaobing Zhou
            xiaobingo Xiaobing Zhou
            Votes:
            0 Vote for this issue
            Watchers:
            Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                In order to see discussions, first confirm access to your Slack account(s) in the following workspace(s): ASF