HBase
  1. HBase
  2. HBASE-11564

Improve cancellation management in the rpc layer

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 2.0.0
    • Fix Version/s: 0.99.0, 2.0.0
    • Component/s: Client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The current client code depends on interrupting the thread for canceling a request. It's actually possible to rely on a callback in protobuf.

      The patch includes as well various performance improvements in replica management.

      On a version before HBASE-11492 the perf was ~35% better. I will redo the test with the last version.

      1. 11564.v1.patch
        39 kB
        Nicolas Liochon
      2. 11564.v2.patch
        40 kB
        Nicolas Liochon

        Issue Links

          Activity

          Hide
          Nicolas Liochon added a comment -

          v2 == I hijack this jira to fix a warning in BufferChain.

          Show
          Nicolas Liochon added a comment - v2 == I hijack this jira to fix a warning in BufferChain.
          Hide
          stack added a comment -

          Should we implement http://nick-lab.gs.washington.edu/java/jdk1.5b/api/java/util/concurrent/Cancellable.html ?

          If you make a second patch, remove the '-' in below since you went and edited javadoc

          + * @param cs - the completion service to use for submitting
          + * @param rl - the region locations
          + * @param min - the id of the first replica, inclusive
          + * @param max - the id of the last replica, inclusive.

          What does this mean client-side?

          + * This means as well that it can be used for a small set of tasks only.

          How small?

          Scanned patch. Looks great.

          Show
          stack added a comment - Should we implement http://nick-lab.gs.washington.edu/java/jdk1.5b/api/java/util/concurrent/Cancellable.html ? If you make a second patch, remove the '-' in below since you went and edited javadoc + * @param cs - the completion service to use for submitting + * @param rl - the region locations + * @param min - the id of the first replica, inclusive + * @param max - the id of the last replica, inclusive. What does this mean client-side? + * This means as well that it can be used for a small set of tasks only. How small? Scanned patch. Looks great.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12657136/11564.v2.patch
          against trunk revision .
          ATTACHMENT ID: 12657136

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657136/11564.v2.patch against trunk revision . ATTACHMENT ID: 12657136 +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 9 new or modified tests. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 javadoc . The javadoc tool appears to have generated 1 warning messages. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 +1 site . The mvn site goal succeeds with this patch. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10146//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          If you make a second patch, remove the '-' in below since you went and edited javadoc

          sure.

          How small?

          What a list can reasonably contains . As it's used in the replica path only it's 'enough'.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          Unrelated to this patch I think.

          Show
          Nicolas Liochon added a comment - If you make a second patch, remove the '-' in below since you went and edited javadoc sure. How small? What a list can reasonably contains . As it's used in the replica path only it's 'enough'. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. Unrelated to this patch I think.
          Hide
          Nicolas Liochon added a comment -

          I plan to commit this tomorrow if there is no objection.

          Show
          Nicolas Liochon added a comment - I plan to commit this tomorrow if there is no objection.
          Hide
          stack added a comment -

          I plan to commit this tomorrow if there is no objection.

          +1 if you've tested it Nicolas Liochon. Agree the javadoc unrelated (was fixed yesterday)

          Show
          stack added a comment - I plan to commit this tomorrow if there is no objection. +1 if you've tested it Nicolas Liochon . Agree the javadoc unrelated (was fixed yesterday)
          Hide
          Nicolas Liochon added a comment -

          Comparing the results with PE reads, 1m rows, 3 replicas and a 100 microseconds delay gives:

          • w/o the patch:
          • total: 210s
          • 95%: 426us
          • 99%: 550us
          • with the patch:
          • total: 174s (i.e. w/o the patch we're 20% slower)
          • 95%: 208us
          • 99%: 262us

          So it's nice. I still have some variation in the results so likely there is still room from improvement somewhere.

          Commit is on its way...

          Show
          Nicolas Liochon added a comment - Comparing the results with PE reads, 1m rows, 3 replicas and a 100 microseconds delay gives: w/o the patch: total: 210s 95%: 426us 99%: 550us with the patch: total: 174s (i.e. w/o the patch we're 20% slower) 95%: 208us 99%: 262us So it's nice. I still have some variation in the results so likely there is still room from improvement somewhere. Commit is on its way...
          Hide
          Nicolas Liochon added a comment -

          Committed to master & branch-1 (late ping to Enis Soztutar for this, tell me if you want more time to review it. I can revert if necessary)

          Show
          Nicolas Liochon added a comment - Committed to master & branch-1 (late ping to Enis Soztutar for this, tell me if you want more time to review it. I can revert if necessary)
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #5339 (See https://builds.apache.org/job/HBase-TRUNK/5339/)
          HBASE-11564 Improve cancellation management in the rpc layer (nkeywal: rev d8401c8e446dcef9ffb9c71f1d14413772f22c75)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/Subprocedure.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          • hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestRegionReplicaPerf.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/BufferChain.java
          • hbase-common/src/main/java/org/apache/hadoop/hbase/util/ExceptionUtil.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestIPC.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-TRUNK #5339 (See https://builds.apache.org/job/HBase-TRUNK/5339/ ) HBASE-11564 Improve cancellation management in the rpc layer (nkeywal: rev d8401c8e446dcef9ffb9c71f1d14413772f22c75) hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/Subprocedure.java hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestRegionReplicaPerf.java hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/BufferChain.java hbase-common/src/main/java/org/apache/hadoop/hbase/util/ExceptionUtil.java hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestIPC.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-1.0 #67 (See https://builds.apache.org/job/HBase-1.0/67/)
          HBASE-11564 Improve cancellation management in the rpc layer (nkeywal: rev d8562052a4f5c956a514becf6439442763387e86)

          • hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestRegionReplicaPerf.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/Subprocedure.java
          • hbase-common/src/main/java/org/apache/hadoop/hbase/util/ExceptionUtil.java
          • hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestIPC.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java
          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/BufferChain.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-1.0 #67 (See https://builds.apache.org/job/HBase-1.0/67/ ) HBASE-11564 Improve cancellation management in the rpc layer (nkeywal: rev d8562052a4f5c956a514becf6439442763387e86) hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestRegionReplicaPerf.java hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java hbase-server/src/main/java/org/apache/hadoop/hbase/procedure/Subprocedure.java hbase-common/src/main/java/org/apache/hadoop/hbase/util/ExceptionUtil.java hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestIPC.java hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/BufferChain.java hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallableWithReplicas.java
          Hide
          Enis Soztutar added a comment -

          This should be good for branch-1. But I am concerned about the change in ResultBoundedCompletionService that we changed from returning the first successful response from replicas to returning the first response (even though an exception). In testing, I think we ended up changing it to wait for all retries from all replicas to be consumed, because in some cases, the replica retries will just throw RetriesExhausted and we do not wait for success from other replicas. I might be mis-reading the patch though.

          Show
          Enis Soztutar added a comment - This should be good for branch-1. But I am concerned about the change in ResultBoundedCompletionService that we changed from returning the first successful response from replicas to returning the first response (even though an exception). In testing, I think we ended up changing it to wait for all retries from all replicas to be consumed, because in some cases, the replica retries will just throw RetriesExhausted and we do not wait for success from other replicas. I might be mis-reading the patch though.
          Hide
          Nicolas Liochon added a comment -

          I might be mis-reading the patch though.

          You're reading correctly
          I was not aware of this test issue. The RetriesExhausted should occur only after quite a lot of retries and sleep. I went for this change because the performance were better with it.

          Show
          Nicolas Liochon added a comment - I might be mis-reading the patch though. You're reading correctly I was not aware of this test issue. The RetriesExhausted should occur only after quite a lot of retries and sleep. I went for this change because the performance were better with it.
          Hide
          Enis Soztutar added a comment -

          If I remember correctly, I was able to repro this while running IntegrationTestTimeBoundedRequestsWithRegionReplicas with CM. We can come back to this if we see it again in testing.

          Show
          Enis Soztutar added a comment - If I remember correctly, I was able to repro this while running IntegrationTestTimeBoundedRequestsWithRegionReplicas with CM. We can come back to this if we see it again in testing.
          Hide
          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Show
          Enis Soztutar added a comment - Closing this issue after 0.99.0 release.

            People

            • Assignee:
              Nicolas Liochon
              Reporter:
              Nicolas Liochon
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development