HBase
  1. HBase
  2. HBASE-11565

Stale connection could stay for a while

    Details

    • Hadoop Flags:
      Reviewed

      Description

      In RpcClient, we cache the connection to each region server. When the connection goes bad, it stays in the cache till it's removed. Before it's removed, new calls will try to use it and just fail. The connection is a thread. It could be stuck in trying to receive some response. Before this receiving thread times out, it won't remove itself from the cache.

      It will be better to interrupt the receiving thread and let it clean up sooner.

      1. hbase-11565-0.98.patch
        0.7 kB
        Jimmy Xiang
      2. hbase-11565-v1.patch
        0.5 kB
        Jimmy Xiang

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.96-hadoop2 #286 (See https://builds.apache.org/job/hbase-0.96-hadoop2/286/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 4f3f7b56b9c0d215b947b034645341e8e49475d9)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Show
          Hudson added a comment - FAILURE: Integrated in hbase-0.96-hadoop2 #286 (See https://builds.apache.org/job/hbase-0.96-hadoop2/286/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 4f3f7b56b9c0d215b947b034645341e8e49475d9) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #396 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/396/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 77f409582ab6079b91d435d3a4ffaea4bc73c796)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #396 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/396/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 77f409582ab6079b91d435d3a4ffaea4bc73c796) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98 #417 (See https://builds.apache.org/job/HBase-0.98/417/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 77f409582ab6079b91d435d3a4ffaea4bc73c796)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-0.98 #417 (See https://builds.apache.org/job/HBase-0.98/417/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 77f409582ab6079b91d435d3a4ffaea4bc73c796) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-JDK7 #157 (See https://builds.apache.org/job/HBase-0.94-JDK7/157/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 8cee70ff8b9b49c8942629b143867b8308560529)

          • src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-0.94-JDK7 #157 (See https://builds.apache.org/job/HBase-0.94-JDK7/157/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 8cee70ff8b9b49c8942629b143867b8308560529) src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.94 #1389 (See https://builds.apache.org/job/HBase-0.94/1389/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 8cee70ff8b9b49c8942629b143867b8308560529)

          • src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-0.94 #1389 (See https://builds.apache.org/job/HBase-0.94/1389/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 8cee70ff8b9b49c8942629b143867b8308560529) src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-security #502 (See https://builds.apache.org/job/HBase-0.94-security/502/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 8cee70ff8b9b49c8942629b143867b8308560529)

          • src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-0.94-security #502 (See https://builds.apache.org/job/HBase-0.94-security/502/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 8cee70ff8b9b49c8942629b143867b8308560529) src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in hbase-0.96 #413 (See https://builds.apache.org/job/hbase-0.96/413/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 4f3f7b56b9c0d215b947b034645341e8e49475d9)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Show
          Hudson added a comment - FAILURE: Integrated in hbase-0.96 #413 (See https://builds.apache.org/job/hbase-0.96/413/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 4f3f7b56b9c0d215b947b034645341e8e49475d9) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Hide
          Jimmy Xiang added a comment -

          Thanks. Integrated into branch 0.94 and up

          Show
          Jimmy Xiang added a comment - Thanks. Integrated into branch 0.94 and up
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #5335 (See https://builds.apache.org/job/HBase-TRUNK/5335/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev b81302852f0dc0c26ce400d63ab9fe5716dad98f)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Show
          Hudson added a comment - SUCCESS: Integrated in HBase-TRUNK #5335 (See https://builds.apache.org/job/HBase-TRUNK/5335/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev b81302852f0dc0c26ce400d63ab9fe5716dad98f) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Hide
          Andrew Purtell added a comment -

          +1 for 0.98

          Show
          Andrew Purtell added a comment - +1 for 0.98
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-1.0 #63 (See https://builds.apache.org/job/HBase-1.0/63/)
          HBASE-11565 Stale connection could stay for a while (jxiang: rev 61dced28a350e8a49bc2ef687f267defbc74ee5c)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-1.0 #63 (See https://builds.apache.org/job/HBase-1.0/63/ ) HBASE-11565 Stale connection could stay for a while (jxiang: rev 61dced28a350e8a49bc2ef687f267defbc74ee5c) hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
          Hide
          Nicolas Liochon added a comment -

          Make sense. +1

          Show
          Nicolas Liochon added a comment - Make sense. +1
          Hide
          Jimmy Xiang added a comment -

          The synchronized (this) is safe? There is no risk to create a deadlock somehow?

          It is safe, no deadlock concern. Originally, it calls markClosed(e) directly, which synchronizes on this too. So there is no change here as to synchronized/deadlock here.

          Why do you think that interrupting the reader thread twice is bad? We would leak some resources?

          When the reader thread is interrupted for the first time, it could get out the readResponse() call and move to close() call. So it could be calling IOUtils.closeStream. My concern is such a call can throw an InterruptedException and the remain part of the close() call is not executed. The synchrononized here is to make things safer.

          Show
          Jimmy Xiang added a comment - The synchronized (this) is safe? There is no risk to create a deadlock somehow? It is safe, no deadlock concern. Originally, it calls markClosed(e) directly, which synchronizes on this too. So there is no change here as to synchronized/deadlock here. Why do you think that interrupting the reader thread twice is bad? We would leak some resources? When the reader thread is interrupted for the first time, it could get out the readResponse() call and move to close() call. So it could be calling IOUtils.closeStream. My concern is such a call can throw an InterruptedException and the remain part of the close() call is not executed. The synchrononized here is to make things safer.
          Hide
          Nicolas Liochon added a comment -

          The synchronized (this) is safe? There is no risk to create a deadlock somehow?
          Why do you think that interrupting the reader thread twice is bad? We would leak some resources?

          Show
          Nicolas Liochon added a comment - The synchronized (this) is safe? There is no risk to create a deadlock somehow? Why do you think that interrupting the reader thread twice is bad? We would leak some resources?
          Hide
          Jimmy Xiang added a comment -

          Thanks a lot for the review. I committed the patch to branch 1 and master. Branch 0.98 and before has different code. I attached a separate patch for 0.98 and older branches. The idea is to make sure the connection is interrupted just once, and at a right spot. For example, if the connection is already in the middle of close(), we'd better leave it alone. Could you please take a look?

          Show
          Jimmy Xiang added a comment - Thanks a lot for the review. I committed the patch to branch 1 and master. Branch 0.98 and before has different code. I attached a separate patch for 0.98 and older branches. The idea is to make sure the connection is interrupted just once, and at a right spot. For example, if the connection is already in the middle of close(), we'd better leave it alone. Could you please take a look?
          Hide
          Nicolas Liochon added a comment -

          That's smart. +1

          Show
          Nicolas Liochon added a comment - That's smart. +1
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12657137/hbase-11565-v1.patch
          against trunk revision .
          ATTACHMENT ID: 12657137

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          -1 core tests. The patch failed these unit tests:

          -1 core zombie tests. There are 1 zombie test(s): at org.apache.hadoop.hbase.http.TestHttpServerLifecycle.testStoppedServerIsNotAlive(TestHttpServerLifecycle.java:115)

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12657137/hbase-11565-v1.patch against trunk revision . ATTACHMENT ID: 12657137 +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javac . The applied patch does not increase the total number of javac compiler warnings. -1 javadoc . The javadoc tool appears to have generated 1 warning messages. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 +1 site . The mvn site goal succeeds with this patch. -1 core tests . The patch failed these unit tests: -1 core zombie tests . There are 1 zombie test(s): at org.apache.hadoop.hbase.http.TestHttpServerLifecycle.testStoppedServerIsNotAlive(TestHttpServerLifecycle.java:115) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10148//console This message is automatically generated.
          Hide
          Andrew Purtell added a comment - - edited

          +1

          Would be superseded by HBASE-11564, but this can be applied first, and possibly to more branches

          Show
          Andrew Purtell added a comment - - edited +1 Would be superseded by HBASE-11564 , but this can be applied first, and possibly to more branches

            People

            • Assignee:
              Jimmy Xiang
              Reporter:
              Jimmy Xiang
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development