HBase / HBASE-11374

RpcRetryingCaller#callWithoutRetries has a timeout of zero

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.98.3
    • Fix Version/s: 0.99.0, 0.98.4
    • Component/s: Client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Previously, RPC multi operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. RPC multi operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value is 60000, or 60 seconds.
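
      As a rough illustration of the behavior this note describes, the sketch below resolves an RPC timeout from a configuration, falling back to 60000 ms when hbase.rpc.timeout is not set. It uses a plain java.util.Map as a stand-in for the client configuration; the class and method names are hypothetical and are not HBase APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the client configuration.
// None of these names come from the HBase code base.
final class RpcTimeoutConfigSketch {
    static final String RPC_TIMEOUT_KEY = "hbase.rpc.timeout";
    static final int DEFAULT_RPC_TIMEOUT_MS = 60000; // 60 seconds, per the release note

    /** Returns the configured RPC timeout, or the default when the key is absent. */
    static int rpcTimeoutMs(Map<String, String> conf) {
        String raw = conf.get(RPC_TIMEOUT_KEY);
        return raw == null ? DEFAULT_RPC_TIMEOUT_MS : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(rpcTimeoutMs(conf)); // prints 60000 (default)

        conf.put(RPC_TIMEOUT_KEY, "30000");
        System.out.println(rpcTimeoutMs(conf)); // prints 30000 (explicit setting)
    }
}
```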

      Description

      This code is called by the client on the "multi" path.
      As zero is treated as an infinite timeout, we fall back to 2 seconds, which may not be correct.

      Typically, you can see this kind of message in the client (see the SocketTimeoutException: 2000)

      2014-08-08 17:22:43 o.a.h.h.c.AsyncProcess [INFO] #105158,
      table=rt_global_monthly_campaign_deliveries, attempt=10/35 failed 500 ops,
      last exception: java.net.SocketTimeoutException: Call to
      ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020 failed
      because java.net.SocketTimeoutException: 2000 millis timeout while waiting
      for channel to be ready for read. ch :
      java.nio.channels.SocketChannel[connected local=/10.248.130.152:46014
      remote=ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020] on
      ip-10-201-128-23.us-west-1.compute.internal,60020,1405642103651, tracking
      started Fri Aug 08 17:21:55 UTC 2014, retrying after 10043 ms, replay 500
      ops.
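
      The failure mode described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual HBase code: the class, method, and constant names are hypothetical, and the 2000 ms fallback is taken from the log message above. A caller that passes 0 ends up with the 2-second fallback, while a caller that passes the configured RPC timeout keeps its value.

```java
// Hypothetical sketch of the zero-timeout fallback; names are illustrative only.
final class ZeroTimeoutSketch {
    /** Fallback used when the caller supplies no finite timeout (2 seconds, as seen in the log). */
    static final int FALLBACK_TIMEOUT_MS = 2000;

    /** Treats 0 as "infinite / not set" and substitutes the fallback value. */
    static int effectiveReadTimeoutMs(int callTimeoutMs) {
        if (callTimeoutMs == 0) {
            return FALLBACK_TIMEOUT_MS;
        }
        return callTimeoutMs;
    }

    public static void main(String[] args) {
        // Before the fix: the "multi" path passed 0, so calls timed out after 2 seconds.
        System.out.println(effectiveReadTimeoutMs(0));     // prints 2000

        // After the fix: the "multi" path passes the configured RPC timeout.
        System.out.println(effectiveReadTimeoutMs(60000)); // prints 60000
    }
}
```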
      
      1. 11374.v1.master.patch
        2 kB
        Nicolas Liochon
      2. 11374.98.v1.patch
        4 kB
        Nicolas Liochon

          Activity

          Transition                  | Time In Source Status | Execution Times | Last Executer   | Last Execution Date
          Open -> Resolved            | 23h 36m               | 1               | Nicolas Liochon | 19/Jun/14 10:19
          Resolved -> Reopened        | 44s                   | 1               | Nicolas Liochon | 19/Jun/14 10:19
          Reopened -> Patch Available | 3s                    | 1               | Nicolas Liochon | 19/Jun/14 10:19
          Patch Available -> Resolved | 23h                   | 1               | Nicolas Liochon | 20/Jun/14 09:20
          Resolved -> Closed          | 246d 15h 10m          | 1               | Enis Soztutar   | 21/Feb/15 23:30
          Enis Soztutar made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.
          Nicolas Liochon added a comment -

          +1 on the latest version
          Thanks for all this work, Misty Stanley-Jones
          Misty Stanley-Jones made changes -
          Release Note
          Original Value: Previously, RPC multi operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. Multiget operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value is 60000, or 60 seconds.
          New Value: Previously, RPC multi operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. RPC multi operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value is 60000, or 60 seconds.
          Misty Stanley-Jones made changes -
          Release Note
          Original Value: Previously, multiget RPC operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. Multiget operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value is 60000, or 60 seconds.
          New Value: Previously, RPC multi operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. Multiget operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value is 60000, or 60 seconds.
          Misty Stanley-Jones made changes -
          Release Note
          Original Value: Previously, multiget RPC operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. Multiget operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value 60000, or 60 seconds.
          New Value: Previously, multiget RPC operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. Multiget operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value is 60000, or 60 seconds.
          Misty Stanley-Jones made changes -
          Release Note
          Original Value: Previously, multiget RPC operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. A new configuration option, hbase.rpc.timeout, has been introduced to set the RPC timeout, and defaults to 6000, or 6 seconds.
          New Value: Previously, multiget RPC operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. Multiget operations now use the value of hbase.rpc.timeout, as do other RPC operations. The default value 60000, or 60 seconds.
          Misty Stanley-Jones added a comment -

          thanks for the clarification, I must have misread the patch. And thanks for the clarification on hbase.rpc.timeout. I'll correct the RN.
          Nicolas Liochon added a comment -

          Misty Stanley-Jones, for your release note changes:

          • you found these 6000 ms in the code? It should be 60000.
          • hbase.rpc.timeout is not new, hence not "introduced" in this patch. But it was not used for "multi" operations. It was already used for other rpc operations.
          Misty Stanley-Jones made changes -
          Release Note
          Original Value: The multiget operations now use the hbase.rpc.timeout.
          New Value: Previously, multiget RPC operations had a timeout of 0, which was erroneously interpreted as infinity, and resulted in a fallback value of 2 seconds. A new configuration option, hbase.rpc.timeout, has been introduced to set the RPC timeout, and defaults to 6000, or 6 seconds.
          Nicolas Liochon made changes -
          Description
          Original Value:
          This code is called by the client on the "multi" path.
          As zero is detected as infinite value, we fallback to 2 seconds, which may not may correct.

          The code is correct in 0.99+
          New Value:
          This code is called by the client on the "multi" path.
          As zero is detected as infinite value, we fallback to 2 seconds, which may not may correct.

          Typically, you can see this kind of message in the client (see the SocketTimeoutException: 2000)
          {noformat}
          2014-08-08 17:22:43 o.a.h.h.c.AsyncProcess [INFO] #105158,
          table=rt_global_monthly_campaign_deliveries, attempt=10/35 failed 500 ops,
          last exception: java.net.SocketTimeoutException: Call to
          ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020 failed
          because java.net.SocketTimeoutException: 2000 millis timeout while waiting
          for channel to be ready for read. ch :
          java.nio.channels.SocketChannel[connected local=/10.248.130.152:46014
          remote=ip-10-201-128-23.us-west-1.compute.internal/10.201.128.23:60020] on
          ip-10-201-128-23.us-west-1.compute.internal,60020,1405642103651, tracking
          started Fri Aug 08 17:21:55 UTC 2014, retrying after 10043 ms, replay 500
          ops.
          {noformat}
          Qiang Tian made changes -
          Link This issue is duplicated by HBASE-11714 [ HBASE-11714 ]
          Andrew Purtell added a comment -

          Belated +1, thanks!
          Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #5223 (See https://builds.apache.org/job/HBase-TRUNK/5223/)
          HBASE-11374 RpcRetryingCaller#callWithoutRetries has a timeout of zero (nkeywal: rev c75afc5b8f5385f331ddbc60e117e4b2d1956121)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
          Nicolas Liochon added a comment -

          Committed on master, thanks for the review, Stack.
          Nicolas Liochon made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Release Note The multiget operations now use the hbase.rpc.timeout.
          Fix Version/s 0.99.0 [ 12325675 ]
          Resolution Fixed [ 1 ]
          stack added a comment -

          Ok +1. Add note maybe that needs looking at on commit.
          Nicolas Liochon added a comment -

          "It is correct having the rpc timeout up here in the AsyncProcess layer?"

          Good point. It's not perfect, but acceptable I would say, as we rely on the configuration...
          stack added a comment -

          lgtm It is correct having the rpc timeout up here in the AsyncProcess layer?
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.98 #346 (See https://builds.apache.org/job/HBase-0.98/346/)
          HBASE-11374 RpcRetryingCaller#callWithoutRetries has a timeout of zero (nkeywal: rev 174b59ff8f59643b6aacbbf269108432336e7116)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
          • hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #327 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/327/)
          HBASE-11374 RpcRetryingCaller#callWithoutRetries has a timeout of zero (nkeywal: rev 174b59ff8f59643b6aacbbf269108432336e7116)

          • hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12651391/11374.v1.master.patch
          against trunk revision .
          ATTACHMENT ID: 12651391

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 site. The mvn site goal succeeds with this patch.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9794//console

          This message is automatically generated.

          Nicolas Liochon made changes -
          Status Reopened [ 4 ] Patch Available [ 10002 ]
          Nicolas Liochon made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Nicolas Liochon made changes -
          Attachment 11374.v1.master.patch [ 12651391 ]
          Nicolas Liochon made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Nicolas Liochon added a comment -

          Committed to 0.98.
          I now realize that .99 has a different flavor of the same issue: it uses the operation timeout instead of the rpc timeout. I'm going to upload a patch for this as well.
          Nick Dimiduk added a comment -

          +1
          Ted Yu added a comment -

          lgtm
          Nicolas Liochon added a comment -

          tests went ok. Reviews welcome
          Nicolas Liochon made changes -
          Summary
          Original Value: RpcRetryingCaller#callWithRetries has a timeout of zero
          New Value: RpcRetryingCaller#callWithoutRetries has a timeout of zero
          Nicolas Liochon added a comment -

          As the precommit does not work for patches on previous version, I'm running the small & medium tests locally. Will report back when it's done.
          Nicolas Liochon added a comment -

          patch for 0.98 only.
          Nicolas Liochon made changes -
          Field Original Value New Value
          Attachment 11374.98.v1.patch [ 12651104 ]
          Nicolas Liochon created issue -

            People

            • Assignee: Nicolas Liochon
            • Reporter: Nicolas Liochon
            • Votes: 0
            • Watchers: 12
