HBASE-14533

Thrift client gets "AsyncProcess: Failed to get region location .... closed"


    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: REST, Thrift
    • Labels:
      None

      Description

      An internal Python client has been getting the stack trace below since HBASE-13437:

      2015-09-30 11:27:31,670 runner                    ERROR   : scheduler executor error
      2015-09-30 11:27:31,674 runner                    ERROR   : Traceback (most recent call last):
        File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsRtiFetcher-0.1-py2.6.egg/cops_rti/fetcher/runner.py", line 82, in run
          fetch_list = self.__scheduler_executor.run()
        File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsRtiFetcher-0.1-py2.6.egg/cops_rti/fetcher/scheduler.py", line 35, in run
          with self.__fetch_db_dao.get_scanner() as scanner:
        File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_dao.py", line 57, in get_scanner
          caching=caching, field_filter_list=field_filter_list)
        File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_client_template.py", line 104, in get_entity_scanner
          self.__fix_cfs(self.__filter_columns(field_filter_list)), caching)
        File "/opt/cops/cops-related-ticket-info-fetcher/fetcher/.virtenv/lib/python2.6/site-packages/CopsHbaseCommon-f796bf2929be11c26536c3e8f3e9c0b0ecb382b3-py2.6.egg/cops/hbase/common/hbase_entity_scanner.py", line 81, in open
          self.__scanner_id = client.scannerOpenWithScan(table_name, scan)
        File "/opt/cops/cops-related-ticket-info-fetcher/.crepo/cops-hbase-common/ext-py/hbase/Hbase.py", line 1494, in scannerOpenWithScan
          return self.recv_scannerOpenWithScan()
        File "/opt/cops/cops-related-ticket-info-fetcher/.crepo/cops-hbase-common/ext-py/hbase/Hbase.py", line 1518, in recv_scannerOpenWithScan
          raise result.io
      IOError: IOError(message="org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location\n\tat org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:308)\n\tat org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:149)\n\tat org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:57)\n\tat org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)\n\tat org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:293)\n\tat org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)\n\tat org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)\n\tat org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)\n\tat org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:888)\n\tat org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.scannerOpenWithScan(ThriftServerRunner.java:1446)\n\tat sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)\n\tat com.sun.proxy.$Proxy14.scannerOpenWithScan(Unknown Source)\n\tat org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4609)\n\tat org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$scannerOpenWithScan.getResult(Hbase.java:4593)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)\n\tat org.apache.hadoop.hbase.thrift.ThriftServerRunner$3.process(ThriftServerRunner.java:502)\n\tat 
org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:745)\nCaused by: java.io.IOException: hconnection-0xa8e1bf9 closed\n\tat org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1117)\n\tat org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:299)\n\t... 23 more\n")
      

      On the thrift server side we see this:

      2015-09-30 07:22:59,427 ERROR org.apache.hadoop.hbase.client.AsyncProcess: Failed to get region location
      java.io.IOException: hconnection-0x4142991e closed
              at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1117)
              at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
              at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
              at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
              at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
              at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1496)
              at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1107)
              at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.mutateRowTs(ThriftServerRunner.java:1256)
              at org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.mutateRow(ThriftServerRunner.java:1209)
              at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)
              at com.sun.proxy.$Proxy14.mutateRow(Unknown Source)
              at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:4334)
              at org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.getResult(Hbase.java:4318)
              at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
              at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
              at org.apache.hadoop.hbase.thrift.ThriftServerRunner$3.process(ThriftServerRunner.java:502)
              at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      

      HBASE-13437 has us actually execute a close on timeout; before that change we would mark the connection closed but never call close on it.

      A background chore goes around stamping Connections in the ConnectionCache as 'closed' if they have not been used in ten minutes. The 'close' can come in at any time, in particular between the point at which we get the table/connection and the point at which we go to use it, i.e. when we flush puts. It is at the flush-puts point that we get the above 'AsyncProcess: Failed to get region location' (it is not a failure to find the region location but rather our noticing that the connection has been closed).
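The sequence described above can be modelled with a small, deterministic toy. This is only a sketch of the ordering (get the connection, chore closes it, then use it) and not the HBase implementation: `ToyConnection`, `ToyConnectionCache`, `run_chore` and the injected clock are all hypothetical names invented for illustration, and the gap between "get" and "use" is stretched to the full idle period to keep the example simple.

```python
class ConnectionClosedError(IOError):
    """Stands in for 'java.io.IOException: hconnection-... closed'."""


class ToyConnection(object):
    def __init__(self, name):
        self.name = name
        self.closed = False

    def locate_region(self, row):
        # The closed check is what surfaces as "Failed to get region location".
        if self.closed:
            raise ConnectionClosedError("%s closed" % self.name)
        return "region-for-%s" % row


class ToyConnectionCache(object):
    def __init__(self, max_idle_seconds, clock):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock      # callable returning the current time in seconds
        self.entries = {}       # user -> (connection, last access time)

    def get(self, user):
        conn, _ = self.entries.get(user, (None, None))
        if conn is None or conn.closed:
            conn = ToyConnection("hconnection-%s" % user)
        self.entries[user] = (conn, self.clock())
        return conn

    def run_chore(self):
        # Close and evict every connection that has been idle too long.
        now = self.clock()
        for user, (conn, last_access) in list(self.entries.items()):
            if now - last_access >= self.max_idle_seconds:
                conn.closed = True
                del self.entries[user]


def demo_race():
    now = [0.0]
    cache = ToyConnectionCache(max_idle_seconds=600, clock=lambda: now[0])
    conn = cache.get("alice")        # handler obtains the connection
    now[0] = 600.0                   # no further cache access for ten minutes
    cache.run_chore()                # chore closes the idle connection
    try:
        conn.locate_region("row1")   # handler finally goes to flush its puts
    except ConnectionClosedError as err:
        return str(err)
    return None
```

Calling `demo_race()` returns the "closed" message rather than a region, which mirrors why the server log reads like a region-location failure even though the real cause is the closed connection.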

      Attempts at reproducing this issue locally by letting the Connection time out can generate the above exception if a certain dance is done, but it is hard to do; I am not reproducing the actual usage by the aforementioned client.

      Next steps would be setting up a Python client talking via Thrift and then trying to use a connection after it has been evicted from the connection cache. Another thing to try is a pool of connections on the Python side; connections are identified by user and table.
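One possible shape for such a client-side probe is sketched below. It assumes nothing about the Thrift bindings: `call` is a hypothetical zero-argument callable that the tester would wrap around whatever request the real client issues (for example a scanner open or a row mutation), and `probe_after_idle` is an invented helper name. The idea is simply to issue a warm-up request, stay idle for intervals around the ten-minute timeout, and record which follow-up requests fail. Whether this timing actually hits the window is not established here.

```python
import time


def probe_after_idle(call, idle_intervals, sleep=time.sleep):
    """Warm up with one request, then idle and retry, recording any failures.

    call           -- hypothetical zero-argument callable issuing one request
    idle_intervals -- seconds to stay idle before each follow-up request
    sleep          -- injectable so the probe can be exercised without waiting
    """
    failures = []
    call()  # warm-up, so the server side has a cached connection
    for idle in idle_intervals:
        sleep(idle)
        try:
            call()
        except Exception as err:
            # The Thrift-generated IOError is not the builtin IOError,
            # so catch broadly and keep the message for inspection.
            failures.append((idle, str(err)))
    return failures
```

A run against a real server might pass something like `[590, 600, 610]` as `idle_intervals`; those values are only a guess at what brackets the timeout.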

        Attachments

        1. 14533v2.branch-1.patch
          7 kB
          stack
        2. 14533.test.patch
          7 kB
          stack
        3. test.patch
          4 kB
          stack

            People

            • Assignee: stack
            • Reporter: stack