Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-9321

Contention getting the current user in RpcClient$Connection.writeRequest

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 0.95.2
    • 0.98.0, 0.96.0
    • None
    • None
    • Reviewed

    Description

      I've been running tests on clusters with "lots" of regions, about 400, and I'm seeing weird contention in the client.

      This one I see a lot, hundreds and sometimes thousands of threads are blocked like this:

      "htable-pool4-t74" daemon prio=10 tid=0x00007f2254114000 nid=0x2a99 waiting for monitor entry [0x00007f21f9e94000]
         java.lang.Thread.State: BLOCKED (on object monitor)
      	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:466)
      	- waiting to lock <0x00000000fb5ad000> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
      	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1013)
      	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1407)
      	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1634)
      	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1691)
      	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:27339)
      	at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:105)
      	at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:43)
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:183)
      

      While the holder is doing this:

      "htable-pool17-t55" daemon prio=10 tid=0x00007f2244408000 nid=0x2a98 runnable [0x00007f21f9f95000]
         java.lang.Thread.State: RUNNABLE
      	at java.security.AccessController.getStackAccessControlContext(Native Method)
      	at java.security.AccessController.getContext(AccessController.java:487)
      	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:466)
      	- locked <0x00000000fb5ad000> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
      	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1013)
      	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1407)
      	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1634)
      	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1691)
      	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:27339)
      	at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:105)
      	at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:43)
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:183)
      

      Attachments

        1. trunk-9321_v2.patch
          79 kB
          Jimmy Xiang
        2. trunk-9321_v3.patch
          81 kB
          Jimmy Xiang
        3. trunk-9321_v4.patch
          83 kB
          Jimmy Xiang
        4. trunk-9321.patch
          5 kB
          Jimmy Xiang

        Issue Links

          Activity

            People

              jxiang Jimmy Xiang
              jdcryans Jean-Daniel Cryans
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: