HBASE-10895

Unassigning a region fails because the hosting region server is in the failed servers list

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.96.1, 0.98.1, 0.99.0
    • Fix Version/s: 0.99.0, 0.98.2, 0.96.3
    • Component/s: Region Assignment
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      This issue is similar to HBASE-10833, which deals with the sendRegionOpen RPC, whereas this issue occurs with sendRegionClose.

      Once a region server (RS) is in the failed servers list due to a network hiccup, the AssignmentManager (AM) quickly exhausts all of its retries and later fails the whole region assignment. Below is a sample stack trace:

      2014-03-31 13:39:10,056 INFO  [AM.-pool1-t8] master.AssignmentManager: Server hor16n09.gq1.ygridcore.net,60020,1396270942046 returned org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: hor16n09.gq1.ygridcore.net/68.142.246.220:60020 for loadtest_d1,59999994,1396261861562.fcef8d691632e99948fbf876d24f907e., try=20 of 20
      org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: hor16n09.gq1.ygridcore.net/68.142.246.220:60020
              at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:880)
              at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1065)
              at org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1032)
              at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1474)
              at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1684)
              at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1737)
              at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.closeRegion(AdminProtos.java:20854)
              at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1656)
              at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:693)
              at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1685)
              at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1786)
              at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1436)
              at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:45)
      ....
      2014-03-31 13:39:10,056 WARN  [AM.-pool1-t8] master.RegionStates: Failed to open/close fcef8d691632e99948fbf876d24f907e on hor16n09.gq1.ygridcore.net,60020,1396270942046, set to FAILED_CLOSE
      2014-03-31 13:39:10,056 INFO  [AM.-pool1-t8] master.RegionStates: Transitioned {fcef8d691632e99948fbf876d24f907e state=PENDING_OPEN, ts=1396273149814, server=hor16n09.gq1.ygridcore.net,60020,1396270942046} to {fcef8d691632e99948fbf876d24f907e state=FAILED_CLOSE, ts=1396273150056, server=hor16n09.gq1.ygridcore.net,60020,1396270942046}
      2014-03-31 13:39:10,056 INFO  [AM.-pool1-t8] master.AssignmentManager: Skip assigning {ENCODED => fcef8d691632e99948fbf876d24f907e, NAME => 'loadtest_d1,59999994,1396261861562.fcef8d691632e99948fbf876d24f907e.', STARTKEY => '59999994', ENDKEY => '66666660'}, we couldn't close it: {fcef8d691632e99948fbf876d24f907e state=FAILED_CLOSE, ts=1396273150056, server=hor16n09.gq1.ygridcore.net,60020,1396270942046}
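
The failure mode in the log above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the attached patch: `FailedServers`, `FailedServerException`, `sendRegionClose`, and `unassign` here are hypothetical stand-ins that merely borrow the names from the stack trace, and the expiry and sleep values are arbitrary. It shows that a retry loop which does not pause on a failed-server rejection can burn all 20 attempts before the entry expires, while pausing between attempts lets the entry age out.

```java
import java.util.HashMap;
import java.util.Map;

public class FailedServerRetrySketch {

    /** Hypothetical stand-in for RpcClient$FailedServerException. */
    static class FailedServerException extends Exception {
        FailedServerException(String message) {
            super(message);
        }
    }

    /** Hypothetical failed servers list: an entry expires after a fixed time. */
    static class FailedServers {
        private final Map<String, Long> expiryByServer = new HashMap<>();
        private final long expiryMs;

        FailedServers(long expiryMs) {
            this.expiryMs = expiryMs;
        }

        void markFailed(String server, long nowMs) {
            expiryByServer.put(server, nowMs + expiryMs);
        }

        boolean isFailed(String server, long nowMs) {
            Long expiry = expiryByServer.get(server);
            return expiry != null && nowMs < expiry;
        }
    }

    /** Simulated close RPC: rejected immediately while the server is listed. */
    static void sendRegionClose(FailedServers failed, String server)
            throws FailedServerException {
        if (failed.isFailed(server, System.currentTimeMillis())) {
            throw new FailedServerException(
                "This server is in the failed servers list: " + server);
        }
    }

    /**
     * Retries the close up to maxAttempts times. When sleepOnFailedServerMs
     * is positive, the loop pauses after each failed-server rejection so the
     * list entry has a chance to expire (an assumed mitigation for this sketch).
     */
    static boolean unassign(FailedServers failed, String server,
                            int maxAttempts, long sleepOnFailedServerMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sendRegionClose(failed, server);
                return true;
            } catch (FailedServerException e) {
                if (sleepOnFailedServerMs > 0) {
                    Thread.sleep(sleepOnFailedServerMs);
                }
            }
        }
        return false; // retries exhausted, comparable to FAILED_CLOSE above
    }

    public static void main(String[] args) throws InterruptedException {
        FailedServers failed = new FailedServers(300);

        failed.markFailed("rs1", System.currentTimeMillis());
        System.out.println("no wait:   " + unassign(failed, "rs1", 20, 0));

        failed.markFailed("rs1", System.currentTimeMillis());
        System.out.println("with wait: " + unassign(failed, "rs1", 20, 100));
    }
}
```

With these values the first call gives up almost immediately because 20 rejections take far less than the 300 ms expiry, which mirrors the `try=20 of 20` line in the log; the second call succeeds once the entry has expired.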
      

      Attachments

        1. hbase-10895.patch
          4 kB
          Jeffrey Zhong
        2. hbase-10895-trunk.patch
          5 kB
          Jeffrey Zhong

          People

            Assignee: Jeffrey Zhong (jeffreyz)
            Reporter: Jeffrey Zhong (jeffreyz)
            Votes: 0
            Watchers: 6
