  1. HBase
  2. HBASE-6721 RegionServer Group based Assignment
  3. HBASE-17570

rsgroup server move can get stuck if unassigning fails


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: regionserver
    • Labels: None

    Description

      This is pretty easy to repro in a standalone setup on the master branch. The master branch has the 'fake' Master regionserver, which shows up as a regionserver in the rsgroup 'default' group. If I create a new group and then try moving servers to the new group, the move usually gets stuck in the loop shown in the log below and never breaks out (I have to kill the master).

      Looking at the code, RSGroupAdminServer#moveServers has a loop in it that will just go on forever; there is neither a timeout nor a maximum number of tries.

      Maybe we don't see this much in a 'real' cluster. Filing this issue in the meantime because the move needs to stop retrying forever and fail instead.
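
A minimal sketch of what bounding such a loop could look like, assuming one pass of the loop can be modeled as a call that returns the number of regions still on the source servers. This is not the RSGroupAdminServer code: `BoundedMoveRetry`, `moveWithBounds`, `MoveFailedException`, and the `unassignPass` supplier are all hypothetical names for illustration, and the attempt limit, timeout, and pause values are arbitrary.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

public class BoundedMoveRetry {

    /** Hypothetical exception: the move gives up instead of spinning forever. */
    static class MoveFailedException extends Exception {
        MoveFailedException(String msg) {
            super(msg);
        }
    }

    /**
     * Runs unassignPass until it reports 0 regions remaining, or fails once
     * maxAttempts passes have run or the timeout has elapsed.
     *
     * unassignPass is a stand-in (assumption) for one pass of the move loop;
     * it returns how many regions are still on the servers being moved.
     *
     * @return the number of passes that were needed
     */
    static int moveWithBounds(IntSupplier unassignPass, int maxAttempts,
                              Duration timeout, Duration pause)
            throws MoveFailedException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int remaining = unassignPass.getAsInt();
            if (remaining == 0) {
                return attempt; // all regions are off the source servers
            }
            // Overflow-safe comparison against the nanoTime deadline.
            if (System.nanoTime() - deadline >= 0) {
                throw new MoveFailedException("timed out after " + attempt
                        + " attempts; " + remaining + " regions remain");
            }
            Thread.sleep(pause.toMillis());
        }
        throw new MoveFailedException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        // A pass that never makes progress, similar in spirit to the
        // FAILED_OPEN / OFFLINE cycle in the log below.
        try {
            moveWithBounds(() -> 1, 3, Duration.ofSeconds(5), Duration.ofMillis(10));
        } catch (MoveFailedException e) {
            System.out.println("move failed: " + e.getMessage());
        }
    }
}
```

With a pass that always reports one region remaining, the call ends with a `MoveFailedException` after the configured number of attempts rather than looping until the master is killed. Whether a real fix should bound by attempts, by elapsed time, or both is a design choice this sketch leaves open.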

      2017-01-30 21:34:46,340 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] rsgroup.RSGroupAdminServer: Unassigning 1 regions from server localhost:50143 for move to xx
      2017-01-30 21:34:46,341 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=OPEN, ts=1485840806167, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=PENDING_CLOSE, ts=1485840886341, server=localhost,50143,1485840800161}
      2017-01-30 21:34:46,341 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=PENDING_CLOSE
      2017-01-30 21:34:46,347 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50143] regionserver.RSRpcServices: Close 8ebaa5bd7a2e906429a7b91bb2bee333 without moving
      2017-01-30 21:34:46,348 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion: Flushing 1/1 column families, memstore=431 B
      2017-01-30 21:34:46,406 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.DefaultStoreFlusher: Flushed, sequenceid=7, memsize=431, hasBloomFilter=true, into tmp file file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/.tmp/m/999d93adf36b4406bb73dc64e0158a05
      2017-01-30 21:34:46,422 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HStore: Added file:/var/folders/d8/8lyxycpd129d4fj7lb684dwh0000gp/T/hbase-stack/hbase/data/hbase/rsgroup/8ebaa5bd7a2e906429a7b91bb2bee333/m/999d93adf36b4406bb73dc64e0158a05, entries=2, sequenceid=7, filesize=4.9 K
      2017-01-30 21:34:46,422 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion: Finished memstore flush of ~431 B/431, currentsize=0 B/0 for region hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. in 74ms, sequenceid=7, compaction requested=false
      2017-01-30 21:34:46,425 INFO  [StoreCloserThread-hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.-1] regionserver.HStore: Closed m
      2017-01-30 21:34:46,437 INFO  [RS_CLOSE_REGION-localhost:50143-0] regionserver.HRegion: Closed hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.
      2017-01-30 21:34:46,440 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=PENDING_CLOSE, ts=1485840886341, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=CLOSED, ts=1485840886440, server=localhost,50143,1485840800161}
      2017-01-30 21:34:46,440 INFO  [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=CLOSED
      2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] balancer.BaseLoadBalancer: Wanted to do retain assignment but no servers to assign to
      2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.AssignmentManager: Can't find a destination for 8ebaa5bd7a2e906429a7b91bb2bee333
      2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 8ebaa5bd7a2e906429a7b91bb2bee333, NAME => 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.', STARTKEY => '', ENDKEY => ''}
      2017-01-30 21:34:46,442 WARN  [AM.-pool3-t1] master.RegionStates: Failed to open/close 8ebaa5bd7a2e906429a7b91bb2bee333 on localhost,50143,1485840800161, set to FAILED_OPEN
      2017-01-30 21:34:46,442 INFO  [AM.-pool3-t1] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=CLOSED, ts=1485840886440, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840886442, server=localhost,50143,1485840800161}
      2017-01-30 21:34:46,442 INFO  [AM.-pool3-t1] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=FAILED_OPEN
      2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxnFactory: Accepted socket connection from /0:0:0:0:0:0:0:1:50272
      2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.ZooKeeperServer: Refusing session request for client /0:0:0:0:0:0:0:1:50272 as it has seen zxid 0x25e our last zxid is 0xae client must try another server
      2017-01-30 21:34:46,990 INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:50272 (no session established for client)
      2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] rsgroup.RSGroupAdminServer: Unassigning 2 regions from server localhost:50143 for move to xx
      2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840886442, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887353, server=localhost,50143,1485840800161}
      2017-01-30 21:34:47,353 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=OFFLINE
      2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] balancer.BaseLoadBalancer: Wanted to do retain assignment but no servers to assign to
      2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.AssignmentManager: Can't find a destination for 8ebaa5bd7a2e906429a7b91bb2bee333
      2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.AssignmentManager: Unable to determine a plan to assign {ENCODED => 8ebaa5bd7a2e906429a7b91bb2bee333, NAME => 'hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333.', STARTKEY => '', ENDKEY => ''}
      2017-01-30 21:34:47,355 WARN  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Failed to open/close 8ebaa5bd7a2e906429a7b91bb2bee333 on localhost,50143,1485840800161, set to FAILED_OPEN
      2017-01-30 21:34:47,355 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887353, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840887355, server=localhost,50143,1485840800161}
      2017-01-30 21:34:47,355 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=FAILED_OPEN
      2017-01-30 21:34:47,356 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStates: Transition {8ebaa5bd7a2e906429a7b91bb2bee333 state=FAILED_OPEN, ts=1485840887355, server=localhost,50143,1485840800161} to {8ebaa5bd7a2e906429a7b91bb2bee333 state=OFFLINE, ts=1485840887356, server=localhost,50143,1485840800161}
      2017-01-30 21:34:47,356 INFO  [RpcServer.deafult.FPBQ.Fifo.handler=29,queue=2,port=50141] master.RegionStateStore: Updating hbase:meta row hbase:rsgroup,,1485840805941.8ebaa5bd7a2e906429a7b91bb2bee333. with state=OFFLINE
      

          People

            Assignee: Unassigned
            Reporter: Michael Stack (stack)