HBase
  1. HBase
  2. HBASE-5086

Reopening a region on a RS can leave it in PENDING_OPEN

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Invalid
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.3
    • Component/s: None
    • Labels:
      None

      Description

      I got this twice during the same test.

      If the region servers are slow enough and you run an online alter, it's possible for the RS to change the znode status to CLOSED and have the master send an OPEN before the region server is able to remove the region from it's list of RITs.

      This is what the master sees:

      011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. (offlining)
      2011-12-21 22:24:09,498 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db033f7 Creating unassigned node for 43123e2e3fc83ec25fe2a76b4f09077f in a CLOSING state
      2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to sv4r25s44,62023,1324494325099 for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
      2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv4r25s44,62023,1324494325099, region=43123e2e3fc83ec25fe2a76b4f09077f
      2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 43123e2e3fc83ec25fe2a76b4f09077f
      2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. state=CLOSED, ts=1324506255629, server=sv4r25s44,62023,1324494325099
      2011-12-21 22:24:15,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:62003-0x134589d3db033f7 Creating (or updating) unassigned node for 43123e2e3fc83ec25fe2a76b4f09077f with OFFLINE state
      2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. destination server is + sv4r25s44,62023,1324494325099
      2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.; plan=hri=test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., src=, dest=sv4r25s44,62023,1324494325099
      2011-12-21 22:24:15,663 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. to sv4r25s44,62023,1324494325099
      2011-12-21 22:24:15,664 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: sv4r25s44,62023,1324494325099 due to org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: Received:OPEN for the region:test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. ,which we are already trying to CLOSE.

      After that the master abandons.

      And the region server:

      2011-12-21 22:24:09,523 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received close region: test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
      2011-12-21 22:24:09,523 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
      2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.: disabling compactions & flushes
      2011-12-21 22:24:09,524 INFO org.apache.hadoop.hbase.regionserver.HRegion: Running close preflush of test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
      2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region memstore size 40.5m
      2011-12-21 22:24:09,524 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait for mvcc, flushsize=42482936
      2011-12-21 22:24:13,368 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/87d6944c54c7417e9a34a9f9542bcb72 to hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72
      2011-12-21 22:24:13,568 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/87d6944c54c7417e9a34a9f9542bcb72, entries=54209, sequenceid=31451012, memsize=40.5m, filesize=31.4m
      2011-12-21 22:24:14,381 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~40.5m/42482936, currentsize=218.9k/224128 for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 4856ms, sequenceid=31451012, compaction requested=true
      2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
      2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., current region memstore size 218.9k
      2011-12-21 22:24:15,267 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f., commencing wait for mvcc, flushsize=224128
      2011-12-21 22:24:15,330 DEBUG org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/.tmp/0a744b85cec5454e873a7c27bf9b3c53 to hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53
      2011-12-21 22:24:15,346 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://sv4r11s38:9100/hbase/test1/43123e2e3fc83ec25fe2a76b4f09077f/actions/0a744b85cec5454e873a7c27bf9b3c53, entries=286, sequenceid=31451619, memsize=218.9k, filesize=170.2k
      2011-12-21 22:24:15,347 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~218.9k/224128, currentsize=0.0/0 for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. in 80ms, sequenceid=31451619, compaction requested=true
      2011-12-21 22:24:15,365 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.
      2011-12-21 22:24:15,365 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x134589d3db03403 Attempting to transition node 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
      2011-12-21 22:24:15,637 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x134589d3db03403 Successfully transitioned node 43123e2e3fc83ec25fe2a76b4f09077f from M_ZK_REGION_CLOSING to RS_ZK_REGION_CLOSED
      2011-12-21 22:24:15,670 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: set region closed state in zk successfully for region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f. sn name: sv4r25s44,62023,1324494325099
      2011-12-21 22:24:15,670 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region test1,db6db6b4,1324501004642.43123e2e3fc83ec25fe2a76b4f09077f.

      Doing a force unassign in the shell fixes it.

      A small-ish fix would be to add to RegionAlreadyInTransitionException which state it's in so that we can detect this case and then just retry or open on another region server.

      This is critical for online altering to work, but I don't think it's likely to happen in other situations.

        Activity

          People

          • Assignee:
            ramkrishna.s.vasudevan
            Reporter:
            Jean-Daniel Cryans
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development