HBASE-24089

Rolling upgrade: regions stuck in RIT when enabling a table after restore_snapshot

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3.4, 1.3.6
    • Fix Version/s: None
    • Component/s: amv2
    • Labels:
      None

      Description

      During a rolling upgrade, we performed the sequence of operations below, which left regions stuck in RIT (regions in transition).
      Prerequisites:
      Configure the following properties in the HBase 1.3.1 version:

        <property>
          <name>hbase.assignment.usezk</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.assignment.usezk.migrating</name>
          <value>true</value>
        </property>
      

      Configure the following properties in the HBase 2.2.x version (these are HBase 2.x settings):

        <property>
          <name>hbase.mirror.table.state.to.zookeeper</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.migrate.table.state.from.zookeeper</name>
          <value>true</value>
        </property>
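
Before reproducing the issue, it can help to confirm that the properties are actually present and set to true in the configuration file each process reads. The following is a minimal sketch, assuming the properties sit under the usual `<configuration>` root of an hbase-site.xml file. `SAMPLE`, `read_properties`, and `missing_or_false` are hypothetical names used for illustration and are not part of HBase.

```python
import xml.etree.ElementTree as ET

# Sample content in hbase-site.xml layout; replace with the real file's text.
SAMPLE = """<configuration>
  <property>
    <name>hbase.assignment.usezk</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.assignment.usezk.migrating</name>
    <value>true</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> under the root."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_or_false(props, required):
    """List the required property names that are absent or not set to true."""
    return [name for name in required if props.get(name, "").lower() != "true"]


if __name__ == "__main__":
    required = ["hbase.assignment.usezk", "hbase.assignment.usezk.migrating"]
    print(missing_or_false(read_properties(SAMPLE), required))
```

The same check can be run against the second configuration by passing its two property names as `required`.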
      

      Steps to reproduce the problem:

      1. Start the HBase cluster with version 1.3.1 (1 master and 1 regionserver).
      2. Start one regionserver with version 2.2.x.
      3. Create a table with one region (ensure the region is hosted on the old-version RS).
      4. Write some data into the table.
      5. Flush the table.
      6. Create a snapshot of the table.
      7. Move the table's region from the old-version RS to the new-version RS.
      8. Disable the table.
      9. Restore the snapshot on the table.
      10. Enable the table.
      

      After the enable table operation was triggered, the HBase 1.3.1 master assigned the region to the HBase 1.3.1 regionserver.
      The RS failed to open the region:

      2020-03-18 21:10:45,103 WARN  [RS_OPEN_REGION-vm1:16040-17] zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Attempt to transition the unassigned node for 505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to transition was vm1,16040,1584536385246 not the expected vm2,16040,1584536781189
      2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-vm1:16040-17] coordination.ZkOpenRegionCoordination: Failed transition from OFFLINE to OPENING for region=505f0e1d96a2a06eb111bd8b923a5a87
      2020-03-18 21:10:45,104 WARN  [RS_OPEN_REGION-vm1:16040-17] handler.OpenRegionHandler: Region was hijacked? Opening cancelled for encodedName=505f0e1d96a2a06eb111bd8b923a5a87
      2020-03-18 21:10:45,104 INFO  [RS_OPEN_REGION-vm1:16040-17] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 505f0e1d96a2a06eb111bd8b923a5a87, NAME => 'usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.', STARTKEY => 'user1089', ENDKEY => 'user1134'} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version 0
      2020-03-18 21:10:45,104 DEBUG [RS_OPEN_REGION-vm1:16040-17] zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Transitioning 505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_FAILED_OPEN
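
The first warning in the excerpt above carries the essential mismatch: the server that attempted the transition is not the server recorded for the unassigned node. A small sketch for pulling those fields out of such a line follows. The regular expression is written against the message text quoted in this report only, and `parse_transition_failure` is a hypothetical helper, not an HBase utility.

```python
import re

# Pattern derived from the ZKAssign warning quoted in this report.
PATTERN = re.compile(
    r"Attempt to transition the unassigned node for (?P<region>[0-9a-f]+) "
    r"from (?P<from_state>\w+) to (?P<to_state>\w+) failed, "
    r"the server that tried to transition was (?P<actual>\S+) "
    r"not the expected (?P<expected>\S+)"
)


def parse_transition_failure(line):
    """Return the region, states, and server names, or None if no match."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


# The warning from the report, split across string literals for readability.
LOG_LINE = (
    "2020-03-18 21:10:45,103 WARN  [RS_OPEN_REGION-vm1:16040-17] "
    "zookeeper.ZKAssign: regionserver:16040-0x200431c58cf0012, "
    "quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase "
    "Attempt to transition the unassigned node for "
    "505f0e1d96a2a06eb111bd8b923a5a87 from M_ZK_REGION_OFFLINE to "
    "RS_ZK_REGION_OPENING failed, the server that tried to transition was "
    "vm1,16040,1584536385246 not the expected vm2,16040,1584536781189"
)

if __name__ == "__main__":
    info = parse_transition_failure(LOG_LINE)
    print(info["region"], info["actual"], info["expected"])
```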
      

      Looking a little deeper into the problem, we found that the HMaster failed to delete the znode during the table disable operation:

      2020-03-18 21:10:02,219 DEBUG [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] master.AssignmentManager: Table being disabled so deleting ZK node and removing from regions in transition, skipping assignment of region usertable01,user1089,1584535968752.505f0e1d96a2a06eb111bd8b923a5a87.
      2020-03-18 21:10:02,220 WARN  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: master:16000-0x100431c3faf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in RS_ZK_REGION_CLOSED state but node is in M_ZK_REGION_CLOSING state
      2020-03-18 21:10:02,221 WARN  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] zookeeper.ZKAssign: master:16000-0x100431c3faf0012, quorum=vm1:2181,vm2:2181,vm3:2181, baseZNode=/hbase Attempting to delete unassigned node 505f0e1d96a2a06eb111bd8b923a5a87 in M_ZK_REGION_OFFLINE state but node is in M_ZK_REGION_CLOSING state
      2020-03-18 21:10:02,221 INFO  [RpcServer.FifoWFPBQ.default.handler=42,queue=2,port=16000] master.AssignmentManager: Failed to delete the closed node for 505f0e1d96a2a06eb111bd8b923a5a87. The node type may not match
      

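      The region was closed successfully, and the newer-version RS reported the region transition to the master through an RPC call. However, the master expects the RS to update the znode state itself, and it deletes the znode only when the znode is in one of the expected states. Because the znode was still in M_ZK_REGION_CLOSING, the delete failed and the znode was left behind.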
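
The failure mode described above can be illustrated with a toy model. This is a sketch inferred from the log messages in this report, not HBase code. The dictionary stands in for the unassigned znodes in ZooKeeper, and the function names are hypothetical. It assumes, as the two warnings suggest, that the master tries to delete the node first in RS_ZK_REGION_CLOSED state and then in M_ZK_REGION_OFFLINE state.

```python
def delete_node_if_in_state(znodes, region, expected_state):
    """Delete the region's node only if it is in the expected state."""
    if znodes.get(region) != expected_state:
        return False  # state mismatch (or no node): nothing is deleted
    del znodes[region]
    return True


def delete_closed_or_offline(znodes, region):
    """Try the two expected states in the order the warnings appear."""
    return (
        delete_node_if_in_state(znodes, region, "RS_ZK_REGION_CLOSED")
        or delete_node_if_in_state(znodes, region, "M_ZK_REGION_OFFLINE")
    )


REGION = "505f0e1d96a2a06eb111bd8b923a5a87"

if __name__ == "__main__":
    # The RS reported the close over RPC and never updated the znode,
    # so the node is still in the state the master wrote.
    znodes = {REGION: "M_ZK_REGION_CLOSING"}
    print(delete_closed_or_offline(znodes, REGION), znodes)
```

In this model the node survives the disable, which matches the "Failed to delete the closed node" message in the master log.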

            People

            • Assignee: Unassigned
            • Reporter: Y. SREENIVASULU REDDY (sreenivasulureddy)
            • Votes: 0
            • Watchers: 4
