Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-4400

.META. getting stuck if RS hosting it is dead and znode state is in RS_ZK_REGION_OPENED

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.90.5
    • None
    • None
    • Reviewed

    Description

      Start 2 RS.
      The .META. is being hosted by RS2 but while processing it goes down.

      Now restart the master and RS1. Master gets the RS name from the znode in RS_ZK_REGION_OPENED. But as RS2 is not online still the master is not able to process the META at all. Please find the logs

      2011-09-14 16:43:51,949 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1315998828523, region=70236052/-ROOT-
      2011-09-14 16:43:51,968 INFO org.apache.hadoop.hbase.master.HMaster: -ROOT- assigned=1, rit=false, location=linux76:60020
      2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing region .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
      2011-09-14 16:43:51,970 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed to find linux146,60020,1315998414623 in list of online servers; skipping registration of open of .META.,,1.1028785192
      2011-09-14 16:43:51,971 INFO org.apache.hadoop.hbase.master.AssignmentManager: Waiting on 1028785192/.META.
      2011-09-14 16:43:51,983 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=linux76,60020,1315998828523, region=70236052/-ROOT-
      2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 70236052; deleting unassigned node
      2011-09-14 16:43:51,986 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x13267854032001d Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
      2011-09-14 16:43:51,998 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x13267854032001d Successfully deleted unassigned node for region 70236052 in expected state RS_ZK_REGION_OPENED
      2011-09-14 16:43:51,999 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on linux76,60020,1315998828523
      2011-09-14 16:44:00,945 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=linux146,60020,1315998839724, regionCount=0, userLoad=false
      2011-09-14 16:46:20,003 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  .META.,,1.1028785192 state=OPEN, ts=0
      2011-09-14 16:46:20,004 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPEN for too long, we don't know where region was opened so can't do anything
      
              regionsInTransition.put(encodedRegionName, new RegionState(
                  regionInfo, RegionState.State.OPEN, data.getStamp()));
                ................
              } else {
                HServerInfo hsi = this.serverManager.getServerInfo(sn);
                if (hsi == null) {
                  LOG.info("Failed to find " + sn +
                    " in list of online servers; skipping registration of open of " +
                    regionInfo.getRegionNameAsString());
                } else {
                  new OpenedRegionHandler(master, this, regionInfo, hsi).process();
                }
              }
      

      So timeout monitor is not able to do anything here

                LOG.error("Region has been OPEN for too long, " +
                "we don't know where region was opened so can't do anything");
                synchronized(regionState) {
                  regionState.update(regionState.getState());
                }
      

      Attachments

        1. HBASE-4400_trunk.patch
          5 kB
          ramkrishna.s.vasudevan
        2. HBASE-4400_0.90.patch
          5 kB
          ramkrishna.s.vasudevan
        3. HBASE-4400_trunk_1.patch
          7 kB
          ramkrishna.s.vasudevan
        4. HBASE-4400_0.90_1.patch
          7 kB
          ramkrishna.s.vasudevan

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            ram_krish ramkrishna.s.vasudevan
            ram_krish ramkrishna.s.vasudevan
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment