HBase
HBASE-5155

ServerShutdownHandler and disable/delete should not run in parallel, which can lead to recreation of regions that were deleted

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.90.6
    • Component/s: master
    • Labels:
      None
    • Release Note:
      This issue is an incompatible change.
      If an HBase client with the changes for HBASE-5155 is used against a server (master) without the changes for HBASE-5155, then is_enabled (from the HBase shell) or isTableEnabled() (from HBaseAdmin) will return false even though the table is already enabled as far as the master is concerned.

      If the HBase client has the changes for HBASE-5155 and the server does not, then an attempt to enable a table will cause the client to hang.

      The reason is as follows:
      Prior to HBASE-5155, once a table was enabled, the znode created for it in ZooKeeper was deleted.
      After HBASE-5155, once a table is enabled, that znode is not deleted; instead the same node is updated with the ENABLED state.

      The client also expects the znode in ZooKeeper to be in the ENABLED state if the table has been enabled successfully.
      These changes make the client behaviour incompatible when the client does not have this fix but the server does.
      If neither the client nor the server has this fix, the behaviour is as expected.
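      To illustrate the semantic change described above, here is a minimal sketch of the old versus new enabled check against ZooKeeper. The znode path (the table name under the default /hbase/table parent) and the state being stored as the plain enum name are assumptions for this example; the real client and ZKTable code paths are not shown, this only illustrates the znode semantics.

        import org.apache.zookeeper.KeeperException;
        import org.apache.zookeeper.ZooKeeper;

        public class TableEnabledSemanticsSketch {
          // Pre-HBASE-5155: an enabled table simply has no znode under /hbase/table.
          static boolean isEnabledOldSemantics(ZooKeeper zk, String table)
              throws KeeperException, InterruptedException {
            return zk.exists("/hbase/table/" + table, false) == null;
          }

          // Post-HBASE-5155: the znode is kept and must carry the ENABLED state.
          static boolean isEnabledNewSemantics(ZooKeeper zk, String table)
              throws KeeperException, InterruptedException {
            String znode = "/hbase/table/" + table;
            if (zk.exists(znode, false) == null) {
              // This is why a patched client reports an enabled table on an
              // unpatched master (which deleted the znode) as not enabled.
              return false;
            }
            byte[] data = zk.getData(znode, false, null);
            return data != null && "ENABLED".equals(new String(data));
          }
        }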

      Description

      ServerShutdownHandler and the disable/delete table handlers race. This is not an issue due to TM.
      -> A regionserver goes down. In our cluster the regionserver holds a lot of regions.
      -> A region R1 has two daughters D1 and D2.
      -> The ServerShutdownHandler gets called and scans META, collecting all the user regions.
      -> In parallel, a table is disabled. (No problem in this step.)
      -> Delete table is done.
      -> The table and its regions are deleted, including R1, D1 and D2 (so META is cleaned).
      -> Now ServerShutdownHandler starts to process the dead regions:

       if (hri.isOffline() && hri.isSplit()) {
            LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
              "; checking daughter presence");
            fixupDaughters(result, assignmentManager, catalogTracker);
      

      As part of fixupDaughters, since the daughters D1 and D2 are missing for R1:

          if (isDaughterMissing(catalogTracker, daughter)) {
            LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
            MetaEditor.addDaughter(catalogTracker, daughter, null);
      
            // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
            // there then something wonky about the split -- things will keep going
            // but could be missing references to parent region.
      
            // And assign it.
            assignmentManager.assign(daughter, true);
      

      we call assign() on the daughters.
      After this we again reach the code below:

              if (processDeadRegion(e.getKey(), e.getValue(),
                  this.services.getAssignmentManager(),
                  this.server.getCatalogTracker())) {
                this.services.getAssignmentManager().assign(e.getKey(), true);
      

      When the SSH scanned META it still had R1, D1 and D2.
      So as part of the above code, D1 and D2, which were already assigned by fixupDaughters,
      are assigned again by

      this.services.getAssignmentManager().assign(e.getKey(), true);
      

      This leads to a ZooKeeper issue due to a bad version and kills the master.
      The important part here is that the regions that were deleted are recreated, which I think is more critical.
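      For reference, here is a minimal sketch of the guard that is missing in this flow. It mirrors the MetaReader.getRegion() check that was eventually committed (see the patch excerpt near the end of this thread): before fixing up daughters of a region read from the earlier META scan, re-check that the parent region still exists in META.

        if (hri.isOffline() && hri.isSplit()) {
          LOG.debug("Offlined and split region " + hri.getRegionNameAsString()
              + "; checking daughter presence");
          // Re-check META: DeleteTableHandler may have removed the region after the
          // initial scan, in which case there is nothing to fix up or assign.
          if (MetaReader.getRegion(catalogTracker, hri.getRegionName()) == null) {
            return false;
          }
          fixupDaughters(result, assignmentManager, catalogTracker);
        }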

      Attachments

      1. HBASE-5155_latest.patch
        19 kB
        ramkrishna.s.vasudevan
      2. hbase-5155_6.patch
        25 kB
        ramkrishna.s.vasudevan
      3. HBASE-5155_1.patch
        25 kB
        ramkrishna.s.vasudevan
      4. HBASE-5155_2.patch
        25 kB
        ramkrishna.s.vasudevan
      5. HBASE-5155_3.patch
        25 kB
        ramkrishna.s.vasudevan

        Activity

        ramkrishna.s.vasudevan added a comment -

        Can we prevent disable and delete table from happening if ServerShutDownHandler is in progress?

        Ted Yu added a comment -

        Then we need to detect whether the table being deleted/disabled has regions on the underlying server.

        Ted Yu added a comment -

        I think Ram's question @ 09/Jan/12 17:23 hints at introducing synchronization between DeleteTableHandler and ServerShutdownhandler.

        stack added a comment -

        @Ram Nice one. Do you have a snippet of log that shows this?

        So, ServerShutdownHandler should be checking whether the table is disabled before it does either fixup or assign? (That's what the check of (hri.isOffline()...) is supposed to be doing, only the enable/disable semantic changed: now when a table is disabled we set a flag for the table in zk rather than offline each region individually.)

        Or are you saying the table was completely deleted when ServerShutdownHandler started to run? If so, then the create of the region should fail; we should make sure that if the parent table directory is not present, then we should not be able to create region subdirs. We'd need a mkdir that does not do a recursive create (we need newer hadoop/hdfs for this?).

        On the question of synchronization between DeleteTableHandler and ServerShutdownHandler, yes, we need all threads in the master to coordinate around state changes, whether the balancer thread, the ServerShutdownHandler executor thread, incoming splits, etc. I'd like to put up a harness in which we can reproduce all these race conditions... HBASE-3154 helps with this (the test included shows how to mock a balance and a server shutdown handler; we would need to make them interleave or have them reproduce this issue, and the log would help with reproducing the event sequence).

        ramkrishna.s.vasudevan added a comment -
        2012-01-10 11:43:34,303 INFO org.apache.hadoop.hbase.master.ServerManager: Received REGION_SPLIT: j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4.: Daughters; j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7., j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from linux-129,60020,1326175677339
        
        
        
        
        2012-01-10 12:05:19,122 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for linux-129,60020,1326175677339
        2012-01-10 12:06:07,153 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because processing dead regionserver(s): [linux-129,60020,1326175677339]
        2012-01-10 12:09:57,865 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 7 region(s) that linux-129,60020,1326175677339 was carrying (skipping 0 regions(s) that are already in transition)
        
        
        
        
        2012-01-10 12:11:30,988 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table j9t6
        2012-01-10 12:12:21,513 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true
        
        
        
        
        
        2012-01-10 12:13:41,624 INFO org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table operation C_M_DELETE_TABLE on table j9t6
        2012-01-10 12:14:00,811 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4. from META and FS
        2012-01-10 12:14:02,230 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4. from META
        2012-01-10 12:14:07,330 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from META and FS
        2012-01-10 12:14:07,521 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from META
        2012-01-10 12:14:09,860 DEBUG org.apache.hadoop.hbase.master.handler.DeleteTableHandler: Deleting region j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from META and FS
        2012-01-10 12:14:10,096 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from META
        
        
        
        
        
        
        
        
        2012-01-10 12:18:11,081 DEBUG org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Offlined and split region j9t6,,1326109762514.adcbae41a5024c60c72f5752c6e1c8d4.; checking daughter presence
        2012-01-10 12:18:46,450 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing daughter j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7.
        2012-01-10 12:18:46,775 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. in region .META.,,1, serverInfo=null
        2012-01-10 12:18:47,135 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x134c5dbd0a60000 Creating (or updating) unassigned node for 49c3665a4bc656f3f6473659b64798f7 with OFFLINE state
        2012-01-10 12:18:47,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. so generated a random one; hri=j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7., src=, dest=linux146,60020,1326169560093; 1 (online=1, exclude=null) available servers
        2012-01-10 12:18:47,143 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. to linux146,60020,1326169560093
        2012-01-10 12:18:47,155 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handle region called from node nodeDataChanged
        2012-01-10 12:18:47,155 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093, region=49c3665a4bc656f3f6473659b64798f7
        2012-01-10 12:18:47,202 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handle region called from node nodeDataChanged
        2012-01-10 12:18:47,202 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093, region=49c3665a4bc656f3f6473659b64798f7
        2012-01-10 12:18:47,221 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handle region called from node nodeDataChanged
        2012-01-10 12:18:47,221 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1326169560093, region=49c3665a4bc656f3f6473659b64798f7
        2012-01-10 12:18:47,222 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. from serverName=linux146,60020,1326169560093, load=(requests=0, regions=7, usedHeap=30, maxHeap=996); deleting unassigned node
        2012-01-10 12:18:47,222 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x134c5dbd0a60000 Deleting existing unassigned node for 49c3665a4bc656f3f6473659b64798f7 that is in expected state RS_ZK_REGION_OPENED
        2012-01-10 12:18:47,230 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x134c5dbd0a60000 Successfully deleted unassigned node for region 49c3665a4bc656f3f6473659b64798f7 in expected state RS_ZK_REGION_OPENED
        2012-01-10 12:18:47,232 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region j9t6,,1326176002507.49c3665a4bc656f3f6473659b64798f7. that was online on serverName=linux146,60020,1326169560093, load=(requests=0, regions=7, usedHeap=30, maxHeap=996)
        
        2012-01-10 12:19:01,801 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Fixup; missing daughter j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc.
        2012-01-10 12:19:02,261 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. in region .META.,,1, serverInfo=null
        2012-01-10 12:19:02,984 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x134c5dbd0a60000 Creating (or updating) unassigned node for 0b96b5ed4c0426d3b3f13e586179c9bc with OFFLINE state
        2012-01-10 12:19:02,992 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. so generated a random one; hri=j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc., src=, dest=linux146,60020,1326169560093; 1 (online=1, exclude=null) available servers
        2012-01-10 12:19:02,992 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. to linux146,60020,1326169560093
        2012-01-10 12:19:03,062 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handle region called from node nodeDataChanged
        2012-01-10 12:19:03,062 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093, region=0b96b5ed4c0426d3b3f13e586179c9bc
        2012-01-10 12:19:03,107 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handle region called from node nodeDataChanged
        2012-01-10 12:19:03,108 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux146,60020,1326169560093, region=0b96b5ed4c0426d3b3f13e586179c9bc
        2012-01-10 12:19:03,164 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handle region called from node nodeDataChanged
        2012-01-10 12:19:03,164 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=linux146,60020,1326169560093, region=0b96b5ed4c0426d3b3f13e586179c9bc
        2012-01-10 12:19:03,165 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for j9t6,23443]5767435g,1326176002507.0b96b5ed4c0426d3b3f13e586179c9bc. from serverName=linux146,60020,1326169560093, load=(requests=11, regions=8, usedHeap=33, maxHeap=996); deleting unassigned node
        2012-01-10 12:19:03,165 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x134c5dbd0a60000 Deleting existing unassigned node for 0b96b5ed4c0426d3b3f13e586179c9bc that is in expected state RS_ZK_REGION_OPENED
        2012-01-10 12:19:03,169 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x134c5dbd0a60000 Successfully deleted unassigned node for region 0b96b5ed4c0426d3b3f13e586179c9bc in expected state RS_ZK_REGION_OPENED
        
        ramkrishna.s.vasudevan added a comment -

        @Stack
        After analysing the code I found one thing. Maybe avoiding running SSH, DisableTableHandler and DeleteTableHandler in parallel is a bigger discussion,
        but the above problem can be solved.
        In SSH:

          public static boolean processDeadRegion(HRegionInfo hri, Result result,
              AssignmentManager assignmentManager, CatalogTracker catalogTracker)
          throws IOException {
            // If table is not disabled but the region is offlined,
            boolean disabled = assignmentManager.getZKTable().isDisabledTable(
                hri.getTableDesc().getNameAsString());
        

        we check if the table is disabled. But if you look at the above logs, it is the DeleteTableHandler that has already deleted the region and also removed the table's entry from the ZKTable cache.

        am.getZKTable().setEnabledTable(Bytes.toString(tableName));
        

        Currently setEnabledTable means removing the entry from the map, so we have no way to differentiate between an enabled table and a deleted table because in both cases we remove the entry from the cache map.

        So can we use the so-far-unused TableState.ENABLED in the enable table handler, and let only the delete table handler remove the entry?
        This will ensure that in SSH.processDeadRegion() we can first check whether the table is present in the map at all; if it is not present we know the table has already been deleted.
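        Roughly, the check in SSH.processDeadRegion() could then look like the sketch below. isTablePresent() here is a hypothetical helper standing in for "the table still has some state left in the ZKTable map"; the actual patch may differ.

          String tableName = hri.getTableDesc().getNameAsString();
          ZKTable zkTable = assignmentManager.getZKTable();
          if (!zkTable.isTablePresent(tableName)) {
            // No ENABLED/DISABLED/DISABLING entry left: DeleteTableHandler already
            // removed the table, so do not re-add its daughters to META or re-assign them.
            return false;
          }
          boolean disabled = zkTable.isDisabledTable(tableName);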
        Please give your opinion.

        Ted Yu added a comment -

        +1 on utilizing TableState.ENABLED
        Nice finding, Ram.

        chunhui shen added a comment -

        I think using TableState.ENABLED is helpful, and HMaster.TableDescriptors has a similar function.

        But for this issue, should we also consider the situation where the deleted table is created again?
        Maybe SSH could differentiate that situation.

        stack added a comment -

        Ugh, I wrote a comment and lost it.

        So, when you say above that '-> The table and its regions are deleted, including R1, D1 and D2 (so META is cleaned)', you are saying that this happens AFTER SSH has scanned .META. and that we're doing the region processing AFTER the deletes. (I was going to say it's odd that we fix up a daughter when the parent is missing, but checking .META. and the filesystem before each daughter fixup would still leave a hole during which a delete could come in....)

        On keeping a TableState.ENABLED up in zk, that could work. (I can't remember why I didn't do it that way originally; my only thought is that I was trying to save on the state kept up in zk, which is a pretty pathetic reason.) You'll need to add an AM.isEnabledTable method to match the isDisabledTable, etc., stuff that is already there. Good stuff Ram.

        ramkrishna.s.vasudevan added a comment -

        I could not upload the patch today as some test cases are still failing. Will upload it tomorrow.

        ramkrishna.s.vasudevan added a comment -

        @Stack
        I am afraid this may cause compatibility issues, because now we try to create the znode with the ENABLED state. If the master fails over we still go by the presence of the node in zk to build the zkTable cache.
        So if we have a rolling restart scenario this will be a problem, right? Previously the table node would not be present for the enabled state, but now we will create it.

        ramkrishna.s.vasudevan added a comment -

        TestRollingRestart is passing.
        I have tried to handle the different scenarios; the TestMasterFailover related scenarios are also handled.

        Ted Yu added a comment -

        In AssignmentManager.java, setEnabledTable():

        +      LOG.error("Unable to ensure that the table will be"
        +          + " enabled because of a ZooKeeper issue");
        

        Please include tableName in the log.

        In bulkAssignUserRegions():

        +    List<HRegionInfo> regionsList = java.util.Arrays.asList(regions);
        +    for (HRegionInfo regionInfo : regionsList) {
        

        Can we directly iterate over regions array ?

        In ZKTable.java:

        -      if (!isEnabledOrDisablingTable(tableName)) {
        +      if (isEnabledOrDisablingTable(tableName)) {
                 LOG.warn("Moving table " + tableName + " state to disabling but was " +
                   "not first in enabled state: " + this.cache.get(tableName));
        

        Why was the above change necessary? Now the warning doesn't match the check.

        I see a long line:

              TEST_UTIL.createTable(TABLENAME, FAMILYNAME);
        +     assertTrue(m.assignmentManager.getZKTable().isEnabledTable(Bytes.toString(TABLENAME)));
        

        Overall, this patch looks very good.
        Thanks for plugging a hole w.r.t. cache in ZkTable.

        ramkrishna.s.vasudevan added a comment -

        @Ted
        Thanks for your review. I will address all the comments. I will do some more testing tomorrow and then submit an updated patch.
        Currently I can verify on 0.90. I will do the trunk port sometime later next week.

        Ted Yu added a comment -

        Test suite passed based on Ram's patch.

        ramkrishna.s.vasudevan added a comment -

        The latest patch addresses the rolling restart scenarios also. One thing: as HBASE-4083 is not checked into 0.90, the scenario pertaining to that defect will not be supported.
        Tested the following:
        -> Master failover with and without the patch
        -> RS failover
        -> RS with partial disable state

        One other thing: this patch has to be applied on the master for it to take effect, because enabling and disabling of tables is driven by the master.

        ramkrishna.s.vasudevan added a comment -

        Please provide your comments.

        Ted Yu added a comment -

        +1.

        Minor comments:

        +        if (true == checkIfRegionBelongsToDisabled(regionInfo)) {
        +          disabled = true;
        +        }
        

        Can the above be written as:

          disabled = checkIfRegionBelongsToDisabled(regionInfo);
        
        +        // need to enable the table if not disable or disabling
        

        Should read 'not disabled ...'

        Ted Yu added a comment -

        I tried to port the patch to trunk.
        It turns out that AssignmentManager.java is quite different between 0.90 and trunk.
        e.g. the following code in rebuildUserRegions() of 0.90:

        Set<String> disablingTables = new HashSet<String>(1);
        

        But in trunk, disablingTables is a field in AssignmentManager

        ramkrishna.s.vasudevan added a comment -

        Yes Ted. Some of that part was refactored as part of another defect fix by Ming.

        Also, since HBASE-4083 is in trunk, we have disablingTables and also enablingTables there. So for trunk we may have to apply the changes taking the code there into account.

        ramkrishna.s.vasudevan added a comment -

        Updated patch addressing Ted's comments.

        stack added a comment -

        So if we have a rolling restart scenario this will be a problem, right? Previously the table node would not be present for the enabled state, but now we will create it.

        Have you tried it? In rolling restart we'll upgrade the master first usually. Won't it know how to deal w/ new zk node for ENABLED state?

        FYI, don't do these kinda changes in future:

        -      for (HRegionInfo region: regions) {
        +      for (HRegionInfo region : regions) {
        

        What was there previously was fine... It adds bulk to your patch.

        This looks like a method used internally by AM only. Does it need to be public?

        +  public void setEnabledTable(String tableName) {
        

        In processDeadRegion, should we check parent exists before doing daughter fixups? (It could have been deleted?)

        I don't understand this comment:

        +    // Enable the ROOT table if on process fail over the RS containing ROOT
        +    // was active.
        

        Same for the one on .meta.

        Why do we have to enable the meta and root tables? Aren't they always on?

        Is this right:

        +   * Check if the table is in DISABLED state in cache
        

        Is it just checking the cache? This class gets updated when the zk changes, right? So it's not just a 'cache'? I think you should drop 'from cache' in your public javadoc.

        Same for isDisabling, etc

        Is this right below:

           public boolean isEnabledTable(String tableName) {
        -    synchronized (this.cache) {
        -      // No entry in cache means enabled table.
        -      return !this.cache.containsKey(tableName);
        -    }
        +    return isTableState(tableName, TableState.ENABLED);
        

        Else the patch looks good to me. I was afraid it was too much for 0.90.6 but it's looking ok.

        Ted Yu added a comment -

        Minor comment:

        +    boolean istableEnabled = this.zkTable.isEnabledTable(tableName);
        

        istableEnabled should be named isTableEnabled.

        @Stack:
        w.r.t the following comment:

        +    // Enable the ROOT table if on process fail over the RS containing ROOT
        +    // was active.
        

        AssignmentManager delegates to this.zkTable.setEnabledTable(). This is to set the meta tables enabled in ZkTable cache.

        ramkrishna.s.vasudevan added a comment -

        This looks like a method used internally by AM only. Does it need to be public?

        +  public void setEnabledTable(String tableName) {
        

        I did not have this as public in the beginning. But later, in HMaster.rebuildUserRegions(), I had to set the enabled table state, so I thought of exposing this from the AM so that I can use it there instead of repeating the same code in HMaster.

        Have you tried it? In rolling restart we'll upgrade the master first usually. Won't it know how to deal w/ new zk node for ENABLED state?

        If the master is restarted first, the above changes will still be necessary, because when the master builds the table state it will not find the ENABLED state in zk. So the above changes in the master will help it build that state. Yes, rolling restart was tested.

        FYI, don't do these kinda changes in future:

        It happened when applying a formatter. Sure Stack, I will take care of such changes.

        public boolean isEnabledTable(String tableName) {
        -    synchronized (this.cache) {
        -      // No entry in cache means enabled table.
        -      return !this.cache.containsKey(tableName);
        -    }
        +    return isTableState(tableName, TableState.ENABLED);
        

        isTableState will anyway synchronize on this.cache, so it should be OK?
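        For illustration, a minimal sketch of what such an isTableState() helper could look like, assuming the cache maps table name to TableState (a sketch only, not necessarily the exact patch code):

          private boolean isTableState(final String tableName, final TableState state) {
            synchronized (this.cache) {
              TableState currentState = this.cache.get(tableName);
              return currentState != null && currentState == state;
            }
          }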

        +   * Check if the table is in DISABLED state in cache
        

        My idea of adding 'in cache' was like the state is checked only in Memory and it is not going to zk to check the state . So i thought like the 'in cache' word will tell the user like to ZK is used in checking it.

        +    // Enable the ROOT table if on process fail over the RS containing ROOT
        +    // was active.
        

        This scenario arises when the master is restarted but the RS is still alive. The master should then enable ROOT and META also, because when it comes up it should create the enabled node in zk.
        If we don't do this step then for ROOT and META we will not have a node in zk in the above scenario.
        But if the master explicitly assigns ROOT and META then there will be a zk node. So to unify this I had to call zkTable.setEnabledTable().
        @Stack
        Is this fine, Stack? I can re-prepare a patch based on your feedback and then upload a final one.
        @Ted
        Do you have any more comments or feedback that I can incorporate in the next patch?

        Ted Yu added a comment -

        will tell the user like to ZK is used in checking it.

        I think you wanted to say 'will tell the user like no ZK is used in checking it.'

        I don't have other comments, except the one @ 13/Jan/12 21:34

        If you have time, please port this to 0.92
        Otherwise we can open another JIRA.

        Good job, Ramkrishna.

        ramkrishna.s.vasudevan added a comment -

        Addressed the comments and also avoided an unnecessary ZooKeeper check when enabling a table during the master failover processing flow.

        Ted Yu added a comment -

        HBASE-5155_2.patch looks good to me.

        ramkrishna.s.vasudevan added a comment -

        I am planning to commit this today.

        ramkrishna.s.vasudevan added a comment -
        
             if (hri.isOffline() && hri.isSplit()) {
        -      LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
        -        "; checking daughter presence");
        +      LOG.debug("Offlined and split region " + hri.getRegionNameAsString()
        +          + "; checking daughter presence");
        +      if (MetaReader.getRegion(catalogTracker, hri.getRegionName()) == null) {
        +        return false;
        +      }
        
        

        Just added the above code as Stack commented.

        ramkrishna.s.vasudevan added a comment -

        Committed to 0.90.
        Thanks to Stack, Ted and Chunhui for the review.

        ramkrishna.s.vasudevan added a comment -

        Committed to branch 0.90.

        Jean-Daniel Cryans added a comment -

        Please don't forget to set the assignee.

        Hudson added a comment -

        Integrated in HBase-TRUNK #2680 (See https://builds.apache.org/job/HBase-TRUNK/2680/)
        HBASE-5206 port HBASE-5155 to TRUNK (Ashutosh Jindal) (Revision 1300711)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #138 (See https://builds.apache.org/job/HBase-TRUNK-security/138/)
        HBASE-5206 port HBASE-5155 to TRUNK (Ashutosh Jindal) (Revision 1300711)

        Result = SUCCESS
        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2685 (See https://builds.apache.org/job/HBase-TRUNK/2685/)
        HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)

        Result = SUCCESS
        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java
        Hudson added a comment -

        Integrated in HBase-0.94 #36 (See https://builds.apache.org/job/HBase-0.94/36/)
        HBASE-5206 port HBASE-5155 to 0.94 (Ashutosh Jindal) (Revision 1301737)

        Result = SUCCESS
        tedyu :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java
        Hudson added a comment -

        Integrated in HBase-TRUNK-security #140 (See https://builds.apache.org/job/HBase-TRUNK-security/140/)
        HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)

        Result = SUCCESS
        tedyu :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java
        ramkrishna.s.vasudevan added a comment -

        I have added a release note to this issue. Please review.
        Sorry about the problem this introduced.

        ramkrishna.s.vasudevan added a comment -

        Will close this once the release note is reviewed.

        David S. Wang added a comment -

        Ram,

        > If the HBase client does not have the changes for HBASE-5155 and the server has the changes for HBASE-5155, then if we try to Enable a table then the client will hang.

        Actually, I noticed that the hang happens in the opposite case: when the client has the changes for HBASE-5155, and the server does not.

        Otherwise the release note looks OK to me.

        ramkrishna.s.vasudevan added a comment -

        @David
        Updated the release notes. Thanks for your review.

        stack added a comment -

        Should we revert and roll a 0.90.7?


          People

          • Assignee: ramkrishna.s.vasudevan
          • Reporter: ramkrishna.s.vasudevan
          • Votes: 0
          • Watchers: 0
