HBase: HBASE-6294

Detect leftover data in ZK after a user deletes all of their HBase data

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.95.1
    • Component/s: None
    • Labels:
      None

      Description

      It seems we have a new failure mode when a user deletes the HBase root directory (hbase.rootdir) but not the ZK data. For example, a user on IRC came with this log:

      2012-06-30 09:07:48,017 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02.
      2012-06-30 09:07:48,017 WARN org.apache.hadoop.hbase.util.FSTableDescriptors: The following folder is in HBase's root directory and doesn't contain a table descriptor, do consider deleting it: kw
      2012-06-30 09:07:48,018 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:34193-0x1383bfe01b70001 Attempting to transition node 2e8a318837602c9c9961e9d690b7fd02 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
      2012-06-30 09:07:48,018 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=localhost,50890,1341036299694, region=2e8a318837602c9c9961e9d690b7fd02
      2012-06-30 09:07:48,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_FAILED_OPEN, server=localhost,34193,1341036300138, region=b254af24c9127b8bb22cb6d24e523dad
      2012-06-30 09:07:48,020 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for b254af24c9127b8bb22cb6d24e523dad
      2012-06-30 09:07:48,020 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=kw_r,,1340981822374.b254af24c9127b8bb22cb6d24e523dad. state=CLOSED, ts=1341036467998, server=localhost,34193,1341036300138
      2012-06-30 09:07:48,020 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:50890-0x1383bfe01b70000 Creating (or updating) unassigned node for b254af24c9127b8bb22cb6d24e523dad with OFFLINE state
      2012-06-30 09:07:48,028 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:34193-0x1383bfe01b70001 Successfully transitioned node 2e8a318837602c9c9961e9d690b7fd02 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
      2012-06-30 09:07:48,028 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: {NAME => 'kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02.', STARTKEY => '', ENDKEY => '', ENCODED => 2e8a318837602c9c9961e9d690b7fd02,}
      2012-06-30 09:07:48,029 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02., starting to roll back the global memstore size.
      java.lang.IllegalStateException: Could not instantiate a region instance.
      	at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3490)
      	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3628)
      	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:332)
      	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
      	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      	at java.lang.Thread.run(Thread.java:679)
      Caused by: java.lang.reflect.InvocationTargetException
      	at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown Source)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
      	at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3487)
      	... 7 more
      Caused by: java.lang.NullPointerException
      	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:133)
      	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.<init>(RegionCoprocessorHost.java:125)
      	at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:411)
      	... 11 more
      2012-06-30 09:07:48,031 INFO org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opening of region {NAME => 'kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02.', STARTKEY => '', ENDKEY => '', ENCODED => 2e8a318837602c9c9961e9d690b7fd02,} failed, marking as FAILED_OPEN in ZK
      2012-06-30 09:07:48,032 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:34193-0x1383bfe01b70001 Attempting to transition node 2e8a318837602c9c9961e9d690b7fd02 from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN
      2012-06-30 09:07:48,031 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=localhost,34193,1341036300138, region=2e8a318837602c9c9961e9d690b7fd02
      2012-06-30 09:07:48,043 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=localhost,50890,1341036299694, region=b254af24c9127b8bb22cb6d24e523dad
      

      The exception itself is not very useful, and neither is the NPE deep in the coprocessor stack. What was really useful was this line:

      2012-06-30 09:07:48,017 WARN org.apache.hadoop.hbase.util.FSTableDescriptors: The following folder is in HBase's root directory and doesn't contain a table descriptor, do consider deleting it: kw
      

      So HBase tries to assign a region of a table that no longer exists, and it fails in an obscure way. I told the user to shut down HBase, delete /tmp/hbase-user (since it contains both the HBase data and the ZK data), and restart. It worked.

      This situation is new in 0.94; we need to detect it so that users have a better experience getting started with HBase.
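      One possible shape for such a check, sketched in plain Java under stated assumptions: before assigning regions it found in ZooKeeper, the master could compare each region's table name against the tables that still have a descriptor under hbase.rootdir, and fail fast with a pointed message instead of an NPE in the coprocessor host. The class and method names (LeftoverZkDataCheck, tableOf, orphanedTables) are hypothetical and are not HBase APIs; the sketch also assumes the full region names and the set of tables with descriptors have already been collected from ZooKeeper and the filesystem.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch, not an HBase API: detect regions known to ZooKeeper whose
 * table has no descriptor in hbase.rootdir (e.g. the root dir was wiped but ZK was not).
 * Collecting the inputs from ZK and the filesystem is assumed to happen elsewhere.
 */
public final class LeftoverZkDataCheck {

    private LeftoverZkDataCheck() {}

    /**
     * Extracts the table name from a full region name such as
     * "kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02." (table,startkey,id.encoded.).
     * Assumes table names never contain a comma, so the first comma ends the table name.
     */
    static String tableOf(String regionName) {
        int comma = regionName.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Not a region name: " + regionName);
        }
        return regionName.substring(0, comma);
    }

    /** Returns, sorted, the tables referenced by ZK regions that have no descriptor on disk. */
    static Set<String> orphanedTables(Iterable<String> zkRegionNames, Set<String> tablesWithDescriptor) {
        Set<String> orphans = new TreeSet<>();
        for (String region : zkRegionNames) {
            String table = tableOf(region);
            if (!tablesWithDescriptor.contains(table)) {
                orphans.add(table);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Region names taken from the log in this issue.
        List<String> zkRegions = List.of(
            "kw,,1340981821308.2e8a318837602c9c9961e9d690b7fd02.",
            "kw_r,,1340981822374.b254af24c9127b8bb22cb6d24e523dad.");
        // The root dir was deleted, so no table descriptors remain on disk.
        Set<String> onDisk = Set.of();
        Set<String> orphans = orphanedTables(zkRegions, onDisk);
        if (!orphans.isEmpty()) {
            System.err.println("ZooKeeper references tables with no descriptor in hbase.rootdir: "
                + orphans + ". Was the root dir deleted without also clearing the ZooKeeper data?");
        }
    }
}
```

      With the two regions from the log and an empty root directory, main reports [kw, kw_r]. Failing at this point would give the user the same hint the FSTableDescriptors warning gave, but as the primary error rather than a line buried before a stack trace.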


          People

          • Assignee: Unassigned
          • Reporter: Jean-Daniel Cryans
          • Votes: 0
          • Watchers: 12

