HBase / HBASE-826

delete table followed by recreation results in honked table

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.1, 0.18.0
    • Component/s: None
    • Labels: None

      Description

      Daniel Leffel suspected that delete and then recreate causes issues. I tried it on our little cluster. I'm doing a MR load up into the newly created table and after a few million rows, the MR job just hangs. It's looking for a region that doesn't exist:

      2008-08-13 03:32:36,840 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
      2008-08-13 03:32:36,940 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
      2008-08-13 03:32:37,420 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
      2008-08-13 03:32:37,541 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading table servers because: HRegionInfo was null or empty in .META.
      2008-08-13 03:32:37,541 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed .META.,,1 from cache because of TestTable,0008388608,99999999999999
      2008-08-13 03:32:37,544 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
      2008-08-13 03:32:47,605 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading table servers because: HRegionInfo was null or empty in .META.
      2008-08-13 03:32:47,606 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed .META.,,1 from cache because of TestTable,0008388608,99999999999999
      
      ....
      

      My guess is that it's a region from the table's previous incarnation, with ghosts left over down inside .META.

      1. 826-v3.patch
        18 kB
        stack
      2. hbase-826_0.3.0.patch
        12 kB
        Jean-Daniel Cryans

        Activity

        Jean-Daniel Cryans added a comment -

        We're seeing this again. Tables that have been used for a long time are now giving us this kind of problem under a MR job.

        stack added a comment -

        Commit of 808/809 did not fix this.

        Running a test, what I'm seeing is that the root cause is the reported empty HRegionInfo.

        stack added a comment -

        I'm seeing that getClosestAtOrBefore is finding rows from previous incarnations of a table. This triggers the stuck client, which over and over emits 'reloading .. because HRI was null or empty in ".META."'.

        On table delete, not all cells on a row in .META. are removed. If a new table is created with the same name and schema as a just-deleted table, the first region with a null start and end row works fine for a while. We split and then we may get lucky for a while, but inevitably getClosestAtOrBefore will be queried with a row whose just-before is from a previous incarnation of the table, only it's a cell from the historian column family or some old startcode that wasn't properly removed; so getClosestAtOrBefore returns a closest row where there is no HRI. The client gets stuck because it keeps asking the server for the closest row for row 'X' and keeps getting back the same unsatisfactory answer.

        We've had this issue since before 0.2.0 RC1. It's what Daniel and Sebastien were seeing just before RC2, but at first we tried to attribute the issue to a hdfs storm and to the running of an instance without hbase-751 fixed. We've not hit it in tests because our unit tests are regular and split in the same place every time (so no chance for ghost regions to shine through).

        I'm still trying to figure out how we don't successfully deleteall. It seems the table has to be big for the problem to start occurring.

        After figuring out the issue, we then need to make a tool that will fix up people's .META. tables, removing any vestige of deleted tables.

        'Workaround' is not to reuse a table name until this is fixed.
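        To make the failure mode concrete, here is a minimal sketch (hypothetical Ruby, not HBase internals; the .META. layout, row keys, and cell names are simplified assumptions) of why a ghost row with no HRegionInfo cell wedges the client:

```ruby
# Hypothetical sketch: .META. modeled as a map of region-name keys to cell
# hashes. A ghost row from a deleted table has stray cells but no
# 'info:regioninfo', so the closest-at-or-before lookup never satisfies
# the client; it reloads and retries the same question forever.
def locate_region(meta, row)
  attempts = 0
  loop do
    key = meta.keys.select { |k| k <= row }.max   # getClosestAtOrBefore
    cells = key ? meta[key] : {}
    return cells["info:regioninfo"] if cells["info:regioninfo"]
    attempts += 1   # client logs 'reloading table servers because: HRegionInfo was null or empty in .META.'
    raise "HRegionInfo was null or empty in .META." if attempts > 3
  end
end

meta = {
  # ghost from the table's previous incarnation: a historian cell survived the delete
  "TestTable,0008388608,1218300000000" => { "historian:split" => "..." },
}
begin
  locate_region(meta, "TestTable,0009000000,99999999999999")
rescue RuntimeError => e
  puts e.message   # the real client never gives up; it just loops
end
```

        The sketch caps the retries so it terminates; the actual client keeps clearing its cache and re-asking, which is the hang observed in the MR job.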

        stack added a comment -

        After study, it looks like the deletes are in place; it's just that when we do the getClosestAtOrBefore, when pulling from StoreFiles AND we have candidate keys gotten from memcache, there is a code path we take that doesn't record deleted cells.

        The handleDeleted Memcache function needs to be used by HStore.rowAtOrBeforeWithCandidates. See the duplicated code that is in HStore and in handleDeleted? The bit that is missing from HStore is the deletes.add(k). I suggest moving handleDeleted into HStore, making it static, and then having both Memcache and HStore.rowAtOrBeforeWithCandidates use it.

        Doing more testing to verify this indeed fixes things.
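        As a rough illustration (hypothetical Ruby, not the Java code in question) of why skipping the deletes.add(k) step matters: a candidate pulled from a store file must be checked against recorded delete cells, or a deleted 'ghost' row can win the at-or-before search:

```ruby
require 'set'

# Cells are [row, deleted?] pairs; a real HStoreKey also carries column
# and timestamp, elided here for brevity.
def row_at_or_before(cells, target)
  # Record every delete cell first (the deletes.add(k) the bug skipped)...
  deletes = cells.select { |_, del| del }.map(&:first).to_set
  # ...then only rows with no matching delete may become candidates.
  live = cells.reject { |row, del| del || deletes.include?(row) }
              .map(&:first)
              .select { |row| row <= target }
  live.max
end

cells = [
  ["TestTable,0000000000,1", false],
  ["TestTable,0008388608,1", false],  # row from the previous incarnation...
  ["TestTable,0008388608,1", true],   # ...with its delete cell present
]
row_at_or_before(cells, "TestTable,0009000000,9")  # => "TestTable,0000000000,1"
# Without the deletes set, the ghost "TestTable,0008388608,1" would win.
```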

        stack added a comment -

        More testing would seem to indicate the above-described fix works, at least for the small test case: upload 2M rows into a table, delete it, then try the same upload. It was failing reliably before this fix.

        Trying with a bigger table.

        Debugging this issue, I made getStores public in HRegion, and in HStore I made getReader public as well. Doing this, I was able to write a little script that enumerates all keys in the .META. table in a few lines. I think we should make these methods public to facilitate easier debugging in the future.

        Script looked like this:

        # Script that outputs all keys in .META. store files.
        # HBase MUST be shutdown when you run this script.
        #
        # To run it, do: ${HBASE_HOME}/bin/hbase org.jruby.Main iteratemeta.rb
        include Java
        import org.apache.hadoop.hbase.util.MetaUtils
        import org.apache.hadoop.hbase.HRegionInfo
        import org.apache.hadoop.hbase.HConstants
        import org.apache.hadoop.hbase.HBaseConfiguration
        import org.apache.hadoop.hbase.HStoreKey
        import org.apache.hadoop.hbase.io.ImmutableBytesWritable
        import org.apache.hadoop.hbase.regionserver.HLogEdit
        
        # Get configuration to use.
        c = HBaseConfiguration.new()
        # Set hadoop filesystem configuration using the hbase.rootdir.
        # Otherwise, we'll always use localhost though the hbase.rootdir
        # might be pointing at hdfs location.
        c.set("fs.default.name", c.get(HConstants::HBASE_DIR));
        # The original paste omitted creating the MetaUtils instance used
        # below; presumably something like:
        utils = MetaUtils.new(c)
        meta = utils.getMetaRegion(HRegionInfo::FIRST_META_REGIONINFO)
        begin
          for columnDescriptor in meta.getRegionInfo().getTableDesc().getFamilies()
            store = meta.getStore(columnDescriptor.getName())  
            puts "Store: %s" % [store.toString()]
            for reader in store.getReaders()
              puts "Store: %s, reader %s" % [store.toString(), reader.toString()]
              key = HStoreKey.new()
              value = ImmutableBytesWritable.new()
              while (reader.next(key, value))
                puts "%s %s" % [key, (HLogEdit.isDeleted(value.get()) ? "d": "")]
              end 
            end
          end
        ensure
          utils.shutdown()
        end
        
        stack added a comment -

        One other note: if the deletes are actually all in place, then there will be no need of a repair script to fix up horked .META. tables. Let's see. Have more tests to run in the morning.

        Jean-Daniel Cryans added a comment -

        I will try the described fix on Sebastien's setup this morning since I can reliably recreate this bug for a really big table.

        Jean-Daniel Cryans added a comment -

        Fixes the holes described by st^ack. This patch only applies to 0.3.0 for the moment; the row locks have to be changed for 0.2.

        Jean-Daniel Cryans added a comment -

        The wrong patch was uploaded.

        stack added a comment -

        Found another hole – need to make a copy of the HSK before putting it into the deletes list – but there is still another hole; a delete is not entered for a single server and startcode. Digging.

        stack added a comment -

        At first it was easy to provoke the issue. Two regions were sufficient. Now with the J-D patch in place, things are better for sure. On small tables the problem is gone.

        I'm testing with bigger tables to be sure.

        I don't think the fix is all in though, or there is another issue lurking. Last night at least, I found two lone edits from an old instance of a table shining through when doing getClosest against the new table (the manifestation in the MR job is the log snippet posted at the start of this issue).

        I'm testing by running sizeable imports – 4 to 32M rows – on our little cluster of 4 machines over and over again. I upload, then delete the table. I then stop hbase so memcache is flushed out to the FileSystem (the issue seems to be related to traversal of many store files in the FileSystem). It's on the subsequent import that I'd see issues.

        I'm having trouble reliably replicating the errant edits this morning. Testing works most of the time.

        Repair of a table with old edits is tough too. I can't scan using an old timestamp to find the errant edits. I'm not sure why; my guess, given how getFull works, is that the presence of newer deletes can overshadow older cells. If we knew the HStoreKey of the errant edits, a repair would just be a case of adding deletes against their coordinates, but discovering the errant edit coordinates is tough; up to now I've only been able to do it by adding super-intrusive logging into the getClosestAtOrBefore calculation or by sort/joins on the output of the iterate script above.

        stack added a comment -

        Ok. I don't have enough confidence in the patch to commit it at this time, though it improves things. My most recent test failed in an odd way; meta shows a region deployed on a particular server, but if I visit it, it's not there. It could be the above patch, though I doubt it.

        I'm gone for a week starting tomorrow. Will address this issue again then, unless anyone digs in in the meantime. (One change to the J-D patch I'd make is doing a 'new HStoreKey(key)' when adding to the deletes list.)

        At a minimum we need to add sufficient API to hbase such that we can build repair tools to fix this condition should it arise in the field; otherwise the table becomes inoperable with no means of figuring out which are the errant keys to delete.

        stack added a comment -

        Here is J-D's patch with the addition of the above-referenced 'new HStoreKey(deletekey)'.

        stack added a comment -

        It's looking like tables are not being fully deleted – which would seem to be an issue distinct from the items fixed by the J-D patch above.

        After a delete, I shut hbase down so all of the edits are out in store files. I then run the above iteratemeta.rb script to get the content of .META., sort the result, and then with another hacked script print out cells that do not have a corresponding delete record. It's always the startcode and server cells that are missing their deletes. Trying to figure out why these are not getting delete cell overrides.

        stack added a comment -

        Ok, so I figured out the bug I was seeing AFTER application of the above J-D patch. I thought the bug was more of what J-D's patch was supposed to be fixing, but it's something else, something equally ugly.

        Here is how I consistently generated the problem with J-D's patch in place:

        Fill a table.
        Stop hbase so files are flushed.
        Start hbase.
        Remove table (disable/drop).
        Stop hbase so again flushed to filesystem.
        Then look at the content of the .META. using above iteratemeta script.
        Sort -u the output, then run the below 'check' script to match delete and non-delete cells.
        

        Here's the script:

        #!/usr/bin/env ruby
        # Take on STDIN, sorted and uniqued output of the iteratemeta.rb
        lastline = nil
        for line in STDIN
          if line =~ /(.*)\s+d$/
            if lastline != nil
              puts lastline unless lastline.eql?($1)
            end
            lastline = nil
          else
            puts lastline unless lastline == nil
            lastline = line.rstrip
          end
        end
        

        I was finding a few keys that should have had overshadowing deletes, but the deletes were not present. (If I attempted refilling the table, eventually we'd fail with the 'empty HRI' complaint.)

        I thought the failure was because the above J-D patch was incomplete.

        Turns out it's a problem in compactions. We see it since the compaction algorithm changed.

        Here is what is happening.

        Max versions by default in the '.META.' table is 1. The version check looks at the row and column component of a HStoreKey only, i.e. not at the timestamp. info:serverstartcode and info:server are edited every time we start up and when we offline (disable) and delete. The delete cell has the same timestamp as the cell it would delete; i.e. it'll usually be older than the offlining update.

        So, with our new smart compaction, we do not process every file when we compact. What I was seeing after the restarts was that the two most recent files would have been compacted on the final restart. We'd discard the delete cell that was not the offlining of info:server and info:serverstartcode. The original cell would still be sitting in the biggest and oldest store file. It would on occasion get a chance to come through when doing getClosestAtOrBefore.

        So, I think the fix is that we cannot remove anything when compacting, not until we do a major compaction when all files are in play. Will talk to Billy over in hbase-834 about it. Meantime, will apply the J-D patch and close this issue.
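        A toy model (hypothetical Ruby; the file contents and key names are made up) of the compaction hole described above: a minor compaction over only the newest files discards a delete marker whose matching put lives in an older, uncompacted file, so the put later resurfaces:

```ruby
# Store files as lists of [key, timestamp, kind] cells.
oldest = [["TestTable,r1,/info:server", 100, :put]]     # original cell
newer  = [["TestTable,r1,/info:server", 100, :delete]]  # delete, same timestamp
newest = [["TestTable,r2,/info:server", 200, :put]]

# Compact a subset of files, dropping delete markers and any cell they
# shadow *within that subset* (roughly what the changed algorithm did):
def compact(files)
  cells = files.flatten(1)
  deleted = cells.select { |_, _, kind| kind == :delete }
                 .map { |key, ts, _| [key, ts] }
  cells.reject { |key, ts, kind| kind == :delete || deleted.include?([key, ts]) }
end

# Minor compaction over only the two newest files: the r1 delete cancels
# nothing inside the subset and is itself discarded...
minor = compact([newer, newest])
# ...so reading minor plus the untouched oldest file together, the
# shadowed r1 put is visible again:
ghost = (minor + oldest).any? { |key, _, kind| key.start_with?("TestTable,r1") && kind == :put }
puts ghost  # => true
```

        Only compacting with all files in play (a major compaction) lets the delete marker and its target be cancelled together, which matches the proposed fix.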

        stack added a comment -

        Committed branch and trunk. Off to talk to Billy about fixing compactions.


          People

          • Assignee:
            stack
            Reporter:
            stack
          • Votes:
            0
            Watchers:
            0
