HBASE-2819

hbck should have the ability to repair basic problems

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: scripts
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Right now, the hbck utility can detect issues with region deployment but can't fix them.

      It should be able to handle basic things like closing one side of a double assignment, re-adding something to META, etc.

      1. HBASE-2819.patch
        52 kB
        Alexander Georgiev
      2. 2819-v10.txt
        47 kB
        stack
      3. 2819-v11.txt
        50 kB
        stack
      4. 2819-v12.txt
        54 kB
        stack
      5. 2819-addendum.txt
        6 kB
        stack
      6. 2819-v13.txt
        59 kB
        stack
      7. 2819-v14.txt
        67 kB
        stack

          Activity

          Nicolas Spiegelberg added a comment -

          Alexander is working on this problem for us. We're planning on a repair tool for: region double-assignment + unassigned regions
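
          As a rough illustration of the two cases named here, the sketch below compares the regions listed in META with what each region server reports as online. It is not taken from the patch: the function names, the plain-string region names, and the "keep the first server" rule are all assumptions made for the example.

          ```python
          from collections import defaultdict


          def classify_deployments(meta_regions, online_by_server):
              """Split META's regions into unassigned and double-assigned ones.

              meta_regions: region names listed in META (hypothetical plain strings).
              online_by_server: {server name: iterable of region names it reports online}.
              """
              servers_by_region = defaultdict(list)
              for server in sorted(online_by_server):
                  for region in online_by_server[server]:
                      servers_by_region[region].append(server)

              # A region no server reports is unassigned; more than one server is a double assignment.
              unassigned = [r for r in meta_regions if r not in servers_by_region]
              double_assigned = {
                  r: servers_by_region[r]
                  for r in meta_regions
                  if len(servers_by_region.get(r, [])) > 1
              }
              return unassigned, double_assigned


          def plan_repairs(unassigned, double_assigned):
              """Turn the findings into (action, region, server) steps; nothing is executed."""
              steps = [("assign", region, None) for region in unassigned]
              for region, servers in double_assigned.items():
                  # Arbitrary choice for this sketch: keep the first server, close the rest.
                  steps += [("close", region, server) for server in servers[1:]]
              return steps


          meta = ["t1,,1", "t1,m,2", "t1,t,3"]
          online = {"rs1": ["t1,,1", "t1,m,2"], "rs2": ["t1,m,2"]}
          unassigned, doubles = classify_deployments(meta, online)
          print(plan_repairs(unassigned, doubles))
          ```

          Keeping detection separate from the repair plan means the same check could also run read-only.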

          Alexander Georgiev added a comment -

          This patch is for the 0.89 branch, adding functionality to HBase FSCK (HBCK) to fix some of the common issues that may arise while running HBase.
          New features are:

          • Added -fix option, which tries to repair duplicate assignments or unassigned regions in META.
          • Check if META and ROOT are good before starting the usual check process.
          • Check table consistency - if there are missing or overlapping regions
          • Added option for quiet run, which doesn't print most of the information
          • Added printing of summary of the check - number of tables, per-table number of regions, which tables are consistent and which are not and so on.

          It uses an RPC call which is no longer available after the master rewrite, so it might take some time to integrate. We are currently working on that.
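
          The table consistency check (missing or overlapping regions) can be pictured as a walk over a table's regions in start-key order. The following is a minimal sketch under stated assumptions, not the patch's code: regions are hypothetical (start_key, end_key) string pairs, and an empty string stands for the open start or end of the table.

          ```python
          def find_holes_and_overlaps(regions):
              """Report gaps and overlaps in a table's region chain.

              regions: list of (start_key, end_key) pairs; "" means open-ended.
              Returns a list of ("hole" | "overlap", from_key, to_key) tuples.
              """
              problems = []
              covered_to = None  # furthest end key seen so far; None before the first region
              for start, end in sorted(regions):
                  if covered_to is None:
                      # The first region must begin at the empty start key.
                      if start != "":
                          problems.append(("hole", "", start))
                      covered_to = end
                      continue
                  if covered_to == "" or start < covered_to:
                      problems.append(("overlap", start, covered_to))
                  elif start > covered_to:
                      problems.append(("hole", covered_to, start))
                  # Extend the covered range; "" (end of table) is treated as the largest key.
                  if covered_to != "" and (end == "" or end > covered_to):
                      covered_to = end
              if covered_to is not None and covered_to != "":
                  # The last region must run to the end of the table.
                  problems.append(("hole", covered_to, ""))
              return problems


          print(find_holes_and_overlaps([("", "g"), ("m", "")]))
          print(find_holes_and_overlaps([("", "m"), ("g", "t"), ("t", "")]))
          ```

          A table is consistent in this sense when the function returns an empty list.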

          stack added a comment -

          @Alexander I took a look at the patch. It looks great. I wouldn't sweat the integration with the new master. I can do it. It shouldn't take long (some of the new RPCs don't make sense in the new master context, while others added by this patch are already present in the new master).

          stack added a comment -

          Marking patch available... needs to be updated for TRUNK.

          Jonathan Gray added a comment -

          I spent some time putting this onto trunk. The repair stuff that we added to work on the old master doesn't make much sense anymore on the new master. For example, it should be impossible for a region to just go unassigned with no trace of how it happened. If it doesn't open properly there is a ZK node.

          This is not to say we don't need hbck with the new master. I just don't think we know what issues we'll run into, but they'll certainly be different. I could imagine having RIT nodes stuck in weird states and things like that.

          Should I put up a patch with what I have against trunk? It has some basic versions of the same repair stuff that worked on the old master, but like I said, may not make much sense.

          And then we can open a new jira for 0.92 to make hbck even gooder?

          stack added a comment -

          We need a working hbck in 0.90... It can be read-only. If we don't know the issues yet, then yeah, can't have it do fixup... but yeah, get it in.

          Jonathan Gray added a comment -

          Will work on getting something up on RB tonight.

          Jonathan Gray added a comment -

          Canceling current patch. Working on the remix.

          Jonathan Gray added a comment -

          Latest patch up for review: https://review.cloudera.org/r/1036

          stack added a comment -

          Minor fix on top of patch posted to RB. Not done yet. Will post to RB when finished.

          stack added a comment -

          Minor fixes.

          stack added a comment -

          Something to add to this patch – being able to deal with empty cells in .META. HBCK should fix these up.

          stack added a comment -

          Found a few issues in RPC while trying to write a basic unit test that runs hbck in a unit test context. The QoS stuff in HRegionServer presumed all invocations had an argument (getOnlineRegions has no param), and getOnlineRegions returned a NavigableSet, which we can't handle in HBaseObjectWritable.

          Upped the RPC version number again. This probably means REST will fail again up on hudson – because I believe there is an old server running across tests – but will deal with that in a different issue.

          Fixed HBaseFsck so it scans ROOT; it was broken the way it was written.

          HBaseAdmin now handles a null HRegionInfo.

          All of the above is based on the last patch Jon posted to review.hbase.org.
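
          The zero-argument RPC problem mentioned above comes down to a missing guard. The sketch below is purely illustrative and assumes the priority is derived from a call's first argument; the function, constants, and predicate are hypothetical and are not HBase APIs.

          ```python
          HIGH_PRIORITY = 10    # hypothetical priority levels for the sketch
          NORMAL_PRIORITY = 0


          def call_priority(args, targets_meta):
              """Pick a priority for an RPC, tolerating calls that carry no arguments.

              args: the invocation's argument list (may be empty).
              targets_meta: predicate saying whether the first argument names a catalog region.
              """
              if not args:
                  # A call such as getOnlineRegions has no parameter to inspect.
                  return NORMAL_PRIORITY
              return HIGH_PRIORITY if targets_meta(args[0]) else NORMAL_PRIORITY


          def is_catalog_region(name):
              # Assumption for the example: catalog regions are recognised by name prefix.
              return name.startswith(".META.") or name.startswith("-ROOT-")


          print(call_priority([], is_catalog_region))
          print(call_priority([".META.,,1"], is_catalog_region))
          ```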

          stack added a comment -

          I committed v14 of this patch. It basically works. As regards what this issue is about, fixing breakage, well, that's a bit tougher. The original patch fixed issues in the old master. Then this issue became all about updating hbck and the attached patch to work with the new master; the issue was hijacked. I'm going to close this issue on application of v14. Will open new issues to add fixup for the new master (we're not too sure how it breaks at the moment, so how to fix it is still to be worked out).

          I need this patch in place while playing with enable/disable of big tables.


            People

            • Assignee:
              stack
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              1