HBase / HBASE-4058

Extend TestHBaseFsck with a complete .META. recovery scenario

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: hbck
    • Labels: None

      Description

      We should have a unit test that launches a minicluster and constructs a few tables, then deletes META files on disk, then bounces the master, then recovers the result with HBCK. Perhaps it is possible to extend TestHBaseFsck to do this.
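The scenario above might be sketched roughly as follows. This is an illustrative outline only, not a working patch: it assumes the standard HBaseTestingUtility mini-cluster harness, and the exact .META. directory layout and the HBaseFsck repair entry points are assumptions that would need checking against the codebase.

```java
// Sketch only: assumes HBaseTestingUtility and MiniHBaseCluster from the
// hbase test harness. META path handling and the fsck invocation are
// illustrative assumptions, not verified API usage.
public class TestMetaRecovery {
  private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();

  public void testRecoverDeletedMeta() throws Exception {
    TEST_UTIL.startMiniCluster(3);

    // Construct a few tables.
    TEST_UTIL.createTable(Bytes.toBytes("t1"), Bytes.toBytes("f"));
    TEST_UTIL.createTable(Bytes.toBytes("t2"), Bytes.toBytes("f"));

    // Delete the .META. files on disk (hypothetical path construction).
    FileSystem fs = TEST_UTIL.getDFSCluster().getFileSystem();
    Path metaDir = new Path(TEST_UTIL.getDefaultRootDirPath(), ".META.");
    fs.delete(metaDir, true);

    // Bounce the master.
    TEST_UTIL.getHBaseCluster().stopMaster(0);
    TEST_UTIL.getHBaseCluster().startMaster();

    // Recover with hbck, then verify t1/t2 are scannable again.
    HBaseFsck fsck = new HBaseFsck(TEST_UTIL.getConfiguration());
    // ... run repair, assert scans of t1 and t2 succeed ...
  }
}
```

Extending TestHBaseFsck would let this reuse its existing cluster setup and consistency assertions rather than standing up a separate harness.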

          Activity

          stack added a comment -

          This issue subsumed by Jon's work on uber-hbck.
          Lars Hofhansl added a comment -

          @Stack: Are you still planning this for 0.94?
          stack added a comment -

          Assigning myself. This is a pretty critical one. The online merges I just added back to 0.92 are a prereq for this one.
          Ted Yu added a comment -

          Moving to 0.94 since there is no owner for this issue at the moment.
          stack added a comment -

          So I see our fixup tool first running an evaluation of the state of .META. and then offering the admin choices. For overlapping regions, we should make it so hbck will run an online merge of the regions that span the broken key area. If there are holes, offer to plug them. We should also offer the option to rebuild from the fs, per table or for everything under hbase.rootdir.
          stack added a comment -

          Dan Harvey, who is still on 0.20.x, had a similar issue this month. He added four new servers to his cluster. These new servers were not resolving properly. What we were seeing is that on startup, I believe, these new servers would be assigned their portion of the regions on checkin. Then the basescanner would run – it's 0.20.x HBase – and it would not recognize the addresses the new servers were writing into .META., so it would think those regions unassigned and would assign them elsewhere. So we had double-assignment, and at the same time there was splitting and compactions running. His .META. had holes and overlaps.

          In his case, not all tables were honked. Just the big ones. I wonder if an improved add_table.rb would work in this case; i.e. do the same rewrite of the .META. content for a single table based off the content in the filesystem, rather than trying to fix up the .META. table itself.

          Let me try adding add_table.rb to hbck. Let me add the option of running per table, and then a global, restore-all-tables mode.

          Dan sent me the .META. dir content. It looks like this:

          -rw-r--r--@ 1 Stack  staff         0 Jul  7 08:26 281906331022358506
          -rw-r--r--@ 1 Stack  staff  94283152 Jul  7 08:26 5233066973300534672
          -rw-r--r--@ 1 Stack  staff         0 Jul  7 08:26 6803125877105432645
          -rw-r--r--@ 1 Stack  staff         0 Jul  7 08:26 8650632001596730954
          

          i.e. three zero-length files. I wonder how these were written (I asked him for a dir listing from the actual cluster).
          stack added a comment -

          I took a look at the logs Wayne posted. The master shows a few regionservers losing their leases, and it's having trouble connecting to a particular server. The regionserver snippet posted shows a regionserver aborting because it can't roll its WAL log; it gets an EOFException. The datanode snippet shows connection refused trying to connect to the same server (130) that the master is trying to contact (NN?).

          It's hard to tell much from the snippets posted.
          stack added a comment -

          So, reading Wayne's blow-by-blow: to 'fix' his hdfs, he ran 'fsck -move', which moves corrupt files to /lost+found. I wonder how many of the 65 corrupt files found were from hbase, and how many of those were from under .META. (65 corrupt files and 173 missing blocks... that's a lot of 'missing' data). Assuming an extreme, that there are missing blocks in .META., this would imply we need to be able to rebuild .META. by reading the filesystem content. The tool should be able to figure out what's a daughter versus what's a parent, and it should write the .META. without overlaps and with holes plugged. Finally, it should produce some sort of report on the type of surgery performed, listing any put-aside regions it could not make sense of.

          We currently don't have such a tool.
          stack added a comment -

          Here is the thread that prompted this issue: http://search-hadoop.com/m/J27Y72CrGiD/%2522hbck+-fix%2522&subj=hbck+fix

          So, one thought I had was rebuilding .META. from a scan of .META. with a timestamp from behind the catastrophe. This is not going to be bullet-proof for the case where the .META. storefiles themselves have been damaged or lost.

          So, we need a new add_table-type fixup. Wayne in the thread describes it as:

              Bugs and human error will bring on problems and nothing will ever change that, but not having tools to help recover out of the hole is where I think it is lacking... The hbase .META. table (and ROOT?) are the core of how HBase manages things. If this gets out of whack all is lost... Something like a recovery mode that goes through and sees what is out there and rebuilds the meta based on it. With corrupted data and lost regions etc. etc., like any relational database there should be one or more recovery modes that go through everything and rebuild it consistently. Data may be lost but at least the cluster will be left in a 100% consistent/clean state. Manual editing of .META. is not something anyone should do (especially me). It is prone to human error... it should be easy to have well tested recovery tools that can do the hard work for us.

            People

            • Assignee: stack
            • Reporter: Andrew Purtell
            • Votes: 0
            • Watchers: 3