HBASE-4377

[hbck] Offline rebuild .META. from fs data only.

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.90.5
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild.

      It would likely fill in region split holes. Follow-on work could give options to merge or select regions that overlap, or do online rebuilds.
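As a rough illustration of the "look only at the file system" idea, the sketch below walks a directory tree and collects every `.regioninfo` file. It is not the patch's code: it uses `java.nio.file` on a local path instead of the Hadoop FileSystem API, and the class name `RegionInfoScanner` and the `<root>/<table>/<region>/.regioninfo` layout are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of HBase: collects .regioninfo files under a
// local directory laid out as <root>/<table>/<region>/.regioninfo (assumed).
final class RegionInfoScanner {
  static List<Path> findRegionInfos(Path rootDir) {
    // Depth 3 = table directory, region directory, then the file itself.
    try (Stream<Path> paths = Files.walk(rootDir, 3)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().equals(".regioninfo"))
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A rebuilder along these lines would then parse each file it finds; parsing is left out here because it depends on the HBase serialization format.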

      1. hbase-4377-trunk.v2.patch
        42 kB
        Jonathan Hsieh
      2. hbase-4377.trunk.v6.patch
        52 kB
        Jonathan Hsieh
      3. hbase-4377.trunk.v5.txt
        52 kB
        Ted Yu
      4. hbase-4377.trunk.v4.txt
        52 kB
        Sebastian Bauer
      5. hbase-4377.trunk.v3.txt
        51 kB
        Jonathan Hsieh
      6. hbase-4377.0.90.v6.patch
        51 kB
        Jonathan Hsieh
      7. EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo
        2 kB
        Sebastian Bauer
      8. EXT_AC.regioninfo
        0.7 kB
        Sebastian Bauer
      9. 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch
        53 kB
        Jonathan Hsieh
      10. 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch
        40 kB
        Jonathan Hsieh
      11. 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch
        52 kB
        Sebastian Bauer
      12. 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch
        51 kB
        Sebastian Bauer
      13. 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch
        52 kB
        Jonathan Hsieh
      14. 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch
        51 kB
        Jonathan Hsieh

          Activity

          Jonathan Hsieh added a comment -

          Todd took a quick look and mentioned that "fs.defaultFS" is a Hadoop 0.21+'ism. On a 0.20.x release nothing really happens. Any concerns about this on the 0.90 backport?

          
          

          + public static void main(String[] args) throws Exception {
          +
          + // create a fsck object
          + Configuration conf = HBaseConfiguration.create();
          + conf.set("fs.defaultFS", conf.get(HConstants.HBASE_DIR));
          + HBaseFsck fsck = new HBaseFsck(conf);
          +
          +
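One possible way to sidestep the version difference is to set the older `fs.default.name` key (the one Hadoop 0.20.x reads) alongside `fs.defaultFS`. The sketch below only illustrates that idea: a plain `Map` stands in for the Hadoop `Configuration` object so the example stays self-contained, and the class and method names are hypothetical. The thread does not record whether the backport did this.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: a Map plays the role of a Hadoop Configuration (assumption).
final class DefaultFsKeys {
  static final String OLD_KEY = "fs.default.name"; // key read by Hadoop 0.20.x
  static final String NEW_KEY = "fs.defaultFS";    // key read by Hadoop 0.21+
  static final String HBASE_DIR = "hbase.rootdir"; // value of HConstants.HBASE_DIR

  // Points both default-filesystem keys at the HBase root directory.
  static void pointDefaultFsAtRootDir(Map<String, String> conf) {
    String rootDir = conf.get(HBASE_DIR);
    if (rootDir == null) {
      throw new IllegalStateException(HBASE_DIR + " is not set");
    }
    conf.put(OLD_KEY, rootDir);
    conf.put(NEW_KEY, rootDir);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(HBASE_DIR, "hdfs://namenode:8020/hbase"); // example value only
    pointDefaultFsAtRootDir(conf);
    System.out.println(conf.get(OLD_KEY) + " " + conf.get(NEW_KEY));
  }
}
```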

          Jonathan Hsieh added a comment -

          @mingjian

          If there was a split that didn't complete cleanly, a parent region with daughters should look like an overlap. The tool will tell you where these overlaps are.

          One way to fix the problem is to keep the parent region and then move or remove the daughter regions from hdfs. Since it is in the middle of a split, the parent should have all the data. Alternately, you could copy the store files from the daughters into the dir of the parent and then run the offline rebuilder.

          I plan on writing a blog post and hopefully adding to the book on how to fix these problems.
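To make the "parent with daughters looks like an overlap" point concrete, here is a minimal sketch that finds regions whose key range is fully covered by a given parent. These are the candidates to move aside if the parent is kept. The `Region` model with `String` keys is an assumption for readability (HBase compares `byte[]` keys), and none of these names come from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of region key ranges, not HBase's HRegionInfo.
final class SplitOverlap {
  static final class Region {
    final String name;
    final String start;
    final String end; // empty end means "to the end of the table"

    Region(String name, String start, String end) {
      this.name = name;
      this.start = start;
      this.end = end;
    }

    // True if this region's range [start, end) contains the other's range.
    boolean covers(Region other) {
      boolean startsAtOrBefore = start.compareTo(other.start) <= 0;
      boolean endsAtOrAfter = end.isEmpty()
          || (!other.end.isEmpty() && end.compareTo(other.end) >= 0);
      return startsAtOrBefore && endsAtOrAfter;
    }
  }

  // Returns the regions fully covered by the parent (likely split daughters).
  static List<Region> coveredBy(Region parent, List<Region> all) {
    List<Region> covered = new ArrayList<>();
    for (Region r : all) {
      if (r != parent && parent.covers(r)) {
        covered.add(r);
      }
    }
    return covered;
  }
}
```

With a parent spanning [A, C) and daughters [A, B) and [B, C), both daughters are reported, while a neighbouring region [C, end) is not.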

          mingjian added a comment -

          @Jonathan If a region is splitting, how do we fix it without the parent and daughters being online?

          Hudson added a comment -

          Integrated in HBase-0.92 #98 (See https://builds.apache.org/job/HBase-0.92/98/)
          HBASE-4377 [hbck] Offline rebuild .META. from fs data only
          (Jonathan Hsieh)
          HBASE-4377 [hbck] Offline rebuild .META. from fs data only
          (Jonathan Hsieh) (detail)

          tedyu :
          Files :

          • /hbase/branches/0.92/CHANGES.txt

          tedyu :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java
          Ted Yu added a comment -

          Integrated to 0.90, 0.92 and TRUNK.

          Thanks for the patch Jonathan.

          Ted Yu added a comment -

          The failed tests were due to 'Too many open files'.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12501672/hbase-4377.trunk.v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -166 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.master.TestMasterFailover

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/114//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/114//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/114//console

          This message is automatically generated.

          Jonathan Hsieh added a comment -

          Generated new patches using --no-prefix so robot can test.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12501666/hbase-4377.trunk.v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 13 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/113//console

          This message is automatically generated.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2287/
          -----------------------------------------------------------

          (Updated 2011-10-31 21:03:52.791775)

          Review request for hbase and Ted Yu.

          Changes
          -------

          Updated to address Stack's comments. I believe Seb's patch wasn't necessary in 0.90 since that code came in on HBASE-451 which isn't on the 0.90 branch.

          Summary
          -------

          Backport to 0.90

          commit 89862b73c6358e27220b87b0362599d86ab0fe4a
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java e0bd77e
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a981f72
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java bd3b2f3
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2287/diff

          Testing
          -------

          Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1).

          This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied.

          I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          Updated patches addressing most of Stack's comments.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/
          -----------------------------------------------------------

          (Updated 2011-10-31 20:55:03.327832)

          Review request for hbase, Michael Stack and Andrew Purtell.

          Changes
          -------

          Addressed Stack's comments

          Summary
          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet to be committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.

          Some follow on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/HRegionInfo.java ae068c7
          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 46ca765
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 9e9e07b
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java ca6dd4b
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing
          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from .regioninfo's laid out in hbase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.
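The "rebuild only if things line up" behavior can be pictured with a small consistency check over region key ranges. The sketch below is an assumption-laden illustration, not the HBaseFsck implementation: it models keys as `String`s (HBase uses `byte[]`), treats an empty start as the table start and an empty end as the table end, and uses the hypothetical names `RegionChainCheck` and `Range`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical check: do the ranges tile the key space with no holes or overlaps?
final class RegionChainCheck {
  static final class Range {
    final String start; // "" means start of table
    final String end;   // "" means end of table

    Range(String start, String end) {
      this.start = start;
      this.end = end;
    }
  }

  // Returns a description of each hole or overlap; empty means a clean chain.
  static List<String> findProblems(List<Range> regions) {
    List<Range> sorted = new ArrayList<>(regions);
    sorted.sort(Comparator.comparing((Range r) -> r.start));
    List<String> problems = new ArrayList<>();
    String cursor = "";      // key up to which the table is covered so far
    boolean atEnd = false;   // true once some region reaches the table end
    for (Range r : sorted) {
      if (atEnd || r.start.compareTo(cursor) < 0) {
        problems.add("overlap starting at '" + r.start + "'");
      } else if (r.start.compareTo(cursor) > 0) {
        problems.add("hole from '" + cursor + "' to '" + r.start + "'");
      }
      if (r.end.isEmpty()) {
        atEnd = true;
      } else if (!atEnd && r.end.compareTo(cursor) > 0) {
        cursor = r.end;
      }
    }
    if (!atEnd) {
      problems.add("hole from '" + cursor + "' to end of table");
    }
    return problems;
  }
}
```

Under this model a rebuild would proceed only when `findProblems` returns an empty list; otherwise the problems are reported and nothing is modified.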

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          Addressed most of stack's comments:

          • Removed try-catch from deleteTable.
          • Updated comment related issues.
          • Renamed splits in populateTable to values (splits is for region splits, the latter is for creating values.)
          • Have separate patch for filling in holes.
          • Removed setTableName and added internal check code to getTableName().
          • Refactored the sidelining function to check rename returns.

          I'm going to punt on these two.

          • HRegion creation was done manually because the version that existed attempted to open stores and I didn't want or need that.
          • MetaReader was not used because at the time I was trying to figure out the different table existence semantics in 0.90 vs trunk.
          Jonathan Hsieh added a comment -

          I'll do an update tomorrow or Monday to address the nits and get the 0.90 version caught up again.

          Ted Yu added a comment -

          The patch for 0.90 doesn't cleanly compile yet.
          We need to produce a clean patch for 0.90 and run test suite for it so that 0.90 can have this feature.

          stack added a comment -

          @Jon Sounds good.

          Do we want to make a v6 of this patch to address the minor comments above, or do we want to commit this and address them in a separate issue? (The test failures in the patch build are not caused by v5.)

          Jonathan Hsieh added a comment -

          @Stack

          I have a patch written that optionally handles filling in holes, but haven't polished it for review yet. I'll add it after this patch gets through. IIRC it adds this functionality to hbck and to the offline meta rebuilder.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12501458/hbase-4377.trunk.v5.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -166 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.master.TestMasterFailover

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/98//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/98//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/98//console

          This message is automatically generated.

          stack added a comment -

          To reuse the code, we need to separate it into its own method.

          That would be a good thing I'd think.

          MetaReader.fullScanMetaAndPrint() doesn't return the number of rows in .META.

          Seems like a minor addition – otherwise, it does what the method in here does.

          Yes, catch IOE and check the rename return value, I'd say. (Not too long ago, an hdfs-er added a patch to check all boolean returns out of fs operations; we should try to keep up the pattern.)

          Good stuff.

          Ted Yu added a comment -

          The code in HRegion which creates region directory is scattered in createHRegion(), etc.
          To reuse the code, we need to separate it into its own method.

          MetaReader.fullScanMetaAndPrint() doesn't return the number of rows in .META.
          We can enhance it by returning the count.

          For fs.rename(), we end up calling DFSClient where the return value's javadoc says:

             * @return true if successful, or false if the old name does not exist
             * or if the new name already belongs to the namespace.
          

          We should add a check on the return value from fs.rename(). Catching IOException also seems a useful defense against unexpected situations.
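The pattern being discussed, turning a `false` return from a rename into an explicit failure, can be sketched with `java.io.File.renameTo`, which also reports failure through a boolean. This is only an analogy: Hadoop's `FileSystem.rename(Path, Path)` is the call under discussion, and the class and method names below are made up for the example.

```java
import java.io.File;
import java.io.IOException;

// Analogy using java.io.File; the real code would call Hadoop's FileSystem.rename.
final class CheckedRename {
  // Renames src to dst and throws if the boolean result signals failure,
  // so callers have a single failure path (IOException) to handle.
  static void renameOrThrow(File src, File dst) throws IOException {
    if (!src.renameTo(dst)) {
      throw new IOException("Failed to rename " + src + " to " + dst);
    }
  }
}
```

A sidelining routine built this way cannot silently continue after a rename that did not happen, which is the risk when the boolean is ignored.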

          stack added a comment -

          We want this?

          +    } catch (Exception e) {
          +      // Do nothing.
          +    }
          

          Why not let the test fail if we can't delete the table?

          I see we do this deleteTable in a few places with the above ignoring of IOE (the deleteTable method is the same in two places).

          Nit: Below is a little wonky. No biggie.

          +    HTableDescriptor[] htbls = // new HBaseAdmin(conf).listTables();
          +    TEST_UTIL.getHBaseAdmin().listTables();
          

          Nit: No biggie. Rewrite below if new patch made:

          + * This testing base class provides create a minicluster and testing table table
          + * and shutsdown the cluster afterwards. It provides methods to wipes out meta,
          + * inject errors into meta and the file system.
          

          This define is repeated:

          +  protected final static byte[][] splits = new byte[][] { Bytes.toBytes("A"),
          +      Bytes.toBytes("B"), Bytes.toBytes("C") };
          

          Then there is:

          +    byte[] splits = { 'A', 'B', 'C', 'D' };

          Should we be looking for the missing region in the filesystem if we find a hole in meta before we go ahead and create a region to plug the hole? Do we do that?

          FYI, for the future, I think there is a utility in HRegion to do the below (with defines for .regioninfo) and for writing it (maybe there is a reason you did the below manually?):

          +    Path p = new Path(rootDir + "/" + htd.getNameAsString(),
          +        hri.getEncodedName());
          +    fs.mkdirs(p);
          +    Path riPath = new Path(p, ".regioninfo");
          

          FYI, there is a utility in MetaReader to do this:

          +  protected int scanMeta() throws IOException {
          +    int count = 0;
          +    HTable meta = new HTable(conf, HTableDescriptor.META_TABLEDESC.getName());
          +    ResultScanner scanner = meta.getScanner(new Scan());
          +    LOG.info("Table: " + Bytes.toString(meta.getTableName()));
          +    for (Result res : scanner) {
          +      LOG.info(Bytes.toString(res.getRow()));
          +      count++;
          +    }
          +    return count;
          +  }
          

          Do we need setTableName? Should below be moved into HRI?

          +      if (getTableName() == null || getTableName().length == 0) {
          +        byte [] newTableName = HRegionInfo.getTableName(this.getRegionName());
          +        LOG.debug(Bytes.toString(newTableName)+": .regioninfo doesn't have tableName value, but we are getting it from regionName :)");
          +        this.setTableName(newTableName);
          +      }
          

          This will be ok? We'll have perms to go here? If we don't we will just fail which should be fine.

          +    Path backupDir = new Path(rootDir.getParent(), rootDir.getName() + "-"
          +        + now);
          

          Next time, you could have made a method out of this and used it for meta and root, passing in 'root' or 'meta' and backupRoot – it's repeated code:

          +    if (fs.exists(root)) {
          +      fs.rename(root, backupRoot);
          +    } else {
          +      LOG.info("No previous -ROOT- exists.  Continuing.");
          +    }
          

          Should you test the result of fs.rename? It returns a boolean true if it succeeds and false if not?

          That's enough for now.

          Ted Yu added a comment -

          Submit for PreCommit build.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12501403/hbase-4377.trunk.v5.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -166 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.coprocessor.TestMasterObserver
          org.apache.hadoop.hbase.client.TestMultiParallel
          org.apache.hadoop.hbase.master.TestDefaultLoadBalancer
          org.apache.hadoop.hbase.TestRegionRebalancing
          org.apache.hadoop.hbase.master.TestMasterFailover
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithRemove

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/94//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/94//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/94//console

          This message is automatically generated.

          Ted Yu added a comment -

          hbase-4377.trunk.v5.txt didn't produce a regression.
          The following isn't new:

          Tests in error: 
            testRegionTransitionOperations(org.apache.hadoop.hbase.coprocessor.TestMasterObserver): 9faf6fe48f36206644b7fd913cf7e229
          
          Ted Yu added a comment -

          Submit for PreCommit build.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12500832/EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/69//console

          This message is automatically generated.

          Sebastian Bauer added a comment -

          If I remember correctly, both regioninfo files cause problems because of an empty HRegionInfo.tableName. My test instance is running trunk almost all the time, so many things could be broken.

          Sebastian Bauer added a comment -

          I think hbase-4377.trunk.v3.txt cannot compile because the doFsck(boolean) function was removed and we now have doFsck(Configuration, boolean), so this patch corrects that too. It applies cleanly to trunk and 0.92.

          Ted Yu added a comment -

          Some tests failed due to:

          Caused by: java.io.IOException: Too many open files
          	at sun.nio.ch.IOUtil.initPipe(Native Method)
          	at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
          	at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
          

          One test failure is tracked by HBASE-4675

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12500737/0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -167 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.coprocessor.TestMasterObserver
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/66//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/66//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/66//console

          This message is automatically generated.

          Jonathan Hsieh added a comment -

          @Ted I'm basically ok with it.

          @Seb can you post some of the bad .regioninfo files? I'm curious about what you did to need to use a full rebuild!

          Ted Yu added a comment -

          Sebastian's latest patch applies to 0.92 and the hbck-related tests passed.
          I think we should include his enhancement.
          A test for his scenario can be added later.

          @Jonathan:
          What's your opinion?

          Sebastian Bauer added a comment -

          Patch with Ted's comments addressed and corrected tests.

          PS: for a Python programmer it's still strange that I need to use equals, not ==, for objects.

          Jonathan Hsieh added a comment -

          Seb, glad to hear that this basically worked for you.

          Would it make sense to add Seb's change as a separate jira after the original patch gets committed? IMO, it feels like it needs a test case as well.

          Jonathan Hsieh added a comment -

          Attached a non-git-style version of v3 of the patch. It applies on trunk and 0.92.

          Ted Yu added a comment -

          @Sebastian:
          == shouldn't be used to compare strings:

          +      if(Bytes.toString(getTableName())==""){
          +       byte [] newTableName = HRegionInfo.getTableName(this.getRegionName());
          +       System.out.println(Bytes.toString(newTableName)+": .regioninfo doesn't have tableName value, but we are getting it from regionName :)");
          +       this.setTableName(newTableName);
          +      }
          

          You can use:

           if (getTableName() == null || getTableName().length == 0) {
          

          System.out.println should be replaced by LOG.debug()
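
A small plain-Java sketch of why the `==` test above is unreliable and why the length check is safer. It does not use HBase's Bytes class; decoding with `new String(...)` is an assumption standing in for whatever conversion the patch performs, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class EmptyNameCheck {
    /** True if the name is missing or empty; works directly on the bytes. */
    static boolean isMissing(byte[] name) {
        return name == null || name.length == 0;
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        String decoded = new String(empty, StandardCharsets.UTF_8);

        // == compares object identity: 'decoded' is a freshly allocated
        // String, so it is not the same object as the literal "".
        System.out.println("identity: " + (decoded == ""));
        // equals compares contents, which is what was intended.
        System.out.println("equals:   " + decoded.equals(""));
        // The byte-level check needs no decoding and also covers null.
        System.out.println("bytes:    " + isMissing(empty) + ", null: " + isMissing(null));
    }
}
```

Checking the byte array directly also avoids allocating a String just to test for emptiness.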

          Ted Yu added a comment -

          For 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, I got the same compilation error when I tried to run tests.

          Ted Yu added a comment -

          Prepare for Jenkins patch testing.

          Ted Yu added a comment -

          @Sebastian:
          0.92 is close to release.

          I got the following:

          [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project hbase: Compilation failure: Compilation failure:
          [ERROR] /Users/zhihyu/92hbase/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java:[277,21] doFsck(org.apache.hadoop.conf.Configuration,boolean) in org.apache.hadoop.hbase.util.hbck.HbckTestingUtil cannot be applied to (boolean)
          [ERROR] 
          [ERROR] /Users/zhihyu/92hbase/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java:[286,23] doFsck(org.apache.hadoop.conf.Configuration,boolean) in org.apache.hadoop.hbase.util.hbck.HbckTestingUtil cannot be applied to (boolean)
          

          Do you mind refreshing the patch for 0.92?

          Thanks

          Sebastian Bauer added a comment -

          It is based on 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch.

          It has a small enhancement in the MetaEntry constructor: my testing env was really broken, and after recreating META I had empty TableName values, so this enhancement fills TableName from regionName. This needs a setTableName method in the HRegionInfo class (now it's public, but it can be protected). After recreating META, I finally have my testing env working again.
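
A hedged sketch of the fallback described above: recovering the table name from a region name. It assumes the region name begins with the table name followed by a ',' delimiter, and it does not call HRegionInfo; the class name, method name, and sample region names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TableNameFromRegionName {
    /** Returns the bytes that precede the first ',' in regionName. */
    static byte[] tableNameOf(byte[] regionName) {
        for (int i = 0; i < regionName.length; i++) {
            if (regionName[i] == ',') {
                return Arrays.copyOfRange(regionName, 0, i);
            }
        }
        throw new IllegalArgumentException("no ',' delimiter in region name");
    }

    public static void main(String[] args) {
        // Hypothetical region name: table, start key, region id.
        byte[] regionName =
            "usertable,row-0500,1319000000000".getBytes(StandardCharsets.UTF_8);
        byte[] table = tableNameOf(regionName);
        System.out.println(new String(table, StandardCharsets.UTF_8));
    }
}
```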

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2287/
          -----------------------------------------------------------

          (Updated 2011-10-20 03:46:53.527042)

          Review request for hbase and Ted Yu.

          Changes
          -------

          New version for 0.90 that does not require HBASE-3777 / HBASE-4508.

          Summary
          -------

          Backport to 0.90

          commit 89862b73c6358e27220b87b0362599d86ab0fe4a
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b04aab6
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f792720
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2287/diff

          Testing
          -------

          Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1).

          This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied.

          I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          0.90 v4 works without having HBASE-4508/HBASE-3777 on the 0.90 branch.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/
          -----------------------------------------------------------

          (Updated 2011-10-20 03:22:39.708313)

          Review request for hbase, Michael Stack and Andrew Purtell.

          Changes
          -------

          Ported updates from comments from 0.90 branch to trunk/0.92 branch.

          Summary
          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet to be committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.
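
To illustrate the scan step only, here is a sketch that collects .regioninfo files laid out as root/table/region/.regioninfo. It walks a local directory with java.nio.file rather than HDFS, which is an assumption made so the example is self-contained; it does not parse the files, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RegionInfoScan {
    /** Collects root/table/region/.regioninfo paths in sorted order. */
    static List<Path> findRegionInfos(Path root) throws IOException {
        // Depth 3 is exactly deep enough to reach root/table/region/file.
        try (Stream<Path> paths = Files.walk(root, 3)) {
            return paths
                .filter(p -> root.relativize(p).getNameCount() == 3)
                .filter(p -> p.getFileName().toString().equals(".regioninfo"))
                .filter(Files::isRegularFile)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hbase-root");
        Path region = Files.createDirectories(root.resolve("t1").resolve("region-a"));
        Files.createFile(region.resolve(".regioninfo"));
        // A region directory without a .regioninfo file is simply not reported.
        Files.createDirectories(root.resolve("t1").resolve("region-b"));
        for (Path p : findRegionInfos(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```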

          Some follow on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 7409c9c
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f5be448
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing
          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from .regioninfo's laid out in hbase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.
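
The hole and overlap checks described above can be sketched roughly as follows. This is not the patch's algorithm: it is a simplified illustration that uses String keys compared lexicographically (HBase compares raw bytes), treats an empty key as the open start or end of the table, and only reports problems without modifying anything. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RegionChainCheck {
    static final class Range {
        final String start; // inclusive; "" means start of table
        final String end;   // exclusive; "" means end of table
        Range(String start, String end) { this.start = start; this.end = end; }
    }

    /** Returns a description of each problem; an empty list means the ranges line up. */
    static List<String> check(List<Range> regions) {
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<String> problems = new ArrayList<>();
        String coveredTo = "";  // key space is covered up to (not including) this key
        boolean open = false;   // true once some region extends to the end of the table
        for (Range r : sorted) {
            if (open || r.start.compareTo(coveredTo) < 0) {
                problems.add("overlap: region starting at '" + r.start
                    + "' begins inside an earlier region");
            } else if (r.start.compareTo(coveredTo) > 0) {
                problems.add("hole: [" + coveredTo + ", " + r.start + ")");
            }
            if (r.end.isEmpty()) {
                open = true;
            } else if (r.end.compareTo(coveredTo) > 0) {
                coveredTo = r.end;
            }
        }
        if (!open) {
            problems.add("hole: [" + coveredTo + ", end of table)");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(Arrays.asList(
            new Range("", "A"), new Range("A", "B"), new Range("B", ""))));
        System.out.println(check(Arrays.asList(
            new Range("", "A"), new Range("B", ""))));
        System.out.println(check(Arrays.asList(
            new Range("", "B"), new Range("A", ""))));
    }
}
```

A caller following the behavior described above would rebuild meta only when the returned list is empty, and otherwise print the list and stop.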

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          0.90 version requires HBASE-4508

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2287/
          -----------------------------------------------------------

          (Updated 2011-10-20 03:21:33.922683)

          Review request for hbase and Ted Yu.

          Changes
          -------

          Addressed comments

          • added more logging and better error message
          • Handled exit properly.

          Summary
          -------

          Backport to 0.90

          commit 89862b73c6358e27220b87b0362599d86ab0fe4a
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs (updated)


          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f792720
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b04aab6
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2287/diff

          Testing
          -------

          Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1).

          This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied.

          I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2011-10-07 21:01:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 302

          > <https://reviews.apache.org/r/2287/diff/1/?file=48780#file48780line302>

          >

          > Naming rd as rootdir would make the code more readable.

          done

          On 2011-10-07 21:01:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 446

          > <https://reviews.apache.org/r/2287/diff/1/?file=48780#file48780line446>

          >

          > I think LOG.info() should be used here.

          I think it is still a problem, but we are in an ok state. I've changed it to 'warn' instead.

          On 2011-10-07 21:01:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 276

          > <https://reviews.apache.org/r/2287/diff/1/?file=48780#file48780line276>

          >

          > Minor suggestion: IOException may occur more than once. Would logging all such IOException's before bailing out make user experience better ?

          > Basically we just need to track the last such IOException in a variable and bail out at line 283 if the variable isn't null.

          Updated to track all IOE's and throw MultipleIOException.
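
          The collect-then-throw pattern mentioned in this reply can be sketched as follows. This is an illustration only, not the patch's code: the reply names Hadoop's MultipleIOException, whereas this sketch uses the JDK's suppressed-exception mechanism as a stand-in so that it runs without Hadoop on the classpath. The CollectIoErrors and IoTask names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: runs every task, remembers each IOException, and fails once at the end. */
final class CollectIoErrors {

    /** Hypothetical unit of work that may fail with an IOException. */
    interface IoTask {
        void run() throws IOException;
    }

    static void runAll(List<IoTask> tasks) throws IOException {
        List<IOException> failures = new ArrayList<>();
        for (IoTask task : tasks) {
            try {
                task.run();
            } catch (IOException e) {
                // Keep going so that every failure is recorded, not just the first one.
                failures.add(e);
            }
        }
        if (failures.isEmpty()) {
            return;
        }
        // Stand-in for an aggregate exception type: attach each failure as a suppressed exception.
        IOException summary = new IOException(failures.size() + " I/O failure(s)");
        for (IOException e : failures) {
            summary.addSuppressed(e);
        }
        throw summary;
    }
}
```

          Compared with tracking only the last IOException, this keeps every failure available to the caller, at the cost of holding all of them in memory until the loop finishes.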

          On 2011-10-07 21:01:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 346

          > <https://reviews.apache.org/r/2287/diff/1/?file=48780#file48780line346>

          >

          > I think rebuildMeta() should check the return value from generatePuts().

          > Otherwise we would encounter NPE at line 405 below.

          see below

          On 2011-10-07 21:01:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 407

          > <https://reviews.apache.org/r/2287/diff/1/?file=48780#file48780line407>

          >

          > false should be returned if puts is null.

          I believe that checkHdfs, loadTableInfo, and the error checking all happen earlier, and that we bail out after suggestFixes(). But sure, it doesn't hurt to be even more defensive here.

          On 2011-10-07 21:01:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 378

          > <https://reviews.apache.org/r/2287/diff/1/?file=48780#file48780line378>

          >

          > Do you plan to add this logic in another JIRA ?

          I have a patch that adds this, but it is having problems on the trunk side. I'd like to get this in first, and then we'll deal with that next. New issue filed: HBASE-4632.

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2287/#review2440
          -----------------------------------------------------------

          On 2011-10-07 19:04:44, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2287/

          -----------------------------------------------------------

          (Updated 2011-10-07 19:04:44)

          Review request for hbase and Ted Yu.

          Summary

          -------

          Backport to 0.90

          commit 89862b73c6358e27220b87b0362599d86ab0fe4a

          Author: Jonathan Hsieh <jon@cloudera.com>

          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This addresses bug HBASE-4377.

          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b04aab6

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f792720

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2287/diff

          Testing

          -------

          Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1).

          This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied.

          I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2287/#review2440
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2287/#comment5546>

          Minor suggestion: IOException may occur more than once. Would logging all such IOException's before bailing out make user experience better ?
          Basically we just need to track the last such IOException in a variable and bail out at line 283 if the variable isn't null.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2287/#comment5545>

          Naming rd as rootdir would make the code more readable.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2287/#comment5548>

          I think rebuildMeta() should check the return value from generatePuts().
          Otherwise we would encounter NPE at line 405 below.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2287/#comment5549>

          Do you plan to add this logic in another JIRA ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2287/#comment5550>

          false should be returned if puts is null.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2287/#comment5552>

          I think LOG.info() should be used here.

          • Ted

          On 2011-10-07 19:04:44, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2287/

          -----------------------------------------------------------

          (Updated 2011-10-07 19:04:44)

          Review request for hbase and Ted Yu.

          Summary

          -------

          Backport to 0.90

          commit 89862b73c6358e27220b87b0362599d86ab0fe4a

          Author: Jonathan Hsieh <jon@cloudera.com>

          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This addresses bug HBASE-4377.

          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3

          src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b04aab6

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f792720

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2287/diff

          Testing

          -------

          Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1).

          This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied.

          I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2287/
          -----------------------------------------------------------

          Review request for hbase and Ted Yu.

          Summary
          -------

          Backport to 0.90

          commit 89862b73c6358e27220b87b0362599d86ab0fe4a
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs


          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java ef246c3
          src/main/java/org/apache/hadoop/hbase/util/Bytes.java 13ad026
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java b04aab6
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f792720
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2287/diff

          Testing
          -------

          Note, the assertion test result is different in the failure cases due to HBASE-451 changes. (0.90 returns 0 tables since it does a meta scan on empty meta, trunk branch looks at hdfs dirs, and returns 1).

          This version passes after HBASE-4508 (backport HBASE-3777 to 0.90 branch) is applied.

          I believe if that patch is not applied, I could modify the test code to force some explicit HConnection deletions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/
          -----------------------------------------------------------

          (Updated 2011-10-07 18:46:54.806909)

          Review request for hbase, Michael Stack and Andrew Purtell.

          Changes
          -------

          Updated with nits addressed, and separated the tests into different classes so that we can rely on new JVMs to avoid the OO file handle errors intermittently encountered when shutting down and restarting mini clusters.

          Summary
          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet-to-be-committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.

          Some follow on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f5be448
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing
          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from .regioninfo files laid out in HBase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to perform any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/
          -----------------------------------------------------------

          (Updated 2011-10-07 18:47:01.208741)

          Review request for hbase, Michael Stack and Andrew Purtell.

          Summary
          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet-to-be-committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.

          Some follow-on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs


          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f5be448
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing
          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from the .regioninfo files laid out in HBase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          @Todd,

          I think there is some confusion. Clients do not directly access hdfs. Let me add more detail.

          In trunk post HBASE-451, the HMaster (not the client) reads and caches HTableDescriptor data from the file system. The client's HConnectionManager makes an RPC to the HMaster, which just ships the cached HTD data.

          HMaster on initialization reads file system for HTD data.
          Client calls listTables() -> HMaster (serve cached data from file system).

          Pre-HBASE-451, the client's HConnectionManager does a meta scan and builds the HTableDescriptors.

          Client calls listTables(), which is actually a meta scan that builds the HTDs.

          Todd Lipcon added a comment -

          Post HBASE-451, table data from HConnectionManager.listTables() comes from the file system and is cached by the HMaster, and ignores the meta table

          This seems like a bug - clients should never have to have direct access to HDFS! I filed HBASE-4548.

          Jonathan Hsieh added a comment -

          In the 0.90 branch, after deleting meta and restarting, the # of tables present is 0.
          In trunk and the 0.92 branch, after deleting meta and restarting, the # of tables present is 1.

          This actually does make sense because HBASE-451 changed the behavior of HMaster – in 0.90 (pre-HBASE-451) HConnectionManager.listTables() loads table info on the client side via a meta scan. Post HBASE-451, table data from HConnectionManager.listTables() comes from the file system and is cached by the HMaster, and ignores the meta table.
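
          The 0-versus-1 table count can be reproduced with a toy model. Everything here is a hypothetical stand-in (no HBase classes are used): metaRows plays the meta table, tableDirs plays the table directories in the file system, and the master cache is reloaded from tableDirs on restart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: these members are hypothetical stand-ins, not HBase classes. */
final class ListTablesModel {
    /** Rows of the meta table, written as "table,startKey" for simplicity. */
    final List<String> metaRows = new ArrayList<>();
    /** Table directories present in the file system. */
    final Set<String> tableDirs = new TreeSet<>();
    /** Descriptors the master cached from the file system at startup. */
    private final Set<String> masterCache = new TreeSet<>();

    /** Simulates a master restart: the cache is reloaded from the file system. */
    void restartMaster() {
        masterCache.clear();
        masterCache.addAll(tableDirs);
    }

    /** 0.90 style: the client scans meta and derives table names from region rows. */
    Set<String> listTablesViaMetaScan() {
        Set<String> tables = new TreeSet<>();
        for (String row : metaRows) {
            tables.add(row.split(",", 2)[0]);
        }
        return tables;
    }

    /** 0.92/trunk style: the master answers from its cache; meta is not consulted. */
    Set<String> listTablesViaMaster() {
        return new TreeSet<>(masterCache);
    }
}
```

          With one table on disk and an emptied metaRows, the meta-scan path returns no tables while the master path still returns one, which matches the two counts above.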

          Jonathan Hsieh added a comment -

          Need to test the 0.90 backport of this with the proposed backport.

          Ted Yu added a comment -

          HBASE-4508 would backport HBASE-3777 to 0.90
          We should get consistent behavior from HBaseTestingUtility after HBASE-4508 goes in.

          Jonathan Hsieh added a comment -

          Although I've gotten this to work with live systems, it seems that there are some problems with the tests on the backports. Different versions have different expected values, which does not seem to make sense. HBASE-3777 changed some of the semantics of HBaseTestingUtility, so I'll be investigating more.

          Jonathan Hsieh added a comment -

          When backporting to 0.90, the TestOfflineMetaRebuild test case would fail with out-of-file-handles exceptions. I dug for a while and found that the static HConnections cache connections that are not flushed between tests. Even after avoiding that, there are other resources (maybe pooling on hdfs client or zk client connections?) that cause the open file handle count to increase significantly after every test case.

          To avoid this problem, I'm going to split each of the rebuild tests into its own test case so that each can be executed in a new process. I'll do this for trunk and for the 0.90 backport.
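
          To make the failure mode concrete, here is a toy illustration of why static caches accumulate across tests that share a JVM. ConnectionCache is a hypothetical class written for this example, not HConnectionManager, and the "handles" are plain objects rather than real sockets or files.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration; ConnectionCache is hypothetical, not an HBase class. */
final class ConnectionCache {
    /** Static state lives as long as the JVM does, so it spans test cases. */
    private static final Map<String, Object> OPEN = new HashMap<>();

    /** Returns the cached handle for a cluster id, opening one if absent. */
    static Object get(String clusterId) {
        return OPEN.computeIfAbsent(clusterId, id -> new Object());
    }

    /** Number of handles currently held open. */
    static int openCount() {
        return OPEN.size();
    }

    /** What a per-test cleanup would have to call; a fresh JVM starts empty anyway. */
    static void closeAll() {
        OPEN.clear();
    }
}
```

          Every mini cluster started by a test adds an entry that nothing removes. Running each test class in its own process sidesteps the problem without having to track down every such cache.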

          Jonathan Hsieh added a comment -

          Since the review for trunk had relatively minor issues, I'm going to work on re-backporting this to the 0.90 branch.

          Jonathan Hsieh added a comment -

          trunk version applies on 0.92 and trunk.

          jiraposter@reviews.apache.org added a comment -

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 376

          > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line376>

          >

          > b is not needed here, same with question mark.

          jmhsieh wrote:

          k

          The javadoc form for @param is to list the parameter name, so it should be there. Agreed that no question mark should be there. I think the javadoc-y phrasing would be something like "whether to enable in-memory caching or not".

          - Jonathan
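
          For reference, the @param form being discussed looks like the sketch below. The class and method are hypothetical and exist only to show the tag layout: the parameter name comes first, followed by a description with no question mark.

```java
/** Hypothetical holder used only to show the Javadoc form. */
final class CachingOptions {
    private boolean inMemory;

    /**
     * Sets the in-memory caching flag.
     *
     * @param b whether to enable in-memory caching or not
     */
    void setInMemory(boolean b) {
        this.inMemory = b;
    }

    /**
     * Reports the current setting.
     *
     * @return whether in-memory caching is enabled
     */
    boolean isInMemory() {
        return inMemory;
    }
}
```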

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/#review2231
          -----------------------------------------------------------

          On 2011-09-30 00:02:16, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2126/

          -----------------------------------------------------------

          (Updated 2011-09-30 00:02:16)

          Review request for hbase, Michael Stack and Andrew Purtell.

          Summary

          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479

          Author: Jonathan Hsieh <jon@cloudera.com>

          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet-to-be-committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.

          Some follow-on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.

          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java fae0881

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuild.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing

          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from the .regioninfo files laid out in HBase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/
          -----------------------------------------------------------

          (Updated 2011-10-03 18:16:51.000353)

          Review request for hbase, Michael Stack and Andrew Purtell.

          Changes
          -------

          Addressed review comments.

          Summary
          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet-to-be-committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.

          Some follow-on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs (updated)


          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java f5be448
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 154ac32
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuild.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing
          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from the .regioninfo files laid out in HBase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > How long did it take to scan the cluster with 2700 inconsistencies ?

          > I see certain places in the code where more parallelism can be achieved if practical use of this feature takes long time.

          The cluster had 12k total regions after cleanup. It took 2m to run (this was with local disk accesses). I didn't feel that the runtime was something to be concerned about. And I honestly hope this code doesn't get used too often!

          We could use the same WorkItem trick to speed up the code but my feeling is that straightforward and correct is the right first step.

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 374

          > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line374>

          >

          > Better replace root with ROOT

          done.

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 376

          > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line376>

          >

          > b is not needed here, same with question mark.

          k

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java, line 391

          > <https://reviews.apache.org/r/2126/diff/1/?file=46564#file46564line391>

          >

          > Please remove b and question mark.

          k

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 307

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line307>

          >

          > The .META. region is open upon return.

          > I think we should document this.

          changed "live" to "open"

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 288

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line288>

          >

          > It would be nice to log the path for the underlying region.

          > Otherwise what purpose does this catch/rethrow serve ?

          nice catch. Updated to include table name and path.

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 309

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line309>

          >

          > Looking at the usage below, maybe createNewRootAndMeta would be a better name.

          done

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 352

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line352>

          >

          > This log doesn't match the check above.

          > If we only produce Put for the first HbckInfo, we'd better declare that in the log.

          updated the error message and changed the behavior so that it bails out. In this particular case, the invariant is checked before this method is called, but I'll just make it more explicit.

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 356

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line356>

          >

          > This would produce exception if his.size() == 0.

          problem avoided with update.

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 378

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line378>

          >

          > Do you plan to do this in the next patch or in another JIRA ?

          > I haven't looked at the other JIRAs you mentioned, pardon me.

          I'll file it as a follow-on jira.

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 428

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line428>

          >

          > Is there something we can do in case we get IOE from this call ?

          added error logging and an attempt to revert.

          On 2011-09-30 21:27:16, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 377

          > <https://reviews.apache.org/r/2126/diff/1/?file=46565#file46565line377>

          >

          > Better use boolean for return value to indicate success/failure.

          done.

          - jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/#review2231
          -----------------------------------------------------------

          On 2011-09-30 00:02:16, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2126/

          -----------------------------------------------------------

          (Updated 2011-09-30 00:02:16)

          Review request for hbase, Michael Stack and Andrew Purtell.

          Summary

          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479

          Author: Jonathan Hsieh <jon@cloudera.com>

          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet-to-be-committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.

          Some follow-on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.

          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java fae0881

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuild.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing

          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from the .regioninfo files laid out in hbase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/#review2236
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5194>

          My comment above on the first rename call was inaccurate.
          IOE out of the second call would be fatal.

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/#review2231
          -----------------------------------------------------------

          How long did it take to scan the cluster with 2700 inconsistencies ?
          I see certain places in the code where more parallelism can be achieved if practical use of this feature takes long time.

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          <https://reviews.apache.org/r/2126/#comment5180>

          Better replace root with ROOT

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          <https://reviews.apache.org/r/2126/#comment5179>

          b is not needed here, same with question mark.

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
          <https://reviews.apache.org/r/2126/#comment5181>

          Please remove b and question mark.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5185>

          It would be nice to log the path for the underlying region.
          Otherwise what purpose does this catch/rethrow serve ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5188>

          The .META. region is open upon return.
          I think we should document this.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5187>

          Looking at the usage below, maybe createNewRootAndMeta would be a better name.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5189>

          This log doesn't match the check above.
          If we only produce Put for the first HbckInfo, we'd better declare that in the log.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5190>

          This would produce exception if his.size() == 0.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5192>

          Better use boolean for return value to indicate success/failure.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5191>

          Do you plan to do this in the next patch or in another JIRA ?
          I haven't looked at the other JIRAs you mentioned, pardon me.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/2126/#comment5193>

          Is there something we can do in case we get IOE from this call ?

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2126/
          -----------------------------------------------------------

          Review request for hbase, Michael Stack and Andrew Purtell.

          Summary
          -------

          commit fbf82c17be6b3ecca5a981f5270cf93aac26e479
          Author: Jonathan Hsieh <jon@cloudera.com>
          Date: Wed Sep 28 10:18:11 2011 -0700

          HBASE-4377 [hbck] Offline rebuild .META. from fs data only

          This patch rebuilds a new .META. table by reading all the .regioninfo files in the hbase main directory. It depends on the yet-to-be-committed HBASE-4515 (either my version or Gary's version), HBASE-4509, and HBASE-4506.

          Some follow-on work includes backporting to 0.90, auto-patching true holes, and adding documentation.

          This addresses bug HBASE-4377.
          https://issues.apache.org/jira/browse/HBASE-4377

          Diffs
          -----

          src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9c850d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java fae0881
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuild.java PRE-CREATION

          Diff: https://reviews.apache.org/r/2126/diff

          Testing
          -------

          An earlier version of this code (backported to 0.90) was used to diagnose and repair a cluster that had 2700 inconsistencies due to failed splits (the cluster was underprovisioned memory-wise, and on restart, some regions would start splitting and then die due to OOMEs). This was not actually used on a live cluster – it was used to reconstruct a .META. from the .regioninfo files laid out in hbase's directory structure.

          Note also that this is not an automatic fix – whenever any problems are found, this bails out but dumps info on holes, suggests some fixes, and displays sets of overlapping regions. It is up to the user to merge regions, to create .regioninfo files to plug holes, and to do any potentially data-losing operations.

          The tests demonstrate current expected behavior – rebuild meta if things line up, and fail without making modifications if holes or overlaps exist.
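
          To make the "line up" condition above concrete, here is a minimal sketch of a hole/overlap check over region key ranges. It is an illustration only, not the check in HBaseFsck: it assumes each region covers [startKey, endKey) with an empty key meaning unbounded, it uses String keys where HBase uses byte arrays, and the names RegionChainCheck, Range and findProblems are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; not the HBaseFsck implementation.
class RegionChainCheck {

    /** Key range [startKey, endKey); an empty string means "unbounded" on that side. */
    static final class Range {
        final String startKey;
        final String endKey;

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Returns one message per hole or overlap; an empty list means the ranges line up. */
    static List<String> findProblems(List<Range> regions) {
        List<String> problems = new ArrayList<>();
        if (regions.isEmpty()) {
            problems.add("hole: no regions found");
            return problems;
        }
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Range r) -> r.startKey));

        Range first = sorted.get(0);
        if (!first.startKey.isEmpty()) {
            problems.add("hole: [\"\", \"" + first.startKey + "\")");
        }
        String coveredTo = first.endKey; // furthest end key seen so far
        for (int i = 1; i < sorted.size(); i++) {
            Range next = sorted.get(i);
            if (coveredTo.isEmpty() || coveredTo.compareTo(next.startKey) > 0) {
                problems.add("overlap: region starting at \"" + next.startKey + "\"");
            } else if (coveredTo.compareTo(next.startKey) < 0) {
                problems.add("hole: [\"" + coveredTo + "\", \"" + next.startKey + "\")");
            }
            // Extend coverage; an empty end key already covers everything.
            if (!coveredTo.isEmpty()
                    && (next.endKey.isEmpty() || next.endKey.compareTo(coveredTo) > 0)) {
                coveredTo = next.endKey;
            }
        }
        if (!coveredTo.isEmpty()) {
            problems.add("hole: [\"" + coveredTo + "\", \"\")");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Range> regions = new ArrayList<>();
        regions.add(new Range("", "b"));
        regions.add(new Range("c", "")); // leaves ["b", "c") uncovered
        List<String> problems = findProblems(regions);
        if (problems.isEmpty()) {
            System.out.println("ranges line up; safe to rebuild");
        } else {
            problems.forEach(System.out::println); // report and change nothing
        }
    }
}
```

          In the spirit of the behavior described above, a caller would rebuild only when the returned list is empty and otherwise print the problems and leave everything unmodified.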

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          HBASE-4515 is required for tests to pass consistently

          Jonathan Hsieh added a comment -

          I'm having a hard time with tests that restart the test hbase mini cluster. I start the cluster, modify meta/hdfs regions, shut down the cluster, rebuild meta, and then get an NPE when restarting.

          Specifically, the call below in the User.HadoopUser constructor sometimes returns null, which later causes an NPE:

          User.HadoopUser.<init>
            ugi = (UserGroupInformation) callStatic("getCurrentUGI");
          

          Tests were passing at one point, but I can't figure out a direct cause for why this would fail. Any hints?

          Jonathan Hsieh added a comment -

          More detail: I've done a large refactor of hbck but found that doing those changes first would make the offline rebuild code more difficult to understand or review. So, my plan is to add the offline rebuild code, and then potentially do a refactor afterwards.

          Regardless of whether the refactor happens, I feel that I need to add tests and docs for this before it is ready for review.

          Jonathan Hsieh added a comment -

          @stack: Not yet, I'm still cleaning this up and adding tests right now.

          stack added a comment -

          @Jon So you want me to review what's over in github and commit that?

          Jonathan Hsieh added a comment -

          I think my plan is to postpone the large refactor until after this gets through.

          Jonathan Hsieh added a comment -

          I have a very hacky version that I recently used successfully to rebuild a .META. table with over 10k regions. It can be found here:

          https://github.com/jmhsieh/hbase/tree/hbase-4377

          I've also hacked the hack to backport it onto an 0.90.x branch.

          To run it, build hbase and then use the following command line:

          bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -base ~/pathToHbase/hbase -details
          

          The program will fail, telling the user about any problems it encounters. It only succeeds if all the info gathered from the .regioninfo files is clean after going through the region split calculator.

          This code will take some time to clean up.

          I would like to do some refactoring of the current hbck and create a o.a.h.hbase.util.hbck or o.a.h.hbase.hbck package. Any preferences or concerns there?

          stack added a comment -

          +1 on punting to the user if what's in the fs has overlapping regions (the user would rule on what to omit).


            People

            • Assignee:
              Jonathan Hsieh
              Reporter:
              Jonathan Hsieh
            • Votes:
              0
              Watchers:
              8
