HBASE-5128

[uber hbck] Online automated repair of table integrity and region consistency problems

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.5, 0.92.0, 0.94.0, 0.95.2
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: hbck
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HBaseFsck (hbck) has been updated with new repair capabilities. hbck is a tool for checking the region consistency and table integrity invariants of a running HBase cluster. Region consistency checks verify that .META., region deployment on region servers, and the state of data in HDFS (.regioninfo files) are all in accordance. Table integrity checks verify that every possible row key resolves to exactly one region of a table; that is, there are no individual degenerate or backwards regions, no holes between regions, and no overlapping regions. Previously hbck could diagnose inconsistencies but could only repair region consistency problems related to deployment. The updated version can also repair region consistency problems in .META. (by patching holes), repair overlapping regions (via merging), patch region holes (by fabricating new regions), and detect and adopt orphaned regions (by fabricating a new .regioninfo file if it is missing from a region's dir).

      Caveats:
      * The new hbck selects repairs assuming that HDFS is the ground truth; the previous version treated .META. as ground truth.
      * The hbck '-fix' option is still present but is deprecated and replaced by the '-fixAssignments' option.
      * This tool adds APIs in 0.90.7, 0.92.2 and 0.94.0 for clean repairs. The 0.90 version of the tool is compatible with HBase 0.90+, but may require restarting the master or individual regionservers for table enable/disable/delete commands to work properly.
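
The table integrity invariant in the release note (every row key resolves to exactly one region) can be illustrated with a small sketch. This is not hbck's implementation: the `Region` class, the `check` method and the message strings are hypothetical, and keys are modelled as Java strings compared with `compareTo`, whereas HBase compares raw byte arrays. An empty key stands for the start or end of the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TableIntegritySketch {

    /** Hypothetical stand-in for a region's key range [startKey, endKey). */
    static final class Region {
        final String name;
        final String startKey; // "" means start of table
        final String endKey;   // "" means end of table

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Returns one message per violation; an empty list means the table is intact. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> sane = new ArrayList<>();

        // Pass 1: set aside regions whose own key range is malformed.
        for (Region r : regions) {
            boolean openEnded = r.endKey.isEmpty();
            if (!openEnded && r.startKey.equals(r.endKey)) {
                problems.add("degenerate: " + r.name);
            } else if (!openEnded && r.startKey.compareTo(r.endKey) > 0) {
                problems.add("backwards: " + r.name);
            } else {
                sane.add(r);
            }
        }

        // Pass 2: sweep the rest in start-key order, tracking how far coverage extends.
        sane.sort(Comparator.comparing((Region r) -> r.startKey));
        String coveredUpTo = "";
        boolean reachedEnd = false;
        for (Region r : sane) {
            if (reachedEnd || r.startKey.compareTo(coveredUpTo) < 0) {
                problems.add("overlap: " + r.name);
            } else if (r.startKey.compareTo(coveredUpTo) > 0) {
                problems.add("hole: [" + coveredUpTo + ", " + r.startKey + ")");
            }
            if (r.endKey.isEmpty()) {
                reachedEnd = true;
            } else if (!reachedEnd && r.endKey.compareTo(coveredUpTo) > 0) {
                coveredUpTo = r.endKey;
            }
        }
        if (!reachedEnd) {
            problems.add("hole: [" + coveredUpTo + ", <end of table>)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("A", "", "b"),
            new Region("B", "b", "d"),
            new Region("C", "c", "f"),   // starts before B ends
            new Region("D", "g", ""),    // leaves [f, g) uncovered
            new Region("E", "x", "x"),   // startkey == endkey
            new Region("F", "z", "m"));  // startkey > endkey
        for (String problem : check(regions)) {
            System.out.println(problem);
        }
    }
}
```

The sketch only detects violations. The repairs the release note describes (merging overlaps, fabricating regions for holes, sidelining bad regions) operate on region directories in HDFS and are outside its scope.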

      Description

      The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' it can only automatically repair region consistency cases that involve deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.

      Here's the approach (from the comment at the top of the new version of the file).

      /**
       * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
       * table integrity.  
       * 
       * Region consistency checks verify that META, region deployment on
       * region servers and the state of data in HDFS (.regioninfo files) all are in
       * accordance. 
       * 
       * Table integrity checks verify that all possible row keys can resolve to
       * exactly one region of a table.  This means there are no individual degenerate
       * or backwards regions; no holes between regions; and that there are no
       * overlapping regions.
       * 
       * The general repair strategy works in these steps.
       * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
       * 2) Repair Region Consistency with META and assignments
       * 
       * For table integrity repairs, the tables' region directories are scanned
       * for .regioninfo files.  Each table's integrity is then verified.  If there
       * are any orphan regions (regions with no .regioninfo files), or holes, new
       * regions are fabricated.  Backwards regions are sidelined as well as empty
       * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
       * a new region is created and all data is merged into the new region.
       * 
       * Table integrity repairs deal solely with HDFS and can be done offline -- the
       * hbase region servers or master do not need to be running.  This phase can be
       * used to completely reconstruct the META table in an offline fashion.
       * 
       * Region consistency requires three conditions -- 1) valid .regioninfo file
       * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
       * and 3) a region is deployed only at the regionserver that it was assigned to.
       * 
       * Region consistency requires hbck to contact the HBase master and region
       * servers, so the connect() must first be called successfully.  Much of the
       * region consistency information is transient and less risky to repair.
       */
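
The three region consistency conditions listed in the comment above can be read as a simple decision procedure. The sketch below is only an illustration under stated assumptions: the `Verdict` names and the `classify` signature are hypothetical and are not hbck's actual error codes or API. HDFS is checked first because the release note says the new hbck treats HDFS as ground truth. The sketch also ignores cases such as disabled tables, whose regions are legitimately not deployed.

```java
import java.util.Collections;
import java.util.Set;

final class RegionConsistencySketch {

    /** Hypothetical verdict names; they are not hbck's actual error codes. */
    enum Verdict {
        CONSISTENT,
        MISSING_IN_HDFS,
        MISSING_IN_META,
        NOT_DEPLOYED,
        MULTIPLY_DEPLOYED,
        DEPLOYED_ON_WRONG_SERVER
    }

    /**
     * @param regionInfoInHdfs condition 1: a valid .regioninfo file exists in the region dir
     * @param rowInMeta        condition 2: META holds a valid row for the region
     * @param assignedServer   the regionserver the region was assigned to
     * @param deployedOn       the regionservers that actually report the region as open
     */
    static Verdict classify(boolean regionInfoInHdfs, boolean rowInMeta,
                            String assignedServer, Set<String> deployedOn) {
        if (!regionInfoInHdfs) {
            return Verdict.MISSING_IN_HDFS;
        }
        if (!rowInMeta) {
            return Verdict.MISSING_IN_META;
        }
        // Condition 3: deployed on exactly one server, and it is the assigned one.
        if (deployedOn.isEmpty()) {
            return Verdict.NOT_DEPLOYED;
        }
        if (deployedOn.size() > 1) {
            return Verdict.MULTIPLY_DEPLOYED;
        }
        if (!deployedOn.contains(assignedServer)) {
            return Verdict.DEPLOYED_ON_WRONG_SERVER;
        }
        return Verdict.CONSISTENT;
    }

    public static void main(String[] args) {
        System.out.println(classify(true, true, "rs1", Collections.singleton("rs1")));
        System.out.println(classify(true, false, "rs1", Collections.singleton("rs1")));
        System.out.println(classify(true, true, "rs1", Collections.singleton("rs2")));
    }
}
```

The first two conditions can be evaluated from HDFS and META alone; the third needs live deployment information, which is why the comment notes that hbck must contact the master and region servers first.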
      
      1. hbase-5128-trunk.patch
        153 kB
        Jonathan Hsieh
      2. hbase-5128-trunk-v2.patch
        151 kB
        Jonathan Hsieh
      3. hbase-5128-0.94-v2.patch
        151 kB
        Jonathan Hsieh
      4. hbase-5128-0.92-v2.patch
        151 kB
        Jonathan Hsieh
      5. hbase-5128-0.90-v2.patch
        165 kB
        Jonathan Hsieh
      6. hbase-5128-0.90-v2b.patch
        152 kB
        Jonathan Hsieh
      7. hbase-5128-v3.patch
        154 kB
        Jonathan Hsieh
      8. hbase-5128-0.92-v4.patch
        155 kB
        Jonathan Hsieh
      9. hbase-5128-0.94-v4.patch
        155 kB
        Jonathan Hsieh
      10. hbase-5128-v4.patch
        156 kB
        Jonathan Hsieh
      11. hbase-5128-0.90-v4.patch
        155 kB
        Jonathan Hsieh
      12. 5128-trunk.addendum
        6 kB
        Ted Yu

          Activity

          Jonathan Hsieh added a comment -

          Updated release notes.

          Jonathan Hsieh added a comment -

          Docs jira is here: HBASE-5634.

          Lars Hofhansl added a comment -

          Thanks for getting this done for 0.94, Jon!
          +1 on release notes and book update, but doesn't need to hold up 0.94rc

          stack added a comment -

          Hurray!!!!

          Would suggest you stick something in the release note section Jon as means of spreading the good news about this fat tool. What about this section in the reference manual: http://hbase.apache.org/book.html#hbck Should we update it some?

          Good stuff

          Hudson added a comment -

          Integrated in HBase-TRUNK #2694 (See https://builds.apache.org/job/HBase-TRUNK/2694/)
          HBASE-5128 Addendum adds two new files Jon forgot to add (Revision 1304702)
          HBASE-5128 [uber hbck] Online automated repair of table integrity and region consistency problems (Revision 1304665)

          Result = SUCCESS
          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java

          jmhsieh :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java
          Hudson added a comment -

          Integrated in HBase-0.92 #338 (See https://builds.apache.org/job/HBase-0.92/338/)
          HBASE-5128 Addendum adds two missing new files (Revision 1304723)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
          Hudson added a comment -

          Integrated in HBase-0.94 #51 (See https://builds.apache.org/job/HBase-0.94/51/)
          HBASE-5128 Addendum adds two missing new files (Revision 1304722)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
          Jonathan Hsieh added a comment -

          Thanks Ted. I've updated the rest. Will do better next time.

          Ted Yu added a comment -

          Applied addendum to trunk so that Hadoop QA can function.

          Ted Yu added a comment -

          Addendum for trunk.
          Hadoop QA couldn't work when compilation is broken.

          Hudson added a comment -

          Integrated in HBase-0.94 #50 (See https://builds.apache.org/job/HBase-0.94/50/)
          HBASE-5128 [uber hbck] Online automated repair of table integrity and region consistency problems (Revision 1304666)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java
          Hudson added a comment -

          Integrated in HBase-0.92 #337 (See https://builds.apache.org/job/HBase-0.92/337/)
          HBASE-5128 [uber hbck] Online automated repair of table integrity and region consistency problems (Revision 1304667)

          Result = FAILURE
          jmhsieh :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 06:33:20, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 435
          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line435>
          >
          > I think master.synchronousBalanceSwitch() is a better candidate for this action.

          jmhsieh wrote:
          I agree, but since this method is only in the trunk/0.94 branches I'll file a follow-on issue for this.

          HBASE-5630

          On 2012-03-22 06:33:20, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 554
          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line554>
          >
          > Can we do this in the current JIRA ?
          >
          > Why do we need to reload for every type of fix ?

          jmhsieh wrote:
          I'd rather do it in a follow-on issue. Correctness first, then performance. This patch is massive already.

          HBASE-5628

          On 2012-03-22 06:33:20, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 702
          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line702>
          >
          > Can hbaseRoot.getFileSystem() be saved in a variable outside the loop ?

          jmhsieh wrote:
          The guard ensures this is executed only once per table. In the 0.90 version, the way I got a TableInfo was via a method call to get the HRegionInfo/HTableDescription and I actually checked for inconsistencies there – in 0.92+ there is only the .tableinfo file so this consistency check isn't relevant (though there should be other .tableinfo checks specific to 0.92+, which I can file as a follow-on).

          HBASE-5631

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6208
          -----------------------------------------------------------

          On 2012-03-21 23:24:13, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features added from experience. Not much testing live on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 07:11:34, Ted Yu wrote:
          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1076
          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1076>
          >
          > I think we should handle RejectedExecutionException and re-submit the item.

          jmhsieh wrote:
          Follow-on issue. Failing hard here is probably good, and the change here was just more logging.

          HBASE-5632
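
For context, the resubmission suggested in the review comment above might look roughly like the sketch below. It is not what the patch does (the reply above prefers failing hard) and it is not taken from HBaseFsck; `submitWithRetry`, `maxAttempts` and `backoffMillis` are hypothetical names used only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

final class ResubmitSketch {

    /**
     * Submits work to the pool, retrying a bounded number of times when the
     * pool rejects it (for example because its queue is momentarily full).
     * Rethrows the last rejection if every attempt fails.
     */
    static <T> Future<T> submitWithRetry(ExecutorService pool, Callable<T> work,
                                         int maxAttempts, long backoffMillis)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RejectedExecutionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return pool.submit(work);
            } catch (RejectedExecutionException e) {
                last = e;
                if (pool.isShutdown()) {
                    break; // a pool that has been shut down will never accept new work
                }
                Thread.sleep(backoffMillis); // give the queue a chance to drain
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> result = submitWithRetry(pool, () -> "region checked", 3, 50L);
            System.out.println(result.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

Bounding the retries keeps the tool from hanging if the pool never frees up, which is one way to reconcile the suggestion with the preference for failing hard.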

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6213
          -----------------------------------------------------------

          On 2012-03-21 23:24:13, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features added from experience. Not much testing live on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2879

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2879>

          >

          > Can we name this option fixRegionHolesOnHdfs ?

          > It would be better to note which options can be run with cluster offline.

          jmhsieh wrote:

          At the moment, hbck can only be run while HBase is online. This has not been unified with OfflineMetaRebuild yet.

          HBASE-5629

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5826
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1771

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1771>

          >

          > I suggest renaming holeStart as startRow and renaming holeStop as stopRow.

          > Then you don't need the comment on 1700.

          Renamed to holeStartKey and holeStopKey to make it clear. Added a log message to inform the user about the action.
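
          As a rough illustration of what a hole between regions looks like in terms of a start key and a stop key, here is a minimal sketch. It is not the patch's implementation: `HoleFinder` and `findHoles` are hypothetical names, keys are modelled as strings with `""` standing for the start or end of the table, and the input is assumed to be sorted by start key and free of overlaps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; regions are {startKey, endKey} pairs.
final class HoleFinder {
  /** Returns {holeStartKey, holeStopKey} pairs for every gap in the key space. */
  static List<String[]> findHoles(List<String[]> regions) {
    List<String[]> holes = new ArrayList<>();
    if (regions.isEmpty()) {
      holes.add(new String[] {"", ""});  // the whole table is one hole
      return holes;
    }
    String expectedStart = "";  // the first region must start at the table start
    for (String[] region : regions) {
      if (!region[0].equals(expectedStart)) {
        holes.add(new String[] {expectedStart, region[0]});
      }
      expectedStart = region[1];
    }
    if (!expectedStart.isEmpty()) {
      holes.add(new String[] {expectedStart, ""});  // gap up to the table end
    }
    return holes;
  }
}
```

          A repair along the lines described in this issue would then fabricate an empty region covering each reported pair.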

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1812

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1812>

          >

          > Should include maxMerge in the log.

          great suggestion. done.
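
          The "mega merge" safeguard and the log message that now mentions maxMerge could look roughly like the sketch below. This is an assumption-laden illustration, not the patch: `MergeGuard` and `shouldMerge` are hypothetical names, regions are reduced to strings, and it assumes that an overlap group larger than the limit is simply skipped.

```java
import java.util.List;

// Hypothetical illustration of a merge-size safeguard.
final class MergeGuard {
  /** Returns false, and says why, when the overlap group exceeds maxMerge. */
  static boolean shouldMerge(List<String> overlapGroup, int maxMerge) {
    if (overlapGroup.size() > maxMerge) {
      System.out.println("Overlap group has " + overlapGroup.size()
          + " regions, which exceeds maxMerge=" + maxMerge + "; skipping merge.");
      return false;
    }
    return true;
  }
}
```

          Including the limit in the message lets an operator see at a glance whether raising it would allow the repair to proceed.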

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1849

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1849>

          >

          > I wonder whether we should bail if there have been two IOE's, one on 1759 and one here.

          This is soft state (doesn't modify the file system) so I'm less adamant about hard stopping when these conditions are reached.

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1863

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1863>

          >

          > 'Creating' -> 'Created'

          done

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1864

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1864>

          >

          > Are newRegion and region representing the same entity ?

          Good catch, changed to:

          LOG.info("Created new empty container region: " +
          newRegion + " to contain regions: " + Joiner.on(",").join(overlap));

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1872

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1872>

          >

          > If mergeRegionDirs() returns 0 (or less), should we note (partial) failure in merging ?

          Hm, it is possible to merge multiple empty overlapping regions without moving any HFiles, and that would still count as a fix. Instead of adding the return value, I've changed the code to just increment HBaseFsck's fixes count by 1.

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2159

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2159>

          >

          > Should say 'unable to get regions from master' or something similar

          "Fatal error: unable to get root region location. Exiting..."

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2298

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2298>

          >

          > Please remove this.

          done

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2299

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2299>

          >

          > 'with not' -> 'without'

          > Should also include some info on the entry.

          "with no"

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2311

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2311>

          >

          > Please remove this.

          done

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2821

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2821>

          >

          > Typo: maximum

          k

          On 2012-03-22 18:10:45, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2705

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line2705>

          >

          > Nit: name hdfsRegiondirModtime as hdfsRegionDirModTime

          k

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6229
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-03-23 19:53:18, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 191

          > <https://reviews.apache.org/r/3435/diff/6/?file=95002#file95002line191>

          >

          > Why a TreeMap if it is keyed by encoded region names? These are hashes, so is there any value in sorting them?

          I think you are right. The sorting is necessary in the range-managing data structure but not here. I'll file a follow-up for this and the following issue.
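
          The distinction drawn above (ordering matters for key ranges, not for hashed names) can be sketched as follows. The class `RegionIndexes` and its members are hypothetical and use strings in place of HBase types; this is not the HBaseFsck data structure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the two kinds of index.
final class RegionIndexes {
  // Encoded names are hashes: only exact lookups, so ordering adds no value.
  final Map<String, String> byEncodedName = new HashMap<>();
  // Start keys delimit ranges: sorted order is what makes range queries possible.
  final TreeMap<String, String> byStartKey = new TreeMap<>();

  void add(String encodedName, String startKey) {
    byEncodedName.put(encodedName, startKey);
    byStartKey.put(startKey, encodedName);
  }

  /** Encoded name of the region with the greatest start key <= row, or null. */
  String regionFor(String row) {
    Map.Entry<String, String> entry = byStartKey.floorEntry(row);
    return entry == null ? null : entry.getValue();
  }
}
```

          Only `regionFor` depends on ordering; the lookup by encoded name would behave the same with any map implementation.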

          On 2012-03-23 19:53:18, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 344

          > <https://reviews.apache.org/r/3435/diff/6/?file=95002#file95002line344>

          >

          > This almost recommends that HBaseFsck become a shell that does nothing but instantiate another class that does the actual fixup. clearState in that case would throw away the instantiated 'Fsck' class and create a completely new instance rather than zero out data members as this does. For the future.

          I'll file a follow on jira for that too.
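
          The refactoring suggested above might be sketched like this. All names (`FsckShell`, `FsckRun`, `startNewRun`) are hypothetical and the state is reduced to two fields; it only illustrates replacing a field-by-field clearState with a fresh instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the HBaseFsck class layout.
final class FsckShell {
  /** Holds all per-run state, so nothing has to be zeroed out by hand. */
  static final class FsckRun {
    final List<String> errors = new ArrayList<>();
    int fixes = 0;
  }

  private FsckRun current = new FsckRun();

  FsckRun current() {
    return current;
  }

  /** Stands in for clearState(): discard the old run and start a new one. */
  FsckRun startNewRun() {
    current = new FsckRun();
    return current;
  }
}
```

          With this shape, a newly added field cannot be forgotten in a reset method, because there is no reset method to keep in sync.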

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review6304
          -----------------------------------------------------------

          On 2012-03-23 16:13:50, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3435/

          -----------------------------------------------------------

          (Updated 2012-03-23 16:13:50)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Summary

          -------

          This should be nearly ready for integration. This has the same control flow as the trunk/0.92/0.94 versions but has a few differences.

          - It needs to track HTableDescriptors instead of reading them from the file system.

          - It uses a different HBaseFsckRepair.forceOfflineInZK method – which for some reason means we don't need HBASE-5563.

          - Uses HServerAddress instead of ServerName

          This version is close to what we've used on production clusters.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1a4f7f1

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/3435/diff

          Testing

          -------

          All TestHBaseFsck unit tests pass. Currently running full suite.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          Thanks for all the reviews LarH, Stack and Ted! This has been committed to 0.90/0.92/0.94/trunk branches.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519754/hbase-5128-0.90-v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1289//console

          This message is automatically generated.

          Jonathan Hsieh added a comment -

          Issue renamed to be more concise.

          Jonathan Hsieh added a comment -

          Cleaned up 0.90 version. Requires HBASE-5563 to pass consistently.

          Jonathan Hsieh added a comment -

          Full suites of 0.92/0.94/trunk versions pass.

          Looks like the 0.90 version has always had flaky tests for the same reason as the 0.92/0.94/trunk versions. The flakiness is related to assignment and HBASE-5563, but it just didn't happen as often as in the 0.92/0.94/trunk versions (2/10 runs vs 5/10 runs). HBASE-5563 would not be available on older clusters, but running this updated hbck against a version without that improvement won't cause permanent problems.

          Suppose you use this hbck against an older 0.90-based cluster that doesn't have HBASE-5563 or HBASE-5589. The side effect is that you may have to run 'hbck -fixAssignments' an extra time to fix region assignment/deployment problems after disabling and deleting a table that has been fixed, or alternatively, you may need to bounce the HMaster or the affected RegionServer to clean up this transient state.

          I currently have a 0.90 version of HBASE-5563 (attached there), and an updated HBASE-5128 for 0.90 that is as close as possible to the 0.92/0.94/trunk versions.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review6304
          -----------------------------------------------------------

          Ship it!

          Went through a third. Minors below that should not hold up commit. Get it in!!! Great stuff Jon.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment13682>

          Good doc (though I've said this previously, it's still good doc)

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment13683>

          Why a TreeMap if it's keyed by encoded region names? These are hashes, so is there any value in sorting them?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment13684>

          Ditto on sort here? Why sort by table name? How does sort prevent dupes?
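The point about sorting hashed keys can be illustrated with a small sketch. This is not the HBaseFsck code: the class name, the method names and the four-character "encoded names" are hypothetical stand-ins. It only shows that a TreeMap keyed by hash-like strings iterates in an order unrelated to the regions' start keys, so the sort buys nothing for lookups by encoded name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the HBaseFsck data structures.
class RegionIndexSketch {

    // Populates the map with made-up (encoded region name -> start key) pairs.
    static Map<String, String> fill(Map<String, String> index) {
        index.put("9f3c", "a");
        index.put("1b7e", "m");
        index.put("c2d0", "g");
        return index;
    }

    // Returns the keys in the map's own iteration order.
    static List<String> keysInIterationOrder(Map<String, String> index) {
        return new ArrayList<>(index.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> byEncodedName = fill(new TreeMap<>());
        // Keys come back sorted by hash text: [1b7e, 9f3c, c2d0]
        System.out.println(keysInIterationOrder(byEncodedName));
        // The start keys in that order are [m, a, g], i.e. not in row-key order.
        System.out.println(new ArrayList<>(byEncodedName.values()));
    }
}
```

Under these assumptions a HashMap would serve equally well for point lookups by encoded name; ordering only matters in a structure keyed by start key or key range.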

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment13685>

          This almost recommends that HBaseFsck become a shell that does nothing but instantiate another class that does the actual fixup. clearState in that case would throw away the instantiated 'Fsck' class and create a completely new instance rather than zero out data members as this does. For the future.

          • Michael
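The "shell plus fresh worker" idea in the last comment might look roughly like the following. All names here (FsckShellSketch, FsckRun, runOnce) are hypothetical and the check itself is a placeholder; the sketch only contrasts creating a new per-run object with zeroing out fields in a clearState-style method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the HBaseFsck API.
class FsckShellSketch {

    // Holds all per-run state. An instance is used for one pass and then dropped.
    static final class FsckRun {
        private final List<String> errors = new ArrayList<>();

        // Placeholder check: treats an empty region name as a problem.
        void check(List<String> regions) {
            for (String region : regions) {
                if (region.isEmpty()) {
                    errors.add("degenerate region");
                }
            }
        }

        int errorCount() {
            return errors.size();
        }
    }

    // The shell keeps no mutable state of its own, so nothing needs clearing
    // between passes: each call builds a brand-new FsckRun.
    static int runOnce(List<String> regions) {
        FsckRun run = new FsckRun();
        run.check(regions);
        return run.errorCount();
    }
}
```

With this shape, a field added to the worker later cannot leak from one pass to the next, whereas a clearState-style reset has to be kept in sync with every new field by hand.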

          On 2012-03-23 16:13:50, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3435/

          -----------------------------------------------------------

          (Updated 2012-03-23 16:13:50)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Summary

          -------

          This should be nearly ready for integration. It has the same control flow as the trunk/0.92/0.94 versions, with a few differences.

          - It needs to track HTableDescriptors instead of reading them from the file system.

          - It uses a different HBaseFsckRepair.forceOfflineInZK method – which for some reason means we don't need HBASE-5563.

          - Uses HServerAddress instead of ServerName

          This version is close to what we've used on production clusters.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1a4f7f1

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/3435/diff

          Testing

          -------

          All TestHBaseFsck unit tests pass. Currently running full suite.

          Thanks,

          jmhsieh

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519649/hbase-5128-v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 21 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1276//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1276//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1276//console

          This message is automatically generated.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/
          -----------------------------------------------------------

          (Updated 2012-03-23 16:13:50.054043)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Changes
          -------

          Addresses a few last concerns and does some arcanist and findbugs related tweaks.

          Summary
          -------

          This should be nearly ready for integration. It has the same control flow as the trunk/0.92/0.94 versions, with a few differences.

          • It needs to track HTableDescriptors instead of reading them from the file system.
          • It uses a different HBaseFsckRepair.forceOfflineInZK method – which for some reason means we don't need HBASE-5563.
          • Uses HServerAddress instead of ServerName

          This version is close to what we've used on production clusters.

          This addresses bug HBASE-5128.
          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1a4f7f1
          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4
          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/3435/diff

          Testing
          -------

          All TestHBaseFsck unit tests pass. Currently running full suite.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          Porting the changes to 0.90 is causing some test flakiness in that version. My plan is to work these out (there are more constraints there: I need to figure out why the tests flake, avoid a master-side HBASE-5563 change, and figure out the ramifications). I plan on opening a new issue to backport this patch to 0.90. While the trunk/0.94/0.92 versions are very similar, 0.90 has several differences.

          Jonathan Hsieh added a comment -

          Updated to address Ted's last concern, arcanist fixes, and a handful of findbugs fixes.

          fulin wang added a comment -

          1. NOT_IN_META_OR_DEPLOYED
          handler.handleHoleInRegionChain(key, holeStopKey);

          NOT_IN_META
          HBaseFsckRepair.fixMetaHoleOnline(conf, hbi.getHdfsHRI());

          I think you should first check the table's region files for the hole and for any region inside the hole; only then should you create a region for this hole.
          Otherwise you should not create a region.
          There are scenarios you should consider: the table's region is good, or this region is a junk file.

          2. FIRST_REGION_STARTKEY_NOT_EMPTY and HOLE_IN_REGION_CHAIN
          I think that when this is the only type of error, you can create an empty region for the hole.
          If there is another error, you should handle the other error first.
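For readers unfamiliar with the two error codes named above, here is a minimal sketch of how such conditions could be detected over a region chain. It is not the hbck implementation: the Range type, the findProblems method and the use of an empty string for an open boundary are assumptions made for illustration, and overlaps or backwards regions are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only the two error-code names come from the thread.
class RegionChainSketch {

    // A region's key range [start, end). An empty string means "unbounded".
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Expects regions already sorted by start key. Reports a non-empty first
    // start key and any gap between one region's end and the next one's start.
    static List<String> findProblems(List<Range> sorted) {
        List<String> problems = new ArrayList<>();
        if (sorted.isEmpty()) {
            return problems;
        }
        if (!sorted.get(0).start.isEmpty()) {
            problems.add("FIRST_REGION_STARTKEY_NOT_EMPTY");
        }
        for (int i = 0; i + 1 < sorted.size(); i++) {
            String end = sorted.get(i).end;
            String nextStart = sorted.get(i + 1).start;
            // A gap exists when this region stops before the next one begins.
            if (!end.isEmpty() && end.compareTo(nextStart) < 0) {
                problems.add("HOLE_IN_REGION_CHAIN [" + end + ", " + nextStart + ")");
            }
        }
        return problems;
    }
}
```

A repair step following the suggestion in this comment would look at the whole list returned here and fabricate an empty region for a hole only when no other kind of problem is present.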

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519538/hbase-5128-v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 21 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1266//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1266//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1266//console

          This message is automatically generated.

          Jonathan Hsieh added a comment -

          @Ted

          w.r.t. fixDupeAssignment(), can we call it closeAndOfflineRegion() or something similar ?

          This doesn't seem to capture the fact that there are multiple places where the region needs to be closed. Maybe fixMultiAssignment?

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/
          -----------------------------------------------------------

          (Updated 2012-03-22 23:18:21.689735)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Changes
          -------

          Updated from Ted's mega review. Hbck and OfflineMetaRebuild tests pass.

          Plan on committing and filing several follow up jiras if this passes hadoop qa robot.

          Summary
          -------

          This should be nearly ready for integration. It has the same control flow as the trunk/0.92/0.94 versions, with a few differences.

          • It needs to track HTableDescriptors instead of reading them from the file system.
          • It uses a different HBaseFsckRepair.forceOfflineInZK method – which for some reason means we don't need HBASE-5563.
          • Uses HServerAddress instead of ServerName

          This version is close to what we've used on production clusters.

          This addresses bug HBASE-5128.
          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4
          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/3435/diff

          Testing
          -------

          All TestHBaseFsck unit tests pass. Currently running full suite.

          Thanks,

          jmhsieh

          Ted Yu added a comment -

          Thanks for the mega review

          The other way around: after going through the code in detail, I can see your effort in this tool. Thank you on behalf of hbase users.

          w.r.t. fixDupeAssignment(), can we call it closeAndOfflineRegion() or something similar ?

          My comments are just advice. As long as a few bugs are addressed (in patches for trunk and .92), I am fine with follow on work.

          Jonathan Hsieh added a comment -

          @Ted,

          Thanks for the mega review – I know it must have taken a while. This set of patches probably should have been broken up, but it has had a funny history: ports going back and forth between 0.90 and 0.92, and a lot of hacks made in firefighting mode to get it working well enough.

          I'll get the tests passing again and deal with the arcanist nits. After that, do you mind if I start filing the new set of follow-on jiras and then commit? There is plenty of follow-on work, but plenty of goodness in here too.

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1771

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1771>

          >

          > Is @Override missing ?

          yeah, i missed all of them.

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 72

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line72>

          >

          > Renaming this method is desirable as I mentioned earlier.

          Suggestion?

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 92

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line92>

          >

          > Typo: assume

          "This assumes that info is in META."

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 99

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line99>

          >

          > This method is called in two places where HBaseAdmin is available.

          >

          > Please change the method signature to avoid creating HBaseAdmin every time.

          thanks. This was something missed when porting back and forth between 0.90 and 0.92.
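The signature change requested above can be sketched as follows. This is a minimal illustration only: `AdminHandle` is a hypothetical stand-in for an expensive client object such as HBaseAdmin (whose real constructor and methods are not reproduced here), and the method names `closeRegionCreatingAdmin` and `closeRegion` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an expensive client handle such as HBaseAdmin.
// It only records how many instances were built and which regions were closed.
final class AdminHandle {
    static int created = 0;
    final List<String> closedRegions = new ArrayList<>();

    AdminHandle() {
        created++;
    }

    void closeRegion(String regionName) {
        closedRegions.add(regionName);
    }
}

final class RepairSketch {
    // Before: every call builds its own handle, even if the caller already has one.
    static void closeRegionCreatingAdmin(String regionName) {
        AdminHandle admin = new AdminHandle();
        admin.closeRegion(regionName);
    }

    // After: the caller passes in the handle it already holds.
    static void closeRegion(AdminHandle admin, String regionName) {
        admin.closeRegion(regionName);
    }

    public static void main(String[] args) {
        AdminHandle shared = new AdminHandle();
        for (String region : new String[] {"region-a", "region-b", "region-c"}) {
            closeRegion(shared, region);
        }
        System.out.println("handles created: " + AdminHandle.created);
    }
}
```

With the second form, a loop over many regions pays the construction cost once, which is the point of the review comment.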

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 152

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line152>

          >

          > Why ?

          removed

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 161

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line161>

          >

          > success is no longer set in this method.

          > This can be removed.

          done (likely from 0.90 version)

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 185

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line185>

          >

          > Shall we return directly here ?

          > The new exception would be caught at line 182

          yes.

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 215

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line215>

          >

          > Please use this method in the three places of HBaseFsck I mentioned.

          done

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java, line 274

          > <https://reviews.apache.org/r/4280/diff/2/?file=94419#file94419line274>

          >

          > Can we reuse the method from HBaseFsck ?

          done

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java, line 1217

          > <https://reviews.apache.org/r/4280/diff/2/?file=94418#file94418line1217>

          >

          > This check was added because of failed test ?

          This is an unhandled case. In one of the patches I had some extra ScrubMeta and DumpMeta methods that would clean this up – this is follow on work for another jira.

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java, line 30

          > <https://reviews.apache.org/r/4280/diff/2/?file=94417#file94417line30>

          >

          > Can this class be package-private ?

          not yet – hbck needs to be moved from o.a.h.h.util to o.a.h.h.util.hbck for this to be possible.

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 63

          > <https://reviews.apache.org/r/4280/diff/2/?file=94416#file94416line63>

          >

          > Javadoc for parameters.

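          Updated in interface, added:

          /**
           * {@inheritDoc}
           */

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 71

          > <https://reviews.apache.org/r/4280/diff/2/?file=94416#file94416line71>

          >

          > Javadoc for parameters.

          Updated in interface, added:

          /**
           * {@inheritDoc}
           */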

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 83

          > <https://reviews.apache.org/r/4280/diff/2/?file=94416#file94416line83>

          >

          > Javadoc for parameters.

          Updated in interface, added:

          /**
           * {@inheritDoc}
           */
          (and for the other cases).
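For readers unfamiliar with the pattern in the replies above, here is a minimal sketch of parameter Javadoc on an interface combined with `{@inheritDoc}` and `@Override` on the implementation. The interface, class, and method names are hypothetical and the signature is simplified to strings; they are not the actual TableIntegrityErrorHandler API.

```java
// Hypothetical interface: the parameter documentation lives here, once.
interface IntegrityHandlerSketch {
    /**
     * Called when a gap is found between two adjacent regions.
     *
     * @param holeStart first row key that no region covers
     * @param holeEnd   first row key that is covered again
     * @return a short description of the action taken
     */
    String handleHole(String holeStart, String holeEnd);
}

// Hypothetical implementation: inherits the Javadoc and marks the override,
// so the compiler rejects the method if the interface signature changes.
class LoggingIntegrityHandlerSketch implements IntegrityHandlerSketch {
    /**
     * {@inheritDoc}
     */
    @Override
    public String handleHole(String holeStart, String holeEnd) {
        return "hole between " + holeStart + " and " + holeEnd;
    }
}

final class InheritDocDemo {
    public static void main(String[] args) {
        IntegrityHandlerSketch handler = new LoggingIntegrityHandlerSketch();
        System.out.println(handler.handleHole("b", "d"));
    }
}
```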

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 112

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line112>

          >

          > Typo: handleHBCK

          This comment is not relevant to this branch anymore; removing.

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 122

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line122>

          >

          > This is called in a loop in checkMetaRegion().

          > It would be nice for this method to take a list of regions and wait for them to come out of RIT.

          This was a cause of a bunch of flakiness or 5 second sleeps in the older hbck, so I updated this.
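One way to wait on a whole list of regions with a bounded poll instead of a fixed sleep is sketched below. It is an assumption-laden illustration, not the patch's code: `waitUntilSettled` is a hypothetical helper, and the `inTransition` predicate stands in for whatever query a real tool would make to learn whether a region is still in transition (RIT).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class RitWaitSketch {
    /**
     * Polls until none of the given regions is reported in transition or the
     * timeout elapses. Returns true only if every region settled in time.
     */
    static boolean waitUntilSettled(List<String> regions,
                                    Predicate<String> inTransition,
                                    long timeoutMs,
                                    long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        Set<String> pending = new HashSet<>(regions);
        while (true) {
            // Drop every region that is no longer in transition.
            pending.removeIf(r -> !inTransition.test(r));
            if (pending.isEmpty()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Simulated cluster state: regions leave transition about 50 ms after start.
        Predicate<String> inTransition = r -> System.currentTimeMillis() - start < 50;
        boolean settled = waitUntilSettled(Arrays.asList("r1", "r2"), inTransition, 2000, 10);
        System.out.println("all regions settled: " + settled);
    }
}
```

Compared with an unconditional sleep, the poll returns as soon as the regions settle and reports a timeout explicitly, which tends to make tests both faster and less flaky.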

          On 2012-03-22 19:00:46, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 207

          > <https://reviews.apache.org/r/4280/diff/2/?file=94414#file94414line207>

          >

          > It would be nice to cache meta for subsequent calls.

          > Can be done in another JIRA.

          follow up jira.

          - jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6239
          -----------------------------------------------------------

          On 2012-03-21 23:24:13, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features were added from experience. There has not been much live testing on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 489

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line489>

          >

          > Shall we continue with the remaining HFiles ?

          jmhsieh wrote:

          good point. changed break to continue.

          Actually, I think I'm going to change this back to break for the time being – fail fast and make the user do something about it until we get testing to make sure this recovery makes sense.
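The trade-off between `break` and `continue` discussed above can be illustrated with a small sketch. Everything here is hypothetical: the file names, the `readable` check, and the two scan methods are invented for the example and do not reflect the code at the reviewed line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ScanPolicySketch {
    // Hypothetical check: names starting with "bad" are treated as unreadable.
    static boolean readable(String file) {
        return !file.startsWith("bad");
    }

    // Fail fast: stop at the first unreadable file (the 'break' behaviour).
    static List<String> scanFailFast(List<String> files) {
        List<String> processed = new ArrayList<>();
        for (String file : files) {
            if (!readable(file)) {
                break;
            }
            processed.add(file);
        }
        return processed;
    }

    // Best effort: skip the unreadable file and keep going (the 'continue' behaviour).
    static List<String> scanBestEffort(List<String> files) {
        List<String> processed = new ArrayList<>();
        for (String file : files) {
            if (!readable(file)) {
                continue;
            }
            processed.add(file);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("hfile1", "bad-hfile2", "hfile3");
        System.out.println("fail fast:   " + scanFailFast(files));
        System.out.println("best effort: " + scanBestEffort(files));
    }
}
```

Failing fast leaves later files untouched and forces the operator to look at the problem, which matches the stated preference until the recovery path has test coverage.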

          - jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6208
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1354

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1354>

          >

          > Please log some information about this region

          done

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1358

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1358>

          >

          > Redundant 'with'

          done

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1363

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1363>

          >

          > 'reassigned' -> 'reassign'

          done

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1375

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1375>

          >

          > It would be nice to create method so that this block of code and lines 1271-1289 can be unified.

          used in 3 places, sure.

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1410

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1410>

          >

          > Please remove unused code.

          done

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1436

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1436>

          >

          > (inMeta && inHdfs) appears more than once above, is there a chance that this case mistakenly falls into one of them ?

          This logic is unchanged from before I started modifying hbck. I think those cases are handled in the healthy section.

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1530

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1530>

          >

          > checkRegionChain() is synchronous.

          > Can we share one TableIntegrityErrorHandler and set its tInfo in the loop ?

          I generally prefer a style where we set internal variables once in constructors and avoid get/set methods, since it makes the lifecycle of the object simpler and makes it easier to parallelize in the future. Since this is the body of the loop, it should be easy for the JVM to keep on the stack.
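The style preference stated above might look like the following sketch, assuming a hypothetical `TableHandlerSketch` class that is bound to one table at construction time. The names and the string-based report are illustrative and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical handler bound to a single table for its whole lifetime.
final class TableHandlerSketch {
    private final String tableName; // set once in the constructor, never reassigned

    TableHandlerSketch(String tableName) {
        this.tableName = tableName;
    }

    String handleHole(String startKey, String endKey) {
        return tableName + ": hole [" + startKey + ", " + endKey + ")";
    }
}

final class HandlerLifecycleSketch {
    // One short-lived, immutable handler per loop iteration. No setter is needed,
    // and iterations share no mutable handler state if they are later run in parallel.
    static List<String> checkAll(List<String> tables) {
        List<String> reports = new ArrayList<>();
        for (String table : tables) {
            TableHandlerSketch handler = new TableHandlerSketch(table);
            reports.add(handler.handleHole("a", "b"));
        }
        return reports;
    }

    public static void main(String[] args) {
        for (String report : checkAll(Arrays.asList("t1", "t2"))) {
            System.out.println(report);
        }
    }
}
```

The alternative raised in the review, one shared handler whose table is reset in the loop, saves an allocation per table but introduces mutable state that each iteration must remember to update.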

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1542

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1542>

          >

          > This would eclipse the global counter, right ?

          The return value is added to the HBaseFsck object's fixed field at call sites. I'll rename the local variable and add comments about the return value.
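A small sketch of the pattern this reply describes, under assumed names (`FixCounterSketch`, `repairTable`, and `fixedForTable` are hypothetical and do not come from the patch): the per-table method keeps its own distinctly named local count and returns it, and the call site adds that value to the object's running total.

```java
final class FixCounterSketch {
    // Running total kept on the checker object.
    private int fixes = 0;

    // Returns the number of repairs made for one table.
    // The local has its own name, so it cannot shadow the field.
    static int repairTable(int holes, int overlaps) {
        int fixedForTable = 0;
        fixedForTable += holes;
        fixedForTable += overlaps;
        return fixedForTable;
    }

    // Each row holds {holes, overlaps} for one table.
    int checkAll(int[][] problemsPerTable) {
        for (int[] p : problemsPerTable) {
            fixes += repairTable(p[0], p[1]); // the call site updates the field
        }
        return fixes;
    }
}
```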

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1668

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1668>

          >

          > This class can be private

          ok

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1728

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1728>

          >

          > This class can be private

          ok

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1759

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1759>

          >

          > The following four lines are repeated 3 times in this class.

          > Refactor and create a new method.

          ok

          On 2012-03-22 16:55:08, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1445

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1445>

          >

          > Looking at fixDupeAssignment(), it really does region closing and offlining.

          > Can we give it a better name ?

          Do you have a suggestion? In my mind the name describes the higher-level goal of the combined actions: it tries to fix regions that have been assigned to multiple places. I could see fixMultiAssignment as a slight improvement.

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6224
          -----------------------------------------------------------

          On 2012-03-21 23:24:13, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features added from experience. Not much testing live on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6254
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13525>

          Logging would show user there is progress.

          • Ted

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 07:11:34, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 948

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line948>

          >

          > Please check return value from delete() call.

          done
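As a hedged illustration of the point about checking the result of delete(): Hadoop's FileSystem.delete returns a boolean, and so does java.io.File.delete, which is used here only so the sketch stays self-contained. `CheckedDeleteSketch` and `deleteOrThrow` are hypothetical names, not part of the patch.

```java
import java.io.File;
import java.io.IOException;

final class CheckedDeleteSketch {
    // Deletes the file and surfaces a failure instead of silently
    // ignoring the boolean result of delete().
    static void deleteOrThrow(File f) throws IOException {
        if (!f.delete()) {
            throw new IOException("Failed to delete " + f.getPath());
        }
    }
}
```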

          On 2012-03-22 07:11:34, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1040

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1040>

          >

          > You renamed it to regionInfoMap, right ?

          yes

          On 2012-03-22 07:11:34, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1076

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1076>

          >

          > I think we should handle RejectedExecutionException and re-submit the item.

          Follow on issue. Failing hard here is probably good, and the change here was just more logging.
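The "log and fail hard" behavior mentioned in this reply could look roughly like the sketch below. It assumes a plain java.util.concurrent executor; `SubmitOrFailSketch` and `submitOrFail` are hypothetical names, and the real hbck code may differ.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;

final class SubmitOrFailSketch {
    // Submits a work item. On rejection, logs which item was refused and
    // rethrows, so the caller fails hard rather than continuing with
    // partial results. Re-submission is deliberately not attempted here.
    static void submitOrFail(ExecutorService pool, Runnable work, String description) {
        try {
            pool.execute(work);
        } catch (RejectedExecutionException e) {
            System.err.println("Could not submit work item: " + description);
            throw e;
        }
    }
}
```

Retrying the submission, as the review comment suggests, would replace the rethrow with a bounded retry loop; that is the follow-on work referred to above.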

          On 2012-03-22 07:11:34, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1235

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1235>

          >

          > Shall we log something since these two calls may take some time.

          Do you mean between the two calls? The close silently method fails after a 120s timeout.

          On 2012-03-22 07:11:34, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1257

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1257>

          >

          > Please move this to line 1178

          sure

          On 2012-03-22 07:11:34, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1272

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line1272>

          >

          > Indentation.

          sure

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6213
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 554

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line554>

          >

          > Can we do this in the current JIRA ?

          >

          > Why do we need to reload for every type of fix ?

          I'd rather do it in a follow on issue. Correctness first, then performance. This patch is massive already.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 404

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line404>

          >

          > Should be 'what are online'

          "get regions according to what is online on each RegionServer"

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 418

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line418>

          >

          > checkAndRestoreConsistency() would be a better name.

          every other variable is fix* so I think it seems ok to keep this fix as well.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 435

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line435>

          >

          > I think master.synchronousBalanceSwitch() is better candidate for this action.

          I agree, but since this method is only in the trunk/0.94 branches I'll file a follow on issue for this.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 457

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line457>

          >

          > the trailing s of '.regioninfos' should be removed.

          "Orphaned regions are regions without a .regioninfo file in them."

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 484

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line484>

          >

          > I don't see where the hf is closed.

          good catch!

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 488

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line488>

          >

          > Should hfile be added to a list so that we can report them collectively ?

          >

          > Currently user has to search the output of hbck.

          From my point of view it is easier to keep these all on separate lines so we can grep the output. Adding word "orphan" to log message.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 489

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line489>

          >

          > Shall we continue with the remaining HFiles ?

          good point. changed break to continue.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 501

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line501>

          >

          > Help me understand this comparison:

          > are we shrinking the range here ?

          Good catch!

          The goal here is indeed to expand the region to cover the range of all the hfiles.
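To make the expand-versus-shrink point concrete, here is a hedged sketch of computing a key range that covers every file. KeyRangeSketch and its method names are hypothetical, and the sketch ignores the fact that a region's end key is exclusive, which real repair code would have to account for.

```java
import java.util.List;

final class KeyRangeSketch {
  // Lexicographic comparison treating bytes as unsigned, the usual order for row keys.
  static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    return a.length - b.length;
  }

  // Each element is {firstKey, lastKey} of one file. Returns {smallest first, largest last}.
  static byte[][] cover(List<byte[][]> files) {
    byte[] start = null;
    byte[] end = null;
    for (byte[][] f : files) {
      // Expand: move start down and end up. Flipping either comparison shrinks the range.
      if (start == null || compare(f[0], start) < 0) {
        start = f[0];
      }
      if (end == null || compare(f[1], end) > 0) {
        end = f[1];
      }
    }
    return new byte[][] {start, end};
  }
}
```

The bug class the reviewer spotted is a flipped comparison: keeping the larger start or the smaller end yields the intersection of the files' ranges instead of their cover.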

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 531

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line531>

          >

          > Should read 'If there are errors to be fixed'

          * This method determines if there are table integrity errors in HDFS. If
          * there are errors and the appropriate "fix" options are enabled, the method
          * will first correct orphan regions making them into legit regiondirs, and
          * then reload to merge potentially overlapping regions.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 567

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line567>

          >

          > Some assertion here for the declared state (no holes) ?

          Removed "no orphans, no holes" from the comment: the overlap repairs can still happen when the HDFS hole fix options are off.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 655

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line655>

          >

          > This exception isn't used.

          > Do we need it ?

          not needed and removed. I believe this is in the 0.90 version and a remnant of porting back and forth between versions.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 702

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line702>

          >

          > Can hbaseRoot.getFileSystem() be saved in a variable outside the loop ?

          The guard means this is only executed once per table. In the 0.90 version, I got a TableInfo via a method call that fetched the HRegionInfo/HTableDescriptor, and I checked for inconsistencies there. In 0.92+ there is only the .tableinfo file, so this consistency check isn't relevant (though there should be a separate .tableinfo check specific to 0.92+, which I can file as a follow-on).

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 800

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line800>

          >

          > Please put this on line 734

          done.

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 924

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line924>

          >

          > rename() returns a boolean, should we check the return value ?

          added check similar to the one in the following call to rename.
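As an illustration of the check being requested, the sketch below uses java.io.File.renameTo, which, like the rename in question, signals failure through a boolean return value rather than an exception. It is a stand-in chosen so the example is self-contained; CheckedRenameSketch is a hypothetical name and this is not the code added to HBaseFsck.

```java
import java.io.File;
import java.io.IOException;

final class CheckedRenameSketch {
  // renameTo reports failure through its return value, so an unchecked call
  // can fail silently. Turn a false result into an exception.
  static void renameOrFail(File src, File dst) throws IOException {
    if (!src.renameTo(dst)) {
      throw new IOException("Failed to rename " + src + " to " + dst);
    }
  }
}
```

Whether to throw, log, or record an error for the report is a design choice; the point is only that the false case must not be ignored.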

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 817

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line817>

          >

          > Why is tablesInfo declared again ?

          removed

          On 2012-03-22 06:33:20, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 642

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line642>

          >

          > This exception isn't used.

          > Do we need it ?

          Removed from here. Not needed in this version, but is used in 0.90 version.

          - jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6208
          -----------------------------------------------------------

          On 2012-03-21 23:24:13, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
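Item 2 above mentions safeguards against mega merges. One plausible shape for such a guard is sketched below, assuming a size limit on the group of overlapping regions. The name maxMerge appears in the review comments, but the class, the method, the strict-greater-than comparison, and the log text are assumptions for illustration only.

```java
import java.util.List;

final class MergeGuardSketch {
  // Returns true only when the overlap group is small enough to merge automatically.
  static boolean shouldMerge(List<String> overlapGroup, int maxMerge) {
    if (overlapGroup.size() > maxMerge) {
      // Report the limit alongside the group size so the operator can see why it was skipped.
      System.out.println("Skipping merge of " + overlapGroup.size()
          + " overlapping regions; limit (maxMerge) is " + maxMerge);
      return false;
    }
    return true;
  }
}
```

A guard like this leaves very large overlap groups for manual inspection instead of collapsing many regions into one.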

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features added from experience. Not much testing live on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6239
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13498>

          Is @Override missing ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13474>

          Renaming this method is desirable as I mentioned earlier.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13473>

          Typo: assume

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13475>

          This method is called in two places where HBaseAdmin is available.

          Please change the method signature to avoid creating HBaseAdmin every time.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13478>

          Typo: handleHBCK

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13485>

          This is called in a loop in checkMetaRegion().
          It would be nice for this method to take a list of regions and wait for them to come out of RIT.
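A rough sketch of the batched wait suggested here, assuming the caller can supply the current set of regions in transition. WaitForRitSketch, the Supplier-based lookup, and the timeout parameters are all hypothetical; the real method would query the master rather than a supplier.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

final class WaitForRitSketch {
  // Polls until none of 'regions' is reported in transition, or the timeout expires.
  // Returns true if all regions left transition in time.
  static boolean waitUntilOutOfTransition(Collection<String> regions,
      Supplier<Set<String>> inTransition, long timeoutMs, long pollMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      // One lookup per poll covers every region, instead of one wait loop per region.
      Set<String> pending = new HashSet<String>(regions);
      pending.retainAll(inTransition.get());
      if (pending.isEmpty()) {
        return true;
      }
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      Thread.sleep(pollMs);
    }
  }
}
```

Compared with calling a single-region wait in a loop, the total wait is bounded by the slowest region rather than by the sum of all waits.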

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13483>

          Why ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13484>

          success is no longer set in this method.
          This can be removed.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13486>

          Shall we return directly here ?
          The new exception would be caught at line 182

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13487>

          It would be nice to cache meta for subsequent calls.
          Can be done in another JIRA.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/4280/#comment13489>

          Please use this method in the three places of HBaseFsck I mentioned.

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          <https://reviews.apache.org/r/4280/#comment13494>

          Javadoc for parameters.

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          <https://reviews.apache.org/r/4280/#comment13495>

          Javadoc for parameters.

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          <https://reviews.apache.org/r/4280/#comment13496>

          Javadoc for parameters.

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java
          <https://reviews.apache.org/r/4280/#comment13497>

          Can this class be package-private ?

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          <https://reviews.apache.org/r/4280/#comment13501>

          This check was added because of failed test ?

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13502>

          Can we reuse the method from HBaseFsck ?

          - Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6229
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13451>

          I think we should distinguish the return value in this case (0) from that returned on line 1515.
          See comment on line 1792

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13428>

          I suggest renaming holeStart as startRow and renaming holeStop as stopRow.
          Then you don't need the comment on 1700.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13434>

          Should include maxMerge in the log.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13439>

          I wonder whether we should bail if there have been two IOE's, one on 1759 and one here.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13440>

          'Creating' -> 'Created'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13442>

          Are newRegion and region representing the same entity ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13453>

          If mergeRegionDirs() returns 0 (or less), should we note (partial) failure in merging ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13456>

          Should say 'unable to get regions from master' or something similar

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13458>

          Please remove this.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13460>

          'with not' -> 'without'
          Should also include some info on the entry.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13459>

          Please remove this.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13464>

          Nit: name hdfsRegiondirModtime as hdfsRegionDirModTime

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13465>

          Typo: maximum

          - Ted

          On 2012-03-21 23:24:13, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered), but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many situations and pass. Most "live" testing has been done on 0.90.x versions, and many improvements and features were added from that experience. There has not been much live testing on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-22 05:21:28, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 172

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line172>

          >

          > I think tablesToFix would be a better name for this member.

          agreed.

          On 2012-03-22 05:21:28, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 186

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line186>

          >

          > 'encoded region name' would be clearer.

          "It maps from encoded region name to HbckInfo structure."

          On 2012-03-22 05:21:28, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 198

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line198>

          >

          > TInfo should be TableInfo

          done

          On 2012-03-22 05:21:28, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 363

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line363>

          >

          > fixes is a global variable.

          > I think the loop condition should check that fixes increases across iterations.

          > If the count doesn't increase, we can break out of the loop.

          clearState() resets the fixes count. I'll add a comment.

          On 2012-03-22 05:21:28, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 365

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line365>

          >

          > Why is 2 treated specially here?

          Iteration 1 does repairs; iteration 2 verifies that things are fixed. If more iterations are needed, something funny has happened. Adding a comment.

          Changed success logging message to info.
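
          The two-pass behaviour described in this reply can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the HBaseFsck code from the patch: the class RepairLoopSketch, the method repairUntilClean, the constant MAX_PASSES, and the return codes are all assumptions made for the example. Only two ideas come from the discussion: the fixes counter is reset on each pass (as clearState() does), and a second pass verifies the first.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of a repair-then-verify loop; not the actual HBaseFsck code. */
final class RepairLoopSketch {
    /** Pass 1 is expected to repair; pass 2 is expected to find nothing left to fix. */
    static final int MAX_PASSES = 2;

    /**
     * Runs full check-and-fix passes until one pass applies no fixes
     * or the pass budget is used up.
     *
     * @param checkAndFix one full pass; returns the number of fixes applied in that pass
     *                    (a per-pass count, as if the counter were reset before each pass)
     * @return 0 if some pass applied no fixes, -1 if fixes were still being
     *         applied after MAX_PASSES passes
     */
    static int repairUntilClean(IntSupplier checkAndFix) {
        for (int pass = 1; pass <= MAX_PASSES; pass++) {
            int fixes = checkAndFix.getAsInt();
            if (fixes == 0) {
                return 0; // this pass verified that nothing needed repair
            }
        }
        return -1; // still repairing after the verification pass
    }
}
```

          With this sketch, a pass sequence that reports 3 fixes and then 0 returns 0, while a checker that keeps reporting fixes returns -1, which corresponds to the "something funny has happened" case.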

          On 2012-03-22 05:21:28, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 396

          > <https://reviews.apache.org/r/4280/diff/2/?file=94413#file94413line396>

          >

          > Ideally a different return value (say -2) should be used.

          done

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6205
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6224
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13407>

          Please log some information about this region

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13408>

          Redundant 'with'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13409>

          'reassigned' -> 'reassign'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13410>

          It would be nice to create a method so that this block of code and lines 1271-1289 can be unified.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13411>

          Please remove unused code.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13413>

          (inMeta && inHdfs) appears more than once above. Is there a chance that this case mistakenly falls into one of those earlier branches?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13414>

          Looking at fixDupeAssignment(), what it really does is close and offline regions.
          Can we give it a better name?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13415>

          checkRegionChain() is synchronous.
          Can we share one TableIntegrityErrorHandler and set its tInfo in the loop?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13416>

          This would eclipse the global counter, right?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13420>

          This class can be private

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13421>

          This class can be private

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13422>

          The following four lines are repeated 3 times in this class.
          Refactor and create a new method.

          • Ted

          Lars Hofhansl added a comment -

          We'll be trying to get this into 0.94.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6213
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13390>

          Please check return value from delete() call.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13392>

          You renamed it to regionInfoMap, right?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13391>

          I think we should handle RejectedExecutionException and re-submit the item.
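
          One way this suggestion could look is sketched below. It is a hypothetical helper, not code from the patch: the class ResubmitSketch, the method submitWithRetry, and the maxAttempts and backoffMillis parameters are assumptions made for illustration. It relies only on standard java.util.concurrent behaviour, where ExecutorService.submit throws RejectedExecutionException when the task cannot be accepted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical sketch of re-submitting a rejected work item; not the actual HBaseFsck code. */
final class ResubmitSketch {
    /**
     * Submits work to the pool, retrying after a pause if the pool rejects it.
     *
     * @param maxAttempts   total number of submit attempts (must be at least 1)
     * @param backoffMillis pause between attempts, giving the pool time to drain
     * @throws RejectedExecutionException if the pool is shut down or every attempt is rejected
     */
    static Future<?> submitWithRetry(ExecutorService pool, Runnable work,
                                     int maxAttempts, long backoffMillis)
            throws InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RejectedExecutionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return pool.submit(work);
            } catch (RejectedExecutionException e) {
                if (pool.isShutdown()) {
                    throw e; // a shut-down pool will never accept the item
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // wait before re-submitting
                }
            }
        }
        throw last; // every attempt was rejected
    }
}
```

          Bounding the number of attempts keeps a saturated pool from stalling the caller indefinitely; whether to retry, run the item in the calling thread, or report an error is a design choice the review comment leaves open.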

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13393>

          Shall we log something here, since these two calls may take some time?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13394>

          Please move this to line 1178

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13395>

          Indentation.

          • Ted

          On 2012-03-21 23:24:13, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features added from experience. Not much testing live on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6208
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13371>

          Should be 'what are online'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13372>

          checkAndRestoreConsistency() would be a better name.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13373>

          I think master.synchronousBalanceSwitch() is better candidate for this action.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13374>

          the trailing s of '.regioninfos' should be removed.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13375>

          I don't see where the hf is closed.
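
One way to make the close explicit is a `try`/`finally` around the read. This is only a sketch: `CloseSketch` and `TrackedResource` are hypothetical stand-ins for the HFile reader the comment refers to, and the simulated failure is there just to show that the resource is closed on both paths.

```java
// Hypothetical stand-in for a reader that must be closed; not HBase code.
class CloseSketch {
  static final class TrackedResource implements AutoCloseable {
    boolean closed = false;

    void read(boolean fail) {
      if (fail) {
        throw new IllegalStateException("simulated read failure");
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Returns true if the read succeeded; closes the resource either way. */
  static boolean readAndClose(TrackedResource resource, boolean fail) {
    try {
      resource.read(fail);
      return true;
    } catch (IllegalStateException e) {
      return false;
    } finally {
      resource.close(); // runs on success and on failure
    }
  }
}
```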

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13376>

          Should hfile be added to a list so that we can report them collectively ?

          Currently user has to search the output of hbck.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13377>

          Shall we continue with the remaining HFiles ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13379>

          Help me understand this comparison:
          are we shrinking the range here ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13380>

          Should read 'If there are errors to be fixed'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13382>

          Can we do this in the current JIRA ?

          Why do we need to reload for every type of fix ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13381>

          Some assertion here for the declared state (no holes) ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13383>

          This exception isn't used.
          Do we need it ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13384>

          This exception isn't used.
          Do we need it ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13385>

          Can hbaseRoot.getFileSystem() be saved in a variable outside the loop ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13386>

          Please put this on line 734

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13387>

          Why is tablesInfo declared again ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13388>

          rename() returns a boolean, should we check the return value ?
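
A sketch of the kind of check being asked for. Hadoop's `FileSystem.rename` reports failure through its boolean return value rather than an exception; to keep the example self-contained it uses `java.io.File.renameTo`, which behaves the same way, as a stand-in. `CheckedRenameSketch` and `renameOrThrow` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper; java.io.File stands in for the Hadoop FileSystem API.
class CheckedRenameSketch {
  /** Renames src to dst and turns a false return value into an exception. */
  static void renameOrThrow(File src, File dst) throws IOException {
    if (!src.renameTo(dst)) {
      throw new IOException("Failed to rename " + src + " to " + dst);
    }
  }
}
```

The same pattern applies to `delete()` calls, which also signal failure by returning false.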

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review6205
          -----------------------------------------------------------

          Nice work.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13364>

          I think tablesToFix would be a better name for this member.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13365>

          'encoded region name' would be clearer.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13366>

          TInfo should be TableInfo

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13367>

          Currently this config is hidden.
          It would be nice to mention it in release notes.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13368>

          fixes is a global variable.
          I think the loop condition should check that fixes increases across iterations.
          If the count doesn't increase, we can break out of the loop.
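
The suggested loop condition can be sketched as follows. This assumes each repair pass can report the cumulative number of fixes so far; `FixLoopSketch`, `repairUntilNoProgress`, and `maxPasses` are hypothetical names, and the `IntSupplier` stands in for whatever runs one hbck pass.

```java
import java.util.function.IntSupplier;

// Hypothetical driver loop; not the code under review.
class FixLoopSketch {
  /**
   * Runs repair passes until a pass does not raise the cumulative fix count
   * or maxPasses is reached. Returns the number of passes executed.
   */
  static int repairUntilNoProgress(IntSupplier runPassAndGetTotalFixes, int maxPasses) {
    int previousTotal = 0;
    int passes = 0;
    while (passes < maxPasses) {
      int total = runPassAndGetTotalFixes.getAsInt();
      passes++;
      if (total <= previousTotal) {
        break; // no new fixes in this pass, so another pass will not help
      }
      previousTotal = total;
    }
    return passes;
  }
}
```

The `maxPasses` bound is kept as a second safeguard in case every pass keeps reporting new fixes.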

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13369>

          Why is 2 specially treated here ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment13370>

          Ideally a different return value (say -2) should be used.

          • Ted


          Jonathan Hsieh added a comment -

          Previous version accidentally included two dev tools that are not part of this patch (ScrubMeta and DumpMeta).

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/
          -----------------------------------------------------------

          (Updated 2012-03-22 05:16:09.079201)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Changes
          -------

          Updated with safeguard features found in trunk/0.94/0.92 versions.

          Summary (updated)
          -------

          This should be nearly ready for integration. This has the same control flow as the trunk/0.92/0.94 versions but has a few differences.

          • It needs to track HTableDescriptors instead of reading them from the file system.
          • It uses a different HBaseFsckRepair.forceOfflineInZK method – which for some reason means we don't need HBASE-5563.
          • Uses HServerAddress instead of ServerName

          This version is close to what we've used on production clusters.

          This addresses bug HBASE-5128.
          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java de6ebe3
          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 7404377
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1ec17cd
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151

          Diff: https://reviews.apache.org/r/3435/diff

          Testing (updated)
          -------

          All TestHBaseFsck unit tests pass. Currently running full suite.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          The trunk/0.94/0.92 versions pass the full unit test suites (or had flaky tests that passed locally).

          I plan on doing one final pass on these versions to find and fix findbugs/arcanist nits.

          Jonathan Hsieh added a comment -

          For the 0.90v2 version, TestHBaseFsck passes consistently. This is very close to the version we've used to repair production clusters.

          The 0.90 version has a different HBaseFsckRepair.forceOfflineInZK(), which is somehow responsible for that version not needing HBASE-5563. I haven't investigated enough to determine why the equivalent method for the 0.92/0.94/trunk versions fails unit tests consistently.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519364/hbase-5128-trunk-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 21 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 7 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1248//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1248//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1248//console

          This message is automatically generated.

          Jonathan Hsieh added a comment -

          The 0.94 and 0.92 versions have minor tweaks relative to the trunk version; in both cases TestHBaseFsck passes.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/
          -----------------------------------------------------------

          (Updated 2012-03-21 23:24:13.538416)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Changes
          -------

          Updated from Ted and Stack's reviews.

          Highlights:

          • usage and actual command line params renamed and updated.

          Summary
          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.
          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
          3) Fixed comparator in HRegionInfo
          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered), but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.
          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3c635d4
          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java d47ef10
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java cd1755f
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java c0aaf65
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandlerImpl.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java d9a2a02
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing
          -------

          Unit tests cover many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features were added from that experience. There has not been much live testing of the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > I went through about half of this patch. It's plain that there have been a bunch of improvements. There is some great stuff in here. I'm +1 on committing this because it's fat and full of goodies, and then working on issues in new issues.

          Sorry for the delay. I'll have a new patch that addresses these comments up shortly, and will focus on porting to 0.94/0.92/0.90 to take account of HBASE-5563, HBASE-5588, and HBASE-5589.

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 169

          > <https://reviews.apache.org/r/4280/diff/1/?file=90975#file90975line169>

          >

          > Good one.

          comment removed due to update in HBASE-5563.

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 1025

          > <https://reviews.apache.org/r/4280/diff/1/?file=90975#file90975line1025>

          >

          > We overloaded the method here?

          This was a style thing: I misread the method when I first read it, so I rewrote it to be more verbose but readable.

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 96

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line96>

          >

          > Add <p> here if you want the line between paragraphs to come out in javadoc. You add a white space for each empty line.

          html-ized javadoc.

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 102

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line102>

          >

          > that there 'are' no...

          fixed

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 131

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line131>

          >

          > s/handleful/handful/

          fixed

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 135

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line135>

          >

          > Capitalize 'replaces'

          removed from here and fixed in usage

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 139

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line139>

          >

          > Nice doc. Helps.

          >

          > Would suggest this args explanation stuff only be done in the usage, not in usage and up here in class comment. They have a tendency to diverge.

          removed flags and added link to the printUsageAndExit method.

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 394

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line394>

          >

          > Declare and assign in the one step? As is, you declare two lines above and then assign it here

          Done. I likely reused the var a couple times in an earlier rev.

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 898

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line898>

          >

          > Is this method like: http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/FSUtils.html#469

          >

          > ?

          Changed to use FSUtils.getRootDir().

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 365

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line365>

          >

          > This is 'destructive' in that it changes what's on HDFS? If so, change the comment above... it says 'determine'

          changed to 'repair'

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5860
          -----------------------------------------------------------

          On 2012-03-10 01:04:58, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-10 01:04:58)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered), but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8

          src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features added from experience. Not much testing live on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1689

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line1689>

          >

          > Precede regioninfo with a dot.

          done

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2879

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2879>

          >

          > Can we name this option fixRegionHolesOnHdfs ?

          > It would be better to note which options can be run with cluster offline.

          At the moment, hbck can only be run while HBase is online. This has not been unified with OfflineMetaRebuild yet.

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2880

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2880>

          >

          > Name this fixRegionOverlapsOnHdfs ?

          I'm not sure what making the flag even longer buys us. I was thinking about making it even more concise: -fixHoles, -fixOverlaps. The assumption in this tool is that the data in the file system is golden and that everything is reconstructed from there (the previous version trusted only the meta table).
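
To make the terms "hole" and "overlap" concrete, here is a minimal sketch of how a table's region chain could be checked once the key ranges have been read from the file system. It is not the hbck implementation: the `RegionChainCheck` and `Range` names are hypothetical, keys are plain strings rather than byte arrays, and an empty string stands for the unbounded start or end of the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RegionChainCheck {
    /** Hypothetical stand-in for a region's key range; "" means unbounded. */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns a description of every hole or overlap found in the chain. */
    static List<String> check(List<Range> regions) {
        List<Range> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<String> problems = new ArrayList<>();
        if (sorted.isEmpty()) {
            return problems;
        }
        if (!sorted.get(0).start.isEmpty()) {
            problems.add("first region start key not empty: " + sorted.get(0).start);
        }
        String coveredTo = sorted.get(0).start; // highest end key seen so far
        boolean coveredToEnd = false;           // true once a region reaches the table end
        for (Range r : sorted) {
            if (coveredToEnd || r.start.compareTo(coveredTo) < 0) {
                problems.add("overlap at " + r.start);
            } else if (r.start.compareTo(coveredTo) > 0) {
                problems.add("hole [" + coveredTo + ", " + r.start + ")");
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (r.end.compareTo(coveredTo) > 0) {
                coveredTo = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("last region end key not empty: " + coveredTo);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Range> regions = Arrays.asList(
            new Range("", "b"), new Range("c", "e"), new Range("d", ""));
        check(regions).forEach(System.out::println);
    }
}
```

A repair step would then act on each reported problem, for example by fabricating a region for a hole or merging the regions in an overlap, as the release note describes.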

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2949

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2949>

          >

          > white space.

          fixed

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 28

          > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line28>

          >

          > Should read 'callbacks for handling particular table integrity invariant violations detected.'

          Updated to be in English.

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 33

          > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line33>

          >

          > Please add javadoc for the handleXXX methods on what scenario each fixes.

          done

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 52

          > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line52>

          >

          > This class should be abstract.

          > It is better to put it in its own file.

          done.

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java, line 38

          > <https://reviews.apache.org/r/4280/diff/1/?file=90980#file90980line38>

          >

          > Since region always belongs to some table, I suggest naming this method handleNonEmptyRegionStartKey()

          renamed to handleRegionStartKeyNotEmpty
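
The handler layout discussed in the replies above (an interface of callbacks plus a separate abstract base class) can be sketched roughly as follows. This is only an illustration under assumptions: `IntegrityHandler`, `IntegrityHandlerBase`, and `ReportingHandler` are hypothetical names, the string parameters are simplifications, and apart from `handleRegionStartKeyNotEmpty` the method names are not taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Callbacks invoked when a particular table integrity violation is detected. */
interface IntegrityHandler {
    /** The first region of a table does not start at the empty key. */
    void handleRegionStartKeyNotEmpty(String firstStartKey);

    /** No region covers the key range [holeStart, holeEnd). Hypothetical callback. */
    void handleHoleInRegionChain(String holeStart, String holeEnd);
}

/** Abstract no-op base so that subclasses override only the cases they handle. */
abstract class IntegrityHandlerBase implements IntegrityHandler {
    @Override
    public void handleRegionStartKeyNotEmpty(String firstStartKey) {
        // intentionally does nothing
    }

    @Override
    public void handleHoleInRegionChain(String holeStart, String holeEnd) {
        // intentionally does nothing
    }
}

/** Example handler that only records holes and ignores everything else. */
final class ReportingHandler extends IntegrityHandlerBase {
    final List<String> report = new ArrayList<>();

    @Override
    public void handleHoleInRegionChain(String holeStart, String holeEnd) {
        report.add("hole [" + holeStart + ", " + holeEnd + ")");
    }

    public static void main(String[] args) {
        ReportingHandler handler = new ReportingHandler();
        handler.handleRegionStartKeyNotEmpty("c"); // ignored by this handler
        handler.handleHoleInRegionChain("b", "c"); // recorded
        System.out.println(handler.report);
    }
}
```

Separating detection from handling this way lets the same scan drive either a read-only report or an actual repair, depending on which handler is plugged in.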

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2878

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2878>

          >

          > More than one option modifies .META. table.

          > Shall we name this option fixMetaUsingRegionInfoOnHdfs ?

          There are two kinds of flags here: individual flags like -fixAssignments, -fixMeta, -fixHdfs*, and combo flags that enable several of them, such as -fixAll. I'm going to change the combo flags to make them more distinct; I'll change -fixAll to be -allFix or something like that to make it clearer. I also need to update the usage info to be more accurate.

          On 2012-03-11 14:37:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2876

          > <https://reviews.apache.org/r/4280/diff/1/?file=90977#file90977line2876>

          >

          > Looking at code @ line 2835 below, it seems -fixAssignments and -fix are equivalent.

          > What was the reason for deprecating -fix ?

          -fix and -fixAssignments are equivalent and match the original application's behavior. I didn't want our front-line supporters to use -fix assuming the old behavior and have it fix durable state (HDFS modifications), so I added other flags to enable those modifications. With the other options, it seemed to make sense to change the name to be consistent.
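
As an illustration of the flag scheme described in these replies (individual flags, a combo flag, and a deprecated alias), here is a minimal parsing sketch. It is not hbck's argument parser: the `RepairFlags` class and `Repair` enum are hypothetical, the flag spellings are the ones floated in this thread and may differ from what was committed, and the assumption that -fixAll turns on every repair is mine.

```java
import java.util.EnumSet;
import java.util.Set;

final class RepairFlags {
    /** Hypothetical set of repairs that can be switched on independently. */
    enum Repair { ASSIGNMENTS, META, HDFS_HOLES, HDFS_OVERLAPS }

    static Set<Repair> parse(String... args) {
        EnumSet<Repair> enabled = EnumSet.noneOf(Repair.class);
        for (String arg : args) {
            switch (arg) {
                case "-fix":
                    // Deprecated alias: keeps the old, non-durable behavior only.
                    System.err.println("-fix is deprecated; use -fixAssignments");
                    // fall through
                case "-fixAssignments":
                    enabled.add(Repair.ASSIGNMENTS);
                    break;
                case "-fixMeta":
                    enabled.add(Repair.META);
                    break;
                case "-fixHoles":
                    enabled.add(Repair.HDFS_HOLES);
                    break;
                case "-fixOverlaps":
                    enabled.add(Repair.HDFS_OVERLAPS);
                    break;
                case "-fixAll":
                    // Combo flag: assumed here to enable every individual repair.
                    enabled.addAll(EnumSet.allOf(Repair.class));
                    break;
                default:
                    throw new IllegalArgumentException("unknown option: " + arg);
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println(parse("-fix"));
        System.out.println(parse("-fixMeta", "-fixHoles"));
    }
}
```

The point of the design is that the legacy flag never enables a repair that modifies HDFS; those repairs have to be requested explicitly.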

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5826
          -----------------------------------------------------------

          On 2012-03-10 01:04:58, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-10 01:04:58)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but it think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8

          src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many situations and pass. Most "live" testing has been done on 0.90.x versions. Many improvements and features were added from that experience. Not much live testing has been done on the trunk versions.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/HRegionInfo.java, line 805

          > <https://reviews.apache.org/r/4280/diff/1/?file=90973#file90973line805>

          >

          > Interesting. https://issues.apache.org/jira/browse/HBASE-5563 is all about adding this.

          HBASE-5563 committed.
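
The thread does not spell out what the comparator fix was. As an illustration only, the sketch below assumes the problem is a comparator that ignores a distinguishing field, so that two distinct regions with the same table and keys compare as equal. `RegionKeySketch` and its fields are hypothetical and are not the real `HRegionInfo` API; keys are compared as unsigned bytes in lexicographic order, and the special meaning of an empty end key is not handled here.

```java
public class RegionKeySketch implements Comparable<RegionKeySketch> {
    final String table;
    final byte[] startKey;
    final byte[] endKey;
    final long regionId; // hypothetical tie-breaker between otherwise equal regions

    RegionKeySketch(String table, byte[] startKey, byte[] endKey, long regionId) {
        this.table = table;
        this.startKey = startKey;
        this.endKey = endKey;
        this.regionId = regionId;
    }

    /** Lexicographic comparison of byte arrays, treating bytes as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    @Override
    public int compareTo(RegionKeySketch o) {
        int c = table.compareTo(o.table);
        if (c != 0) return c;
        c = compareBytes(startKey, o.startKey);
        if (c != 0) return c;
        c = compareBytes(endKey, o.endKey);
        if (c != 0) return c;
        // Without this last step, two regions with identical table and keys
        // would be indistinguishable to sorted sets and maps.
        return Long.compare(regionId, o.regionId);
    }
}
```

A test that only passes when such regions collapse into one entry would be relying on the bug, which may be the kind of dependency item 4 of the summary removed.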

          On 2012-03-12 23:35:46, Michael Stack wrote:

          > src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java, line 229

          > <https://reviews.apache.org/r/4280/diff/1/?file=90974#file90974line229>

          >

          > Will this break compatibility? Put at the end of the Interface and it might be ok.

          >

          > I think we need this one. In the past, we've addressed this issue by having the user restart master.

          This is being handled in https://issues.apache.org/jira/browse/HBASE-5589. Per the notes there, compatibility is not broken.

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5860
          -----------------------------------------------------------

          jiraposter@reviews.apache.org added a comment -

          On 2012-03-11 01:25:43, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 2652

          > <https://reviews.apache.org/r/4280/diff/1/?file=90975#file90975line2652>

          >

          > Can we deprecate this method in 0.94 and remove it in 0.96 ?

          Completed in HBASE-5588.

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5823
          -----------------------------------------------------------

          stack added a comment -

          @Jon Should go into 0.92 soon as ready. On...

          I'll double check and report back before I attempt any commits.

          That'd be cool. If you don't do it, I will. It's pretty critical we not break rolling restart. Good on you Jon.

          Jonathan Hsieh added a comment -

          @Zhihong

          No problem – I intend to address the reviews.

          Sorry about the test failures – these are actually related to HBASE-5563 – I'll help chenhui there. I've been in 0.92 and 0.90 land and then away for a little bit and didn't realize that a failure in medium skips all the large tests. (I fixed the medium and expected it to pass but then the large tests ran and failed).

          Ted Yu added a comment -

          @Jonathan:
          Can you address QA report @ 10/Mar/12 02:00 ?

          There're outstanding review comments on review board.

          Jonathan Hsieh added a comment -

          @Stack The 0.90 version has been used against versions that didn't use the offline method, and I don't think that the order matters. I'll double check and report back before I attempt any commits. What are your thoughts on getting it into 0.92.1rcX if rc0 doesn't make it (not blocking it, but getting it in if the window opens up)?

          Jonathan Hsieh added a comment -

          @Lars I believe the ports to 0.94.0 and 0.92.x are likely identical and nearly trivial, and I was intending on doing them. The initial version was also for 0.90.x, and a version for that will be ported as well since my crew will be supporting that version for a while. I may try to do a 0.90.x release at some point.

          stack added a comment -

          The only issue that I can see – the bulk of the patch is hbck stuff, changes in tests and in hbck package only – is the addition to the master Interface where the offline method is added. It needs to be moved to the end of the Interface so we don't break rolling restart (moving to the end of the Interface may be apocryphal but IIRC, that's the way to add a method w/o breaking backward compatibility). We should get this into 0.92 and 0.94 after trunk commit.

          Lars Hofhansl added a comment -

          What's the feeling about 0.94 vs 0.96? It seems the changes are isolated enough to be not too risky for 0.94.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5860
          -----------------------------------------------------------

          Ship it!

          I went through about half of this patch. It's plain that there have been a bunch of improvements. There is some great stuff in here. I'm +1 on committing this because it's fat and full of goodies, and then working on issues in new issues.

          src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
          <https://reviews.apache.org/r/4280/#comment12760>

          Interesting. https://issues.apache.org/jira/browse/HBASE-5563 is all about adding this.

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
          <https://reviews.apache.org/r/4280/#comment12761>

          Will this break compatibility? Put at the end of the Interface and it might be ok.

          I think we need this one. In the past, we've addressed this issue by having the user restart master.

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          <https://reviews.apache.org/r/4280/#comment12764>

          Good one.

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          <https://reviews.apache.org/r/4280/#comment12765>

          We overloaded the method here?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12766>

          Add <p> here if you want the line between paragraphs to come out in javadoc. You add a white space for each empty line.
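
To illustrate the javadoc point: a blank line inside a doc comment is treated as ordinary HTML whitespace, so paragraphs run together unless a `<p>` tag separates them. The class and method below are hypothetical placeholders, not taken from HBaseFsck.

```java
public class JavadocParagraphSketch {
    /**
     * Checks a table for integrity problems.
     * <p>
     * Because of the tag above, javadoc renders this sentence as its own
     * paragraph. With only a blank comment line it would be joined to the
     * first sentence.
     *
     * @return a placeholder value (hypothetical; always 0 in this sketch)
     */
    static int check() {
        return 0;
    }
}
```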

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12770>

          that there 'are' no...

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12772>

          s/handleful/handful/

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12773>

          Capitalize 'replaces'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12774>

          Nice doc. Helps.

          Would suggest this args explanation stuff only be done in the usage, not in both the usage and up here in the class comment. They have a tendency to diverge.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12777>

          This is 'destructive' in that it changes what's on hdfs? If so, change the comment above... it says 'determine'.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12778>

          Declare and assign in the one step? As is, you declare two lines above and then assign it here

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12779>

          Excellent

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12780>

          Is this method like: http://hbase.apache.org/xref/org/apache/hadoop/hbase/util/FSUtils.html#469

          ?

          • Michael

          On 2012-03-10 01:04:58, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4280/

          -----------------------------------------------------------

          (Updated 2012-03-10 01:04:58)

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary

          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.

          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.

          3) Fixed comparator in HRegionInfo

          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8

          src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing

          -------

          Unit tests cover many situations and pass. Most "live" testing has been done on 0.90.x versions, and many improvements and features were added from that experience. Not much live testing has been done on the trunk versions.

          Thanks,

          jmhsieh

          Ted Yu added a comment -

          What was the reason for deprecating -fix ?

          I guess -fixAll may take a long time to execute now that hbck is able to fix various types of problems.
          Otherwise it may be desirable to let -fix correct all the problems.
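
          For reference, one common way to keep a deprecated flag working while steering users to its replacement is to treat it as an alias that also records a warning. The sketch below is only an illustration of that pattern: the class and field names are hypothetical, and it is not HBaseFsck's actual argument-parsing code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical option holder; illustrative only, not HBaseFsck's real parser.
final class HbckOptionsSketch {
    // Set by -fixAssignments or by the deprecated -fix spelling.
    boolean fixAssignments;
    // Messages to show the user after parsing.
    final List<String> warnings = new ArrayList<>();

    static HbckOptionsSketch parse(String[] args) {
        HbckOptionsSketch opts = new HbckOptionsSketch();
        for (String arg : args) {
            if (arg.equals("-fix")) {
                // Deprecated spelling: keep the old behavior, but warn.
                opts.warnings.add("-fix is deprecated; use -fixAssignments instead");
                opts.fixAssignments = true;
            } else if (arg.equals("-fixAssignments")) {
                opts.fixAssignments = true;
            }
            // All other options are omitted from this sketch.
        }
        return opts;
    }
}
```

          With this shape, existing scripts that pass -fix keep working, and the warning tells operators which flag to switch to.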

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5826
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12728>

          Precede regioninfo with a dot.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12724>

          Looking at code @ line 2835 below, it seems -fixAssignments and -fix are equivalent.
          What was the reason for deprecating -fix ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12722>

          More than one option modifies .META. table.
          Shall we name this option fixMetaUsingRegionInfoOnHdfs ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12721>

          Can we name this option fixRegionHolesOnHdfs ?
          It would be better to note which options can be run with cluster offline.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12723>

          Name this fixRegionOverlapsOnHdfs ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/4280/#comment12725>

          white space.

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          <https://reviews.apache.org/r/4280/#comment12726>

          Should read 'callbacks for handling particular table integrity invariant violations detected.'

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          <https://reviews.apache.org/r/4280/#comment12730>

          Please add javadoc for the handleXXX methods on what scenario each fixes.

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          <https://reviews.apache.org/r/4280/#comment12729>

          Since region always belongs to some table, I suggest naming this method handleNonEmptyRegionStartKey()

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java
          <https://reviews.apache.org/r/4280/#comment12727>

          This class should be abstract.
          It is better to put it in its own file.
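
          The suggestions above (documented handleXXX callbacks, plus an abstract class in its own file) could be combined roughly as follows. This is a minimal sketch under assumptions: the interface, class, and method names are hypothetical stand-ins, and plain strings stand in for region and key types; it is not the code under review.

```java
// Hypothetical callback interface; names are illustrative only.
interface IntegrityHandlerSketch {
    /** Called when a table's first region does not start at the empty key. */
    void handleFirstRegionStartKeyNotEmpty(String regionName);

    /** Called when no region covers the key range [holeStart, holeEnd). */
    void handleHole(String holeStart, String holeEnd);

    /** Called when two regions cover some of the same keys. */
    void handleOverlap(String regionA, String regionB);
}

// Abstract no-op base: subclasses override only the violations they handle.
abstract class IntegrityHandlerBase implements IntegrityHandlerSketch {
    @Override public void handleFirstRegionStartKeyNotEmpty(String regionName) {}
    @Override public void handleHole(String holeStart, String holeEnd) {}
    @Override public void handleOverlap(String regionA, String regionB) {}
}

// Example subclass that only counts holes and ignores everything else.
final class HoleCounter extends IntegrityHandlerBase {
    int holes;

    @Override public void handleHole(String holeStart, String holeEnd) {
        holes++;
    }
}
```

          Making the base class abstract prevents callers from instantiating a handler that silently does nothing, while still sparing each concrete handler from implementing every callback.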

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/#review5823
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          <https://reviews.apache.org/r/4280/#comment12720>

          Can we deprecate this method in 0.94 and remove it in 0.96 ?

          • Ted


          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517811/hbase-5128-trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 24 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -122 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 163 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.coprocessor.TestClassLoading
          org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
          org.apache.hadoop.hbase.master.TestRollingRestart
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.client.TestShell

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1153//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1153//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1153//console

          This message is automatically generated.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4280/
          -----------------------------------------------------------

          Review request for hbase, Todd Lipcon, Ted Yu, and Lars Hofhansl.

          Summary
          -------

          This version is similar to the 0.90.x version posted a few months back, but has a few new features and some minor differences.

          1) No trackHTD method needed since we can read from the file system.
          2) Added safeguards to prevent mega merges, and to isolate repairs to particular tables.
          3) Fixed comparator in HRegionInfo
          4) Fixed TestRegionObserverInterface so that it doesn't rely on bug in HRegionInfo comparator.
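
          To illustrate the idea behind item 2, here is one plausible shape for a "mega merge" safeguard: group regions that overlap transitively, then refuse to merge any group larger than a cap. This is a sketch under assumptions, not the patch's code: the class and method names and the maxMerge parameter are hypothetical, plain strings stand in for row keys, and an empty end key (end of table) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

final class OverlapSketch {
    // A region covering the half-open key range [start, end).
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Groups ranges (already sorted by start key) into runs that overlap transitively.
    static List<List<Range>> overlapGroups(List<Range> sorted) {
        List<List<Range>> groups = new ArrayList<>();
        List<Range> current = new ArrayList<>();
        String maxEnd = null; // largest end key seen in the current group
        for (Range r : sorted) {
            if (!current.isEmpty() && r.start.compareTo(maxEnd) >= 0) {
                // r starts at or after everything seen so far: close the group.
                groups.add(current);
                current = new ArrayList<>();
                maxEnd = null;
            }
            current.add(r);
            if (maxEnd == null || r.end.compareTo(maxEnd) > 0) {
                maxEnd = r.end;
            }
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    // A group is merged only if it really overlaps and is not larger than the cap.
    static boolean shouldMerge(List<Range> group, int maxMerge) {
        return group.size() > 1 && group.size() <= maxMerge;
    }
}
```

          Groups that exceed the cap would be reported and left for an operator to inspect, rather than being collapsed into one very large region automatically.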

          I'll backport to 0.94/0.92 (which should be very similar) and update the 0.90 versions after this patch has mostly cleared.

          This version is not perfect (there are definitely cases not covered) but I think it is worth trying to get this in so that future reviews are more manageable.

          This addresses bug HBASE-5128.
          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs


          src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 98f79fc
          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 3bcf899
          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java ae468ca
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java e2bbbd0
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 720841c
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java 5916d9c
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java d57bb6b
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 38eb6a8
          src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java 1b3b6df
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java 937781d
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 0599da1
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 2b4cac8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java ebbeead
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java b175548

          Diff: https://reviews.apache.org/r/4280/diff

          Testing
          -------

          Unit tests cover many situations and pass. Most "live" testing has been done on 0.90.x versions, and many improvements and features were added from that experience. There has not been much live testing on the trunk versions.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          This version is getting very large, and though imperfect it is quite useful. I would prefer to get this in and then add follow-on JIRAs to improve it.

          Attached is a version for trunk. I will backport to 0.92/0.94. I also have a version for 0.90 but would like to get the trunk version in first.

          Jonathan Hsieh added a comment -

          Jimmy mentions this actually may be HBASE-4238.

          Jonathan Hsieh added a comment -

          Update:

          Recently found a case that may have been suffering from a parent region not getting removed by the catalog janitor. Since we rely on HDFS being ground truth and this version did not check the offline/split status in META, this resulted in the tool attempting to merge all regions into one mega region. Harsh mentioned that the parent region cleanup issue might be related to HBASE-4799 (the target cluster didn't have this patch).

          Next cuts will add some failsafes: options to repair only specific tables, and to skip a merge if it would combine more than a specified number of regions into one region.

          At the moment I also have first-cut versions for 0.92/trunk, but they have one flaky test.
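
          The two failsafes could look roughly like the sketch below. Every name in it (RepairFailsafes, tablesToRepair, maxMerge) is hypothetical and is not taken from the hbck patch; it only illustrates restricting repairs to named tables and skipping an overlap group that is larger than a configured limit.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not the HBaseFsck implementation.
final class RepairFailsafes {
  private final Set<String> tablesToRepair; // an empty set means "all tables"
  private final int maxMerge;               // largest overlap group merged automatically

  RepairFailsafes(Set<String> tablesToRepair, int maxMerge) {
    this.tablesToRepair = new HashSet<>(tablesToRepair);
    this.maxMerge = maxMerge;
  }

  /** True if repairs are allowed on this table. */
  boolean isTableIncluded(String table) {
    return tablesToRepair.isEmpty() || tablesToRepair.contains(table);
  }

  /** True if the table is selected and the overlap group is small enough to merge. */
  boolean shouldMerge(String table, List<String> overlappingRegions) {
    if (!isTableIncluded(table)) {
      return false; // repairs are isolated to the selected tables
    }
    // Refuse "mega merges": leave oversized groups for manual inspection.
    return overlappingRegions.size() <= maxMerge;
  }
}
```

          With a limit of 2, a group of three overlapping regions would be reported and skipped rather than merged.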

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4781
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10510>

          Is this wrong? Should it be "> 0" here and "< 0" below?

          • Jimmy

          On 2012-01-25 17:24:41, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3435/

          -----------------------------------------------------------

          (Updated 2012-01-25 17:24:41)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Summary

          -------

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It has some problems that I need advice on.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.

          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I instead use the unassign method, it sends RIT transitions to the master, which then ends up attempting to assign the region again, causing timing problems and transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase it is meant to repair)?

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and something is still happening asynchronously. I think the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
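
          Whichever side does the waiting, one common shape is to poll a condition until it holds or a timeout expires. A minimal sketch, assuming a hypothetical helper named SettleWaiter and a caller-supplied condition (for example, "no regions in transition"); none of these names come from the patch:

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not part of hbck or its test utilities.
final class SettleWaiter {
  /**
   * Polls the condition until it returns true or timeoutMs elapses.
   * Returns true if the condition held, false on timeout or interrupt.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without the state settling
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
  }
}
```

          A test would then assert on the return value before running its consistency checks, so a slow assignment shows up as a clear timeout instead of a spurious error code.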

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151

          Diff: https://reviews.apache.org/r/3435/diff

          Testing

          -------

          All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate new regions).

          Not ready for commit.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          Luke,

          Good suggestion. I'll integrate that into the next revs.

          Luke Lu added a comment -

          It seems to me that there is nothing in hbck that's HDFS-specific. The comments/variables/methods that refer to "Hdfs" should just use "Dfs", IMO.

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-25 18:01:32, Ted Yu wrote:

          > We should deprecate clearRegionFromTransition().

          done.

          On 2012-01-25 18:01:32, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 202

          > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line202>

          >

          > We should set interrupt flag.

          replaced with Thread.currentThread().interrupt();
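
          For context, restoring the interrupt flag after catching InterruptedException typically looks like the sketch below. The class and method names are hypothetical; only Thread.currentThread().interrupt() is the standard Java call.

```java
// Hypothetical illustration; not the code in HBaseFsckRepair.
final class InterruptAwareSleep {
  /** Sleeps for up to millis; returns false if interrupted, leaving the flag set. */
  static boolean sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
      return true;
    } catch (InterruptedException e) {
      // Thread.sleep clears the flag when it throws, so set it again
      // to let code further up the stack see the interruption.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```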

          On 2012-01-25 18:01:32, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 197

          > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line197>

          >

          > success is local variable.

          > Why don't we change return type to boolean and return its value ?

          I've cleaned this up to reuse the connection from an HBaseAdmin. v3 already has this update in some places – this is one of the places missed.

          On 2012-01-25 18:01:32, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1636

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1636>

          >

          > This TODO has been implemented, so we can remove it.

          removed

          On 2012-01-25 18:01:32, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1131

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1131>

          >

          > How about naming this method hasHdfsOnlyEdits() ?

          renamed to containsOnlyHdfsEdits

          On 2012-01-25 18:01:32, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1081

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1081>

          >

          > This sentence should be moved before ' from ...'

          That code has been refactored in v3 but the message was a bit off. I've updated it.

          On 2012-01-25 18:01:32, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1083

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1083>

          >

          > We should handle potential exception from this method.

          >

          > Maybe we should check the availability of this rpc outside the loop and set a flag indicating whether Master supports this RPC.

          This was something that I noted I was going to handle in the next rev. Check out v3; I think it addresses the concern.
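
          The reviewer's suggestion, checking once whether the master supports the RPC and remembering the answer in a flag, can be sketched as follows. This is not how v3 does it; Master, offline, and the probe argument are hypothetical stand-ins rather than the HBase interfaces, and the fallback path is deliberately left out.

```java
import java.util.List;

// Hypothetical illustration of "probe once, then branch on a flag".
final class RpcProbe {
  /** Stand-in for a master call that older masters may not implement. */
  interface Master {
    void offline(String regionName);
  }

  /** Runs the probe once and reports whether the RPC appears to be supported. */
  static boolean probe(Runnable rpcCheck) {
    try {
      rpcCheck.run();
      return true;
    } catch (UnsupportedOperationException e) {
      return false; // assumed signal that this master lacks the RPC
    }
  }

  /** Returns how many regions were offlined through the RPC. */
  static int repair(Master master, Runnable rpcCheck, List<String> regions) {
    boolean masterSupportsRpc = probe(rpcCheck); // checked once, outside the loop
    int offlined = 0;
    for (String region : regions) {
      if (masterSupportsRpc) {
        master.offline(region);
        offlined++;
      }
      // Otherwise an older code path would be used here (not shown).
    }
    return offlined;
  }
}
```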

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4591
          -----------------------------------------------------------

          Jonathan Hsieh added a comment -

          It was also suggested that I need to worry about compactions triggered by an HRegion flush when I close regions during overlap merging. At least in 0.90, this is not actually necessary: the closeRegion path on the HMaster side does flush, but it ignores the flag returned by internalFlushcache that indicates whether the region needs to be compacted.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4591
          -----------------------------------------------------------

          We should deprecate clearRegionFromTransition().

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10238>

          I think a boolean return value would help determine the outcome of the action.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10237>

          This sentence should be moved before ' from ...'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10234>

          We should handle potential exception from this method.

          Maybe we should check the availability of this rpc outside the loop and set a flag indicating whether Master supports this RPC.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10240>

          I would expect a boolean return value since we may return without throwing exception (line 1125)

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10239>

          How about naming this method hasHdfsOnlyEdits() ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10233>

          This TODO has been implemented, so we can remove it.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment10232>

          More action is needed beyond a WARN message, right ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/3435/#comment10235>

          success is local variable.
          Why don't we change return type to boolean and return its value ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/3435/#comment10236>

          We should set interrupt flag.

          • Ted

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/
          -----------------------------------------------------------

          (Updated 2012-01-25 17:24:41.277326)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Changes
          -------

          This version includes updates after testing against real online but idle clusters with real induced corruptions. This hbck was tested successfully against region servers on the apache/0.90 branch plus this patch and against region servers on cdh3u2 (a 0.90.4-based HBase without the new offline method).

          I'm going to post usage description and images I've created to explain this better on the JIRA.

          High level changes in this rev.

          • hbck now wraps calls to the offline method and will use unassign if the target region server does not support offline.
          • Restructured HDFS integrity repairs into more phases: when compound problems were present we'd get into a loop where orphan repair would cause new overlaps on a subsequent integrity repair iteration. This new approach should be deterministic. The new phases are 1) find HDFS holes and patch them (postcondition: no more holes), 2) adopt orphan HDFS regions (postcondition: no orphan data in HDFS), and 3) reload and fix overlaps (precondition: no holes but overlaps possible; postcondition: no overlaps). Previously integrity repairs would iterate doing all three until they converged (but this didn't always happen in practice!).
          • Added more command line options that allow this hbck to attempt only certain repairs (which is necessary to get overlap repairs to work more deterministically, and needed to get HBase versions that do not support offline to converge).
          • Added a few more test cases for new corruptions.
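The three-phase ordering above can be illustrated with a toy model. This is only a sketch under loud assumptions: regions are modeled as half-open integer key ranges, and `KeyRange`, `PhasedRepairSketch`, and all method names are hypothetical and not taken from HBaseFsck. It shows why running each phase once, in order, terminates: patching holes first means that adopting orphans can only introduce overlaps, which the final merge phase removes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model (hypothetical): a region is a half-open integer key range [start, end). */
final class KeyRange {
    final int start;
    final int end;

    KeyRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + "," + end + ")";
    }
}

/** Illustrative only; not the HBaseFsck implementation. */
final class PhasedRepairSketch {

    /** Returns a copy ordered by start key, then end key. */
    static List<KeyRange> sorted(List<KeyRange> regions) {
        List<KeyRange> out = new ArrayList<>(regions);
        out.sort(Comparator.comparingInt((KeyRange r) -> r.start)
                .thenComparingInt(r -> r.end));
        return out;
    }

    /** Phase 1: fabricate a region for every uncovered gap in [min, max). */
    static List<KeyRange> patchHoles(List<KeyRange> regions, int min, int max) {
        List<KeyRange> out = new ArrayList<>(regions);
        int covered = min;
        for (KeyRange r : sorted(regions)) {
            if (r.start > covered) {
                out.add(new KeyRange(covered, r.start)); // fill the gap
            }
            covered = Math.max(covered, r.end);
        }
        if (covered < max) {
            out.add(new KeyRange(covered, max));
        }
        return sorted(out);
    }

    /** Phase 2: adopt orphans; with no holes left, this can only add overlaps. */
    static List<KeyRange> adoptOrphans(List<KeyRange> regions, List<KeyRange> orphans) {
        List<KeyRange> out = new ArrayList<>(regions);
        out.addAll(orphans);
        return sorted(out);
    }

    /** Phase 3: merge every group of overlapping ranges into a single range. */
    static List<KeyRange> mergeOverlaps(List<KeyRange> regions) {
        List<KeyRange> out = new ArrayList<>();
        for (KeyRange r : sorted(regions)) {
            KeyRange last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && r.start < last.end) {
                out.set(out.size() - 1,
                        new KeyRange(last.start, Math.max(last.end, r.end)));
            } else {
                out.add(r);
            }
        }
        return out;
    }

    /** Runs the phases once each, in order, instead of iterating to convergence. */
    static List<KeyRange> repair(List<KeyRange> regions, List<KeyRange> orphans,
                                 int min, int max) {
        return mergeOverlaps(adoptOrphans(patchHoles(regions, min, max), orphans));
    }
}
```

For example, with regions `[0,10)` and `[20,30)`, one orphan `[5,25)`, and a key space of `[0,40)`, phase 1 fabricates `[10,20)` and `[30,40)`, phase 2 adds the orphan, and phase 3 collapses the overlapping group into `[0,30)`.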

          One big caveat with this rev is that the hbase was online but idle (no writes happening). It was also suggested that I need to worry about compactions when I close regions during overlap merging (JD – I didn't see anything in OnlineMerge – why wasn't this a concern there?). If so, I'd like advice on how to add guards to protect the user (is a glaring warning message or requiring confirmation sufficient?). I'm going to do some initial testing on online and active cases – but ideally would like this to come in follow on jiras.

          Summary
          -------

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flakey on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODO's I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state while delete uses in META assignment information.
          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then attempts to assign the region again, causing timing/transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase instances it is meant to repair)?

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and something is still happening asynchronously. I think the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
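Whichever side ends up waiting (hbck or the unit test), the wait itself could be a bounded poll. The following is a minimal sketch, not code from the patch: `SettleWait` and `waitFor` are hypothetical names, and the condition passed in (for example "no regions of the table are still in transition") is left to the caller.

```java
import java.util.function.BooleanSupplier;

/** Illustrative helper (hypothetical name); not part of hbck or HBaseTestingUtility. */
final class SettleWait {

    /**
     * Polls the condition until it holds or the timeout elapses.
     *
     * @return true if the condition became true, false if the wait timed out
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: still not settled
            }
            Thread.sleep(pollMs);
        }
        return true;
    }
}
```

A test could call `SettleWait.waitFor(() -> noRegionsInTransition(), 30_000, 100)` before re-running hbck, where `noRegionsInTransition()` is an assumed, test-provided check. Returning a boolean rather than throwing lets the caller decide whether a timeout is a failure.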

          This addresses bug HBASE-5128.
          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6
          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95
          src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java 7138d63
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckComparator.java 2c4a79e
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151

          Diff: https://reviews.apache.org/r/3435/diff

          Testing
          -------

          All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate new regions).

          Not ready for commit.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-11 21:15:13, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1586

          > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1586>

          >

          > Should be 'to end key'.

          Updated this and a handful of other comments.

          On 2012-01-11 21:15:13, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1594

          > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1594>

          >

          > Should insert some text between newRegion and region.

          updated

          On 2012-01-11 21:15:13, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1600

          > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1600>

          >

          > This should be outside the for loop.

          done

          On 2012-01-11 21:15:13, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1602

          > <https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1602>

          >

          > Space between > and 0.

          done

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4317
          -----------------------------------------------------------

          On 2012-01-13 22:49:33, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3435/

          -----------------------------------------------------------

          (Updated 2012-01-13 22:49:33)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Summary

          -------

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flakey on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODO's I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state while delete uses in META assignment information.

          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which then attempts to assign the region again, causing timing/transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase instances it is meant to repair)?

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and something is still happening asynchronously. I think the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2

          Diff: https://reviews.apache.org/r/3435/diff

          Testing

          -------

          All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate new regions).

          Not ready for commit.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-14 05:43:38, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 586

          > <https://reviews.apache.org/r/3435/diff/2/?file=68919#file68919line586>

          >

          > I liked this better before

          I probably broke this out to be easier to step debug. I can restore.

          On 2012-01-14 05:43:38, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 154

          > <https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line154>

          >

          > No wait in case of exception. Is that by design?

          nice catch.

          On 2012-01-14 05:43:38, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1083

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1083>

          >

          > I think you said in the intro, that you need to check the availability of this rpc.

          done in next version.
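One way the availability check could be structured is to try the new call once and remember the outcome, so later regions skip straight to the fallback. This is a hedged sketch only: `MasterRpc` is a hypothetical stand-in rather than the real HMasterInterface, and using `UnsupportedOperationException` to signal a missing RPC is an assumption made for illustration.

```java
/** Illustrative only; the interface below is a hypothetical stand-in, not HMasterInterface. */
final class OfflineFallbackSketch {

    interface MasterRpc {
        /** Assumed to throw UnsupportedOperationException when the server lacks this call. */
        void offline(String regionName);

        void unassign(String regionName);
    }

    private final MasterRpc master;
    private boolean offlineSupported = true; // optimistic until proven otherwise

    OfflineFallbackSketch(MasterRpc master) {
        this.master = master;
    }

    /**
     * Takes the region offline if the RPC is available, else falls back to unassign.
     *
     * @return true if offline was used, false if the fallback was used
     */
    boolean offlineOrUnassign(String regionName) {
        if (offlineSupported) {
            try {
                master.offline(regionName);
                return true;
            } catch (UnsupportedOperationException e) {
                offlineSupported = false; // remember, so the probe happens only once
            }
        }
        master.unassign(regionName);
        return false;
    }
}
```

Caching the flag keeps the probe out of the per-region loop, and the boolean return value tells the caller which path was taken.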

          On 2012-01-14 05:43:38, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1072

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1072>

          >

          > <0.90.6?

          Updated to 0.90.6, with the assumption that this feature will not make it there (but hopefully into a 0.90.7).

          On 2012-01-14 05:43:38, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2275

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line2275>

          >

          > I know this is not new, but this ErrorReporter is used for status messages as well as error reporting. Should maybe have a different name.

          >

          > Also should messages go to STDOUT (out) and error go to STDERR (err)?

          TODO – I'll follow up on this after the next round.

          On 2012-01-14 05:43:38, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1053

          > <https://reviews.apache.org/r/3435/diff/2/?file=68920#file68920line1053>

          >

          > Should we add a double check here that the region is in fact offline (by checking .META.) or is that too expensive/not-needed?

          >

          > I'm thinking, once this method exists folks will eventually call it for other reasons.

          Currently, we needed this method to explicitly remove information from the Master's memory. In the cases where this is used, I've "directly" removed data from meta (Delete into .META.) and closed the regions on region servers directly (HRegionInterface#closeRegion).

          I haven't worked it out completely yet, but it probably makes sense to fix closeRegion by adding a param that removes this in-memory master state as well. I was under the gun to get something working, and having accomplished this I'm definitely open to refactoring this to make it saner and cleaner.

          On 2012-01-14 05:43:38, Lars Hofhansl wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 90

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line90>

          >

          > Nice documentation. This tool is awesome.

          thanks!

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4384
          -----------------------------------------------------------

          On 2012-01-13 22:49:33, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3435/

          -----------------------------------------------------------

          (Updated 2012-01-13 22:49:33)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Summary

          -------

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review and to be considered for committing. It has some problems that I need advice on.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.

          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state; disable then uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing problems and transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installs it is meant to repair)?

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other, and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
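One way to picture the "wait for these to settle" option from Problem 2 is a bounded poll: re-check a condition at a fixed interval and give up after a timeout. The sketch below is only an illustration under assumptions; `SettleWaiter`, `waitUntil`, and the stand-in condition are hypothetical names, not part of hbck or the HBase API, and the real check would be something like "no regions in transition".

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase: polls until a condition holds or a timeout expires.
public class SettleWaiter {

    /**
     * Returns true if the condition became true before the timeout,
     * false if the timeout elapsed or the thread was interrupted.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: still not settled
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "cluster has settled": becomes true roughly 50 ms after start.
        boolean settled = waitUntil(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
        System.out.println("settled=" + settled);
    }
}
```

Whether such a wait belongs in hbck itself or only in the unit test is exactly the open question above; the helper works the same way in either place.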

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2

          Diff: https://reviews.apache.org/r/3435/diff

          Testing

          -------

          All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate new regions).

          Not ready for commit.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 91

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line91>

          >

          > I think '.META.' should be used.

          ok

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 118

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line118>

          >

          > Should read 'that it was assigned to'

          ok

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 154

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line154>

          >

          > This is about fixing region assignment, right ?

          > Better include that in javadoc.

          done

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 121

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line121>

          >

          > Should read 'repairs require hbase ...'

          >

          > 'to' at the end is not needed.

          done

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 172

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line172>

          >

          > Should read ' and correct '

          done

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 174

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line174>

          >

          > Would regionInfoMap be a better name ?

          done

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 270

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line270>

          >

          > Please correct this sentence's syntax.

          sure

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 280

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line280>

          >

          > We should impose maximum number of iterations for the loop, right ?

          good point.
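The iteration cap suggested here could look roughly like the following. This is a sketch under assumptions, not the code under review: `BoundedRepairLoop`, `repairUntilClean`, and the `IntSupplier` standing in for one check-and-fix pass are all hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch, not the HBaseFsck code: repeat a check-and-fix pass a bounded number of times.
public class BoundedRepairLoop {

    /**
     * Runs checkAndFix until it reports zero remaining errors or maxIterations
     * passes have run. Returns the error count reported by the last pass.
     */
    public static int repairUntilClean(IntSupplier checkAndFix, int maxIterations) {
        if (maxIterations < 1) {
            throw new IllegalArgumentException("maxIterations must be at least 1");
        }
        int errors = checkAndFix.getAsInt();
        for (int pass = 1; pass < maxIterations && errors > 0; pass++) {
            errors = checkAndFix.getAsInt();
        }
        return errors;
    }

    public static void main(String[] args) {
        int[] remaining = {3};
        // Stand-in pass: each call "fixes" one problem and reports how many are left.
        int left = repairUntilClean(() -> --remaining[0], 10);
        System.out.println("errors left: " + left);
    }
}
```

Returning the last error count lets the caller tell "converged" apart from "hit the cap", so a repair that never settles shows up as a nonzero result instead of an endless loop.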

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 287

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line287>

          >

          > Should read 'method requires cluster to be online ...'

          done.

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 289

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line289>

          >

          > Should read ' to be consistent'

          reworded

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 337

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line337>

          >

          > Should be called checkAndFixIntegrity()

          ok.

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 334

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line334>

          >

          > Should be called checkAndFixConsistency()

          ok

          On 2012-01-14 00:15:01, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 343

          > <https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line343>

          >

          > This sentence can be omitted.

          > If you keep it, please move it after the @return line.

          removed

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4379
          -----------------------------------------------------------


          Doug Meil added a comment -

          Hey guys, a bunch of comments just wound up on a documentation ticket I just did (HBASE-5218) that I'm pretty sure were intended for this ticket.

          Jonathan Hsieh added a comment -

          I've been testing using failed splits generated by cycling the HBase master while doing a heavy write load with a high split frequency, prior to the HBASE-5196 patch. A subset of the problems has been fixed automatically, but there seems to be a class of problems with splitting regions that isn't being handled properly. This is probably the case we are most likely to encounter.

          Jonathan Hsieh added a comment -

          @Ted sounds good.

          Ted Yu added a comment - - edited

          I think we should keep offline() and deprecate clearRegionFromTransition().
          Let's remove clearRegionFromTransition() in another JIRA.
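A minimal sketch of the keep-one, deprecate-the-other approach is below. It assumes, as the review comments suggest, that the two methods do the same thing; `RegionStateSketch` and its fields are hypothetical stand-ins, not the actual AssignmentManager or HMaster code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the master's in-memory region state; not HBase code.
public class RegionStateSketch {
    private final Set<String> regionsInTransition = new HashSet<>();
    private final Set<String> onlineRegions = new HashSet<>();

    public void markOnline(String regionName) {
        onlineRegions.add(regionName);
    }

    public void markInTransition(String regionName) {
        regionsInTransition.add(regionName);
    }

    /** The method to keep: drops every in-memory trace of the region. */
    public void offline(String regionName) {
        regionsInTransition.remove(regionName);
        onlineRegions.remove(regionName);
    }

    /**
     * @deprecated assumed equivalent to {@link #offline(String)}; kept only so
     *             existing callers compile until it is removed in a follow-up.
     */
    @Deprecated
    public void clearRegionFromTransition(String regionName) {
        offline(regionName);
    }

    public boolean isTracked(String regionName) {
        return regionsInTransition.contains(regionName) || onlineRegions.contains(regionName);
    }

    public static void main(String[] args) {
        RegionStateSketch state = new RegionStateSketch();
        state.markOnline("region-a");
        state.markInTransition("region-a");
        state.clearRegionFromTransition("region-a"); // old name still works, via offline()
        System.out.println("still tracked: " + state.isTracked("region-a"));
    }
}
```

Delegating from the deprecated method keeps a single implementation, so removing it later in a separate JIRA is a pure deletion.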

          jiraposter@reviews.apache.org added a comment -

          On 2012-01-13 23:33:00, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1059

          > <https://reviews.apache.org/r/3435/diff/2/?file=68920#file68920line1059>

          >

          > Good question. They look the same to me.

          > I think one, possibly clearRegionFromTransition, should be removed.

          Think I should remove this in this patch or do a separate jira for it?

          On 2012-01-13 23:33:00, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 1744

          > <https://reviews.apache.org/r/3435/diff/2/?file=68919#file68919line1744>

          >

          > I think you meant regionOffline()

          yes.

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4378
          -----------------------------------------------------------


          ramkrishna.s.vasudevan added a comment -

          @Jon
          Let me check. Maybe take your time, Jon, before getting it through. There is no hurry.
          Maybe we can take it in the next release? Please don't mind.

          Jonathan Hsieh added a comment -

          @Ram

          I think I need a few days (test/polish) to get this completely ready – if you are willing to wait/review to get this through, I'm willing to hack on it today/tomorrow to get it through.

          ramkrishna.s.vasudevan added a comment -

          @Jon
          Do you want this in 0.90.6? Actually, I was planning to cut a release today.
          One more thing: I was working on HBASE-5155, which changes some behaviour of enabling and disabling tables (in the 0.90 branch). You can take a look at it (for your patch). I will check your patch also.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4384
          -----------------------------------------------------------

          Did a quick flyby... Looks great.

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          <https://reviews.apache.org/r/3435/#comment9856>

          I liked this better before

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          <https://reviews.apache.org/r/3435/#comment9857>

          Should we add a double check here that the region is in fact offline (by checking .META.) or is that too expensive/not-needed?

          I'm thinking that once this method exists, folks will eventually call it for other reasons.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9858>

          Nice documentation. This tool is awesome.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9860>

          nice!

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9861>

          Yeah, strange that we do not follow posix here.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9863>

          <0.90.6?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9864>

          I think you said in the intro, that you need to check the availability of this rpc.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9859>

          I know this is not new, but this ErrorReporter is used for status messages as well as error reporting. Should maybe have a different name.

          Also should messages go to STDOUT (out) and error go to STDERR (err)?
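One way to picture the split suggested here is the rough sketch below. It is not the ErrorReporter interface from HBaseFsck; the class name `SplitStreamReporter` and its methods are hypothetical, chosen only to show status text going to one stream and detected errors going to another.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reporter for illustration; not the ErrorReporter in HBaseFsck.
class SplitStreamReporter {
  private final PrintStream out;  // progress and status messages
  private final PrintStream err;  // detected inconsistencies
  private final List<String> errorCodes = new ArrayList<String>();

  SplitStreamReporter(PrintStream out, PrintStream err) {
    this.out = out;
    this.err = err;
  }

  // Status text is printed only; it never counts as an error.
  void status(String message) {
    out.println(message);
  }

  // Errors are printed to the error stream and recorded for later counting.
  void error(String code, String message) {
    errorCodes.add(code);
    err.println("ERROR: " + code + ": " + message);
  }

  // Number of errors reported so far, e.g. for deriving an exit code.
  int errorCount() {
    return errorCodes.size();
  }
}
```

Constructed as `new SplitStreamReporter(System.out, System.err)`, this would let a caller redirect or silence status output without losing the error report.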

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
          <https://reviews.apache.org/r/3435/#comment9865>

          No wait in case of exception. Is that by design?

          • Lars

          On 2012-01-13 22:49:33, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3435/

          -----------------------------------------------------------

          (Updated 2012-01-13 22:49:33)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Summary

          -------

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review and consideration for committing. It has some problems I need advice on.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state, while delete uses the assignment information in META.

          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state, so disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing problems and transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions, or to force it to re-read from META (without modifying the 0.90 HBase installations it is meant to repair)?

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle, or should just the unit test wait?
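Whichever side ends up doing the waiting, the usual shape is a poll with a deadline rather than a fixed sleep. The sketch below is only an illustration under that assumption: `SettleWaiter` and `waitUntil` are hypothetical names, and the condition passed in stands for whatever "no regions still in transition" check is actually available.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of hbck or its test utilities.
class SettleWaiter {
  // Polls `settled` every pollMs milliseconds until it returns true or
  // timeoutMs has elapsed. Returns true if the condition was observed,
  // false on timeout or if the waiting thread was interrupted.
  static boolean waitUntil(BooleanSupplier settled, long timeoutMs, long pollMs) {
    final long start = System.nanoTime();
    final long timeoutNanos = timeoutMs * 1_000_000L;
    while (true) {
      if (settled.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - start >= timeoutNanos) {
        return false;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt for callers
        return false;
      }
    }
  }
}
```

A test could then assert on the boolean result before re-running the check, which turns a lingering transition into an explicit timeout failure instead of a spurious HOLE_IN_REGION_CHAIN report.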

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 330a7cc

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2

          Diff: https://reviews.apache.org/r/3435/diff

          Testing

          -------

          All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate new regions).

          Not ready for commit.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4379
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9839>

          I think '.META.' should be used.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9840>

          Should read 'that it was assigned to'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9841>

          Should read 'repairs require hbase ...'

          'to' at the end is not needed.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9842>

          This is about fixing region assignment, right ?
          Better include that in javadoc.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9843>

          Should read ' and correct '

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9844>

          Would regionInfoMap be a better name ?

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9845>

          Please correct this sentence's syntax.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9846>

          We should impose a maximum number of iterations for the loop, right?
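A capped version of such a loop could look roughly like the sketch below. It does not reproduce the loop in the patch; `BoundedRepairLoop` and the `checkAndFix` supplier are hypothetical, with the supplier standing for one check-and-repair pass that returns the number of errors still present.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration; not code from the patch under review.
class BoundedRepairLoop {
  // Runs checkAndFix until it reports zero remaining errors or until
  // maxIterations passes have run. Returns the count from the last pass.
  static int run(IntSupplier checkAndFix, int maxIterations) {
    if (maxIterations < 1) {
      throw new IllegalArgumentException("maxIterations must be at least 1");
    }
    int remaining = Integer.MAX_VALUE;
    for (int i = 0; i < maxIterations && remaining > 0; i++) {
      remaining = checkAndFix.getAsInt();
    }
    return remaining;
  }
}
```

A non-zero return value tells the caller that the cap was hit with problems outstanding, so the tool can report failure rather than spin forever.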

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9848>

          Should read 'method requires cluster to be online ...'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9847>

          Should read ' to be consistent'

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9849>

          The method is called Repair, so the return value should be the number of errors fixed, right?
          I think a Pair return value would allow both the errors detected and the errors fixed to be returned.
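To make the suggestion concrete, here is a small stand-in for such a return value. It deliberately avoids any existing Pair class so that it is self-contained; the name `RepairResult` and its fields are hypothetical.

```java
// Hypothetical result holder for illustration; a generic Pair would serve equally well.
class RepairResult {
  final int errorsDetected;
  final int errorsFixed;

  RepairResult(int errorsDetected, int errorsFixed) {
    if (errorsDetected < 0 || errorsFixed < 0 || errorsFixed > errorsDetected) {
      throw new IllegalArgumentException("need 0 <= errorsFixed <= errorsDetected");
    }
    this.errorsDetected = errorsDetected;
    this.errorsFixed = errorsFixed;
  }

  // Errors that were detected but not repaired.
  int errorsRemaining() {
    return errorsDetected - errorsFixed;
  }
}
```

Named fields cost a few more lines than a Pair, but callers then never have to remember which element is which.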

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9850>

          Should be called checkAndFixConsistency()

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9851>

          Should be called checkAndFixIntegrity()

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9852>

          This sentence can be omitted.
          If you keep it, please move it after the @return line.

          • Ted


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4378
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          <https://reviews.apache.org/r/3435/#comment9838>

          I think you meant regionOffline()

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          <https://reviews.apache.org/r/3435/#comment9837>

          Good question. They look the same to me.
          I think one, possibly clearRegionFromTransition, should be removed.

          • Ted


          jiraposter@reviews.apache.org added a comment -

          On 2012-01-13 23:01:49, Alex Newman wrote:

          > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1048

          > <https://reviews.apache.org/r/3435/diff/2/?file=68920#file68920line1048>

          >

          > Lots of whitespaces

          Yup, I'll get them in the next pass – from my v2 comments, I still need to get a compatibility checking thing going on, and will get the new nits on that pass.

          • jmhsieh

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4375
          -----------------------------------------------------------


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4375
          -----------------------------------------------------------

          I noticed lots of extra whitespaces

          src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          <https://reviews.apache.org/r/3435/#comment9835>

          Lots of whitespaces

          • Alex


          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/
          -----------------------------------------------------------

          (Updated 2012-01-13 22:49:33.927353)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Changes
          -------

          Version 2

          Solved problem 1 by adding a new method to the master, offline, which properly removes in-memory state from the master's assignmentManager (which allows both disable table and drop table to work properly). I haven't added API compatibility checks (to gracefully handle this hbck being used on a 0.90.5 cluster) yet; that will be in the next version of the patch.

          Solved problem 2 by adding a waitUntilAssigned. The tests were looped and consistently pass now.
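
The waiting approach described above can be sketched as a bounded polling loop. This is only an illustration: the class name, the waitUntilSettled helper, and the BooleanSupplier standing in for an "is this region still in transition?" query are assumptions, not the patch's actual waitUntilAssigned code.

```java
import java.util.function.BooleanSupplier;

public class WaitForRegion {
    /**
     * Polls until inTransition reports false or the timeout elapses.
     * Returns true if the region settled, false on timeout or interruption.
     * (Hypothetical helper; the real patch may wait differently.)
     */
    public static boolean waitUntilSettled(BooleanSupplier inTransition,
                                           long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (inTransition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // still transitioning when the time budget ran out
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in check that reports "in transition" three times, then settles.
        int[] remaining = {3};
        boolean settled = waitUntilSettled(() -> remaining[0]-- > 0, 1000, 10);
        System.out.println("settled=" + settled);
    }
}
```

A timeout keeps a stuck region from hanging the tool or the test forever; the caller can then report the region as still in transition rather than misdiagnosing a hole.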

          This version now "sidelines" data instead of deleting data – so in the case where repairs go badly there is still a good chance for some manual recovery.
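
As a rough illustration of sidelining versus deleting, the sketch below moves a directory aside on the local filesystem with java.nio. It is an assumption-laden stand-in: hbck operates on HDFS, and the class name, the sideline method, and the directory names here are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SidelineSketch {
    /**
     * Moves regionDir under sidelineRoot instead of deleting it and
     * returns the new location, so the data can still be recovered by hand.
     */
    public static Path sideline(Path regionDir, Path sidelineRoot) throws IOException {
        Files.createDirectories(sidelineRoot);
        Path target = sidelineRoot.resolve(regionDir.getFileName());
        // A rename within one filesystem keeps the directory contents intact.
        return Files.move(regionDir, target);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hbck-sketch");
        Path region = Files.createDirectory(root.resolve("region-abc"));
        Files.write(region.resolve(".regioninfo"), new byte[] {1, 2, 3});

        Path moved = sideline(region, root.resolve("sideline"));
        System.out.println("original exists: " + Files.exists(region));
        System.out.println("sidelined copy has .regioninfo: "
                + Files.exists(moved.resolve(".regioninfo")));
    }
}
```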

          Fixed a bunch of typo/spacing nits; more to come.

          I still need to do some testing on real clusters. I'm going to use the bug from HBASE-5196 or manually inject failures to generate problematic tables.

          I also need to forward port to trunk/0.92.x.


          Jonathan Hsieh added a comment -

          I'm working on it. I was working on some of the TODOs and hit another snag. It will come soon.

          Ted Yu added a comment -

          @Jonathan:
          Can we see your patch?
          Compatibility checks sound great.

          Jonathan Hsieh added a comment -

          I still need to do more testing on actual clusters, but I'm going to post another version that solves problems #1 and #2 later tonight.

          #1 – added offline(byte[] regionname) method to master ipc interface.
          #2 – added code to wait for the region to exit RIT status before moving on. The tests don't seem flaky anymore (they all pass about 25 times in a row now).

          I would really like to have this in the 0.90.6 release if possible. Any complaints if I add compatibility checks that test whether the new API is present, and blare some mean-sounding warnings if you attempt to use the overlap-fixing feature against a version that does not support it? (It will mostly work but will likely require an HMaster restart to be "clean" again.)
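
One possible shape for such a compatibility check is a reflection probe for the method, sketched below. The NewMaster and OldMaster interfaces are hypothetical stand-ins (only the offline(byte[] regionname) signature comes from this thread), and a local reflection check only inspects the classes on the client's classpath; checking a running master would presumably need a version query or a guarded RPC call instead.

```java
public class ApiCompatCheck {
    /** Hypothetical newer master interface that exposes offline. */
    public interface NewMaster {
        void offline(byte[] regionName);
    }

    /** Hypothetical older master interface without offline. */
    public interface OldMaster {
        void unassign(byte[] regionName, boolean force);
    }

    /** Returns true if the given interface declares a public offline(byte[]) method. */
    public static boolean supportsOffline(Class<?> masterInterface) {
        try {
            masterInterface.getMethod("offline", byte[].class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] {NewMaster.class, OldMaster.class}) {
            if (supportsOffline(c)) {
                System.out.println(c.getSimpleName() + ": offline available, clean repair possible");
            } else {
                System.out.println("WARNING: " + c.getSimpleName()
                        + " lacks offline; a master restart may be needed after repair");
            }
        }
    }
}
```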

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/#review4317
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9714>

          Should be 'to end key'.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9715>

          Should insert some text between newRegion and region.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9716>

          This should be outside the for loop.

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          <https://reviews.apache.org/r/3435/#comment9717>

          Space between > and 0.

          • Ted

          On 2012-01-11 12:46:37, jmhsieh wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/3435/

          -----------------------------------------------------------

          (Updated 2012-01-11 12:46:37)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Summary

          -------

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.

          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state – disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing/transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions or to force it to re-read from META? (without modifying the 0.90 HBase instances it is meant to repair).

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?

          This addresses bug HBASE-5128.

          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b

          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2

          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION

          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57

          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151

          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2

          Diff: https://reviews.apache.org/r/3435/diff

          Testing

          -------

          All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate new regions).

          Not ready for commit.

          Thanks,

          jmhsieh

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/3435/
          -----------------------------------------------------------

          (Updated 2012-01-11 12:46:37.524636)

          Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel Cryans.

          Changes
          -------

          Fixed bug link. Added JD.

          JD – the code that is similar to merging is

          • #handleOverlapGroup
          • inMeta && !inHdfs && isDeployed (in another rev I've added an unassign and believe I still have the disable/delete problem).

          Summary
          -------

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state – disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing/transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions or to force it to re-read from META? (without modifying the 0.90 HBase instances it is meant to repair).

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?

          This addresses bug HBASE-5128.
          https://issues.apache.org/jira/browse/HBASE-5128

          Diffs
          -----

          src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d
          src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b
          src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 29e8bb2
          src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java PRE-CREATION
          src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57
          src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java dbb97f8
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 3e8729d
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 11a1151
          src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java 4a09ce2

          Diff: https://reviews.apache.org/r/3435/diff

          Testing
          -------

          All unit tests pass sometimes. Some fail sometimes (generally the cases that fabricate new regions).

          Not ready for commit.

          Thanks,

          jmhsieh

          Jonathan Hsieh added a comment -

          The code in HBASE-1621 does something similar to my problem cases so it might be the solution as well – apparently meta regioninfos have an offline flag. (not sure if this is just trunk though).

          Jonathan Hsieh added a comment -

          @Ted,

          For #1, I'd ideally like the tool to be backwards compatible with existing 0.90 releases. I think this version will work for older versions in cases where the problem is table region holes. This problem only arises when attempting to repair overlapping regions. If I need to modify servers to update the unassign/close API, I'll probably put warnings in the code so that the user is aware of potential issues if using hbck to fix older versions (or possibly ask the user to fail over to another master).

          For #2, makes sense – I'll spend more time digging into what is "in-motion" causing the flaky tests.

          Ted Yu added a comment -

          For problem #1, I think AssignmentManager.unassign() needs to be modified - currently it only removes regions from its internal map upon getting a RemoteException.

          For problem #2, I think hbck should wait. This scenario may happen in production.

          Jonathan Hsieh added a comment -

          I'm posting a preliminary version that I'm currently testing on real clusters. The tests are flaky on the 0.90 branch (so there is something async that I didn't synchronize properly), and there are a few more TODOs I want to knock out before this is ready for full review to be considered for committing. It's got some problems I need some advice figuring out.

          Problem 1:

          In the unit tests, I have a few cases where I fabricate new regions and try to force the overlapping regions to be closed. For some of these, I cannot delete a table after it is repaired without causing subsequent tests to fail. I think this is due to a few things:

          1) The disable table handler uses in-memory assignment manager state while delete uses the assignment information in META.
          2) Currently I'm using the sneaky closeRegion that purposely doesn't go through the master and in turn doesn't modify in-memory state – disable uses out-of-date in-memory region assignments. If I use the unassign method instead, it sends RIT transitions to the master, which ends up attempting to assign the region again, causing timing/transient states.

          What is a good way to clear the HMaster's assignment manager's assignment data for particular regions or to force it to re-read from META? (without modifying the 0.90 HBase instances it is meant to repair).

          Problem 2:

          Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused with each other and basically something is still happening asynchronously. I think this is because the new region is being assigned and is still transitioning. Sound about right? To make the unit test deterministic, should hbck wait for these to settle or should just the unit test wait?
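
For readers unfamiliar with what a HOLE_IN_REGION_CHAIN report means, here is a toy sketch of a region-chain check. It is not hbck's implementation: the class `RegionChainSketch`, the `Region` type, and the use of `String` keys (with "" meaning unbounded) are assumptions made for illustration, whereas real HBase row keys are byte arrays. The idea is that the regions of a table, sorted by start key, should tile the key space with no gaps and no overlaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy sketch: find holes and overlaps in a chain of key ranges. */
final class RegionChainSketch {

  /** Toy region covering [start, end); "" means unbounded on that side. */
  static final class Region {
    final String start;
    final String end;
    Region(String start, String end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Walks regions in start-key order and lists each hole or overlap. */
  static List<String> check(List<Region> regions) {
    List<Region> sorted = new ArrayList<>(regions);
    sorted.sort(Comparator.comparing((Region r) -> r.start));
    List<String> problems = new ArrayList<>();
    String cursor = "";    // every key below cursor is covered so far
    boolean toEnd = false; // true once some region is unbounded above
    for (Region r : sorted) {
      if (toEnd || r.start.compareTo(cursor) < 0) {
        problems.add("OVERLAP at " + r.start);
      } else if (r.start.compareTo(cursor) > 0) {
        problems.add("HOLE [" + cursor + ", " + r.start + ")");
      }
      if (r.end.isEmpty()) {
        toEnd = true;
      } else if (r.end.compareTo(cursor) > 0) {
        cursor = r.end;
      }
    }
    if (!toEnd) {
      problems.add("HOLE [" + cursor + ", end)");
    }
    return problems;
  }

  public static void main(String[] args) {
    // Nothing covers keys from "d" up to "f".
    System.out.println(check(Arrays.asList(
        new Region("", "b"), new Region("b", "d"), new Region("f", ""))));
    // The second region starts before the first one ends.
    System.out.println(check(Arrays.asList(
        new Region("", "c"), new Region("b", ""))));
  }
}
```

A check like this can report a transient hole if it runs while a newly fabricated region is still being assigned, which is consistent with the intermittent failures described above.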

          Jonathan Hsieh added a comment -

          I've been working on a new version of hbck that solves a whole bunch of potential problems in HBase tables. Currently it is implemented with a variant of 0.90 in mind – there will likely be some minor work to port to stock 0.90.5, and significant work required to port it to trunk / 0.92.


            People

            • Assignee:
              Jonathan Hsieh
              Reporter:
              Jonathan Hsieh
            • Votes:
              0
              Watchers:
              10
