HBase / HBASE-5128

[uber hbck] Online automated repair of table integrity and region consistency problems

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.5, 0.92.0, 0.94.0, 0.95.2
    • Fix Version/s: 0.94.0, 0.95.0
    • Component/s: hbck
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      HBaseFsck (hbck) has been updated with new repair capabilities. hbck is a tool for checking the region consistency and the table integrity invariants of a running HBase cluster. Checking region consistency verifies that .META., region deployment on region servers and the state of data in HDFS (.regioninfo files) are all in accordance. Table integrity checks verify that all possible row keys resolve to exactly one region of a table: there are no individual degenerate or backwards regions, no holes between regions, and no overlapping regions. Previously hbck could diagnose inconsistencies but could repair only region consistency problems related to deployment. The updated version adds the ability to repair region consistency problems in .META. (by patching holes), repair overlapping regions (via merging), patch region holes (by fabricating new regions), and detect and adopt orphaned regions (by fabricating a new .regioninfo file if it is missing in a region's dir).

      Caveats:
      * The new hbck selects repairs assuming that HDFS is ground truth; the previous version treated .META. as ground truth.
      * The hbck '-fix' option is still present but is deprecated and replaced by the '-fixAssignments' option.
      * This tool adds APIs in 0.90.7, 0.92.2 and 0.94.0 for clean repairs. The 0.90 version of the tool is compatible with HBase 0.90+, but may require restarting the master or an individual regionserver for table enable/disable/delete commands to work properly.
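
      The region consistency check described in this note compares three sources for each region: the .regioninfo file in HDFS, the row in .META., and the actual deployment on region servers. A minimal Java sketch of that comparison follows. It is illustrative only and is not taken from HBaseFsck: the class name, the Problem enum and the classify signature are hypothetical, and server names are plain strings for brevity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; none of these names come from HBaseFsck. */
final class RegionConsistencySketch {

    /** One hypothetical label per source that can disagree. */
    enum Problem { MISSING_REGIONINFO_IN_HDFS, MISSING_ROW_IN_META, DEPLOYMENT_MISMATCH }

    /**
     * @param regionInfoInHdfs true if a valid .regioninfo file exists in the region dir
     * @param rowInMeta        true if .META. has a valid row for the region
     * @param assignedServer   server recorded as the assignment, or null if none
     * @param deployedOn       servers that actually report the region as open
     */
    static Set<Problem> classify(boolean regionInfoInHdfs, boolean rowInMeta,
                                 String assignedServer, Set<String> deployedOn) {
        EnumSet<Problem> problems = EnumSet.noneOf(Problem.class);
        if (!regionInfoInHdfs) {
            problems.add(Problem.MISSING_REGIONINFO_IN_HDFS);
        }
        if (!rowInMeta) {
            problems.add(Problem.MISSING_ROW_IN_META);
        }
        // Consistent deployment: open on exactly one server, and it is the assigned one.
        boolean deployedAsAssigned = assignedServer != null
                && deployedOn.equals(Collections.singleton(assignedServer));
        if (!deployedAsAssigned) {
            problems.add(Problem.DEPLOYMENT_MISMATCH);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Region is in HDFS, missing from .META., and open on two servers at once.
        Set<String> deployedOn = new HashSet<>(Arrays.asList("rs1", "rs2"));
        System.out.println(classify(true, false, "rs1", deployedOn));
    }
}
```

      The sketch only reports mismatches and does not choose a repair. Under the first caveat above, a repair would presumably trust the HDFS input and adjust the other two sources to match it.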

      Description

      The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.

      Here's the approach (from the comment at the top of the new version of the file).

      /**
       * HBaseFsck (hbck) is a tool for checking and repairing region consistency and
       * table integrity.
       *
       * Region consistency checks verify that META, region deployment on
       * region servers and the state of data in HDFS (.regioninfo files) are all in
       * accordance.
       *
       * Table integrity checks verify that all possible row keys can resolve to
       * exactly one region of a table.  This means there are no individual degenerate
       * or backwards regions; no holes between regions; and that there are no
       * overlapping regions.
       *
       * The general repair strategy works in these steps.
       * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
       * 2) Repair Region Consistency with META and assignments
       *
       * For table integrity repairs, the tables' region directories are scanned
       * for .regioninfo files.  Each table's integrity is then verified.  If there
       * are any orphan regions (regions with no .regioninfo files), or holes, new
       * regions are fabricated.  Backwards regions are sidelined as well as empty
       * degenerate (endkey==startkey) regions.  If there are any overlapping regions,
       * a new region is created and all data is merged into the new region.
       *
       * Table integrity repairs deal solely with HDFS and can be done offline -- the
       * hbase region servers or master do not need to be running.  This phase can be
       * used to completely reconstruct the META table in an offline fashion.
       *
       * Region consistency requires three conditions -- 1) valid .regioninfo file
       * present in an hdfs region dir,  2) valid row with .regioninfo data in META,
       * and 3) a region is deployed only at the regionserver that it was assigned to.
       *
       * Region consistency requires hbck to contact the HBase master and region
       * servers, so the connect() must first be called successfully.  Much of the
       * region consistency information is transient and less risky to repair.
       */
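
      The table integrity invariants in the comment above can be illustrated with a sweep over regions sorted by start key. The following is a sketch under simplifying assumptions, not the HBaseFsck code: row keys are Java strings rather than byte arrays, an empty key means unbounded on that side, and the class, field and label names are made up for the example. The repair named in each code comment is the one the comment above assigns to that kind of problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; not the HBaseFsck implementation. */
final class TableIntegritySketch {

    /** Hypothetical region covering [start, end); "" means unbounded on that side. */
    static final class Region {
        final String name, start, end;
        Region(String name, String start, String end) {
            this.name = name; this.start = start; this.end = end;
        }
    }

    /** Returns one entry per violated invariant; an empty list means the table is intact. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        List<Region> valid = new ArrayList<>();
        for (Region r : regions) {
            boolean bounded = !r.end.isEmpty();
            if (bounded && r.start.equals(r.end)) {
                problems.add("DEGENERATE " + r.name);   // repair: sideline
            } else if (bounded && r.start.compareTo(r.end) > 0) {
                problems.add("BACKWARDS " + r.name);    // repair: sideline
            } else {
                valid.add(r);
            }
        }
        valid.sort(Comparator.comparing((Region r) -> r.start));

        String coveredTo = "";        // every key below this is covered by some region
        boolean coveredToEnd = false; // set once a region with an unbounded end is seen
        for (Region r : valid) {
            if (coveredToEnd || r.start.compareTo(coveredTo) < 0) {
                problems.add("OVERLAP " + r.name);      // repair: merge into a new region
            } else if (r.start.compareTo(coveredTo) > 0) {
                problems.add("HOLE [" + coveredTo + "," + r.start + ")"); // repair: fabricate
            }
            if (r.end.isEmpty()) {
                coveredToEnd = true;
            } else if (r.end.compareTo(coveredTo) > 0) {
                coveredTo = r.end;
            }
        }
        if (!coveredToEnd) {
            problems.add("HOLE [" + coveredTo + ",)");  // no region reaches the last key
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("a", "", "c"), new Region("b", "c", "f"),
            new Region("c", "e", "h"), new Region("d", "k", ""),
            new Region("e", "m", "m"), new Region("f", "z", "y"));
        for (String problem : check(regions)) {
            System.out.println(problem);
        }
    }
}
```

      Because this check needs only the key ranges read from the region directories, it fits the offline phase described in the comment; the sketch stops at detection and performs no repair.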
      
      Attachments

      1. 5128-trunk.addendum (6 kB, Ted Yu)
      2. hbase-5128-0.90-v2.patch (165 kB, Jonathan Hsieh)
      3. hbase-5128-0.90-v2b.patch (152 kB, Jonathan Hsieh)
      4. hbase-5128-0.90-v4.patch (155 kB, Jonathan Hsieh)
      5. hbase-5128-0.92-v2.patch (151 kB, Jonathan Hsieh)
      6. hbase-5128-0.92-v4.patch (155 kB, Jonathan Hsieh)
      7. hbase-5128-0.94-v2.patch (151 kB, Jonathan Hsieh)
      8. hbase-5128-0.94-v4.patch (155 kB, Jonathan Hsieh)
      9. hbase-5128-trunk.patch (153 kB, Jonathan Hsieh)
      10. hbase-5128-trunk-v2.patch (151 kB, Jonathan Hsieh)
      11. hbase-5128-v3.patch (154 kB, Jonathan Hsieh)
      12. hbase-5128-v4.patch (156 kB, Jonathan Hsieh)

        Issue Links

          Activity

          Jonathan Hsieh created issue -
          Jonathan Hsieh made changes -
          Field Original Value New Value
          Description The current (0.90.5, 0.92.0rc2) versions of hbck detect most of the invariant violations (orphans is new). However, with '-fix' they can only automatically handle deployment problems with region consistency cases. This updated version should be able to handle all cases. When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several META hole related problems.


          The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' they can only automatically handle deployment problems with region consistency cases. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.


          Jonathan Hsieh made changes -
          Description The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' they can only automatically handle deployment problems with region consistency cases. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.


          The current (0.90.5, 0.92.0rc2) versions of hbck detect most region consistency and table integrity invariant violations. However, with '-fix' they can only automatically repair region consistency cases having to do with deployment problems. This updated version should be able to handle all cases (including a new orphan regiondir case). When complete, it will likely deprecate the OfflineMetaRepair tool and subsume several open META-hole related issues.


          Jonathan Hsieh made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Jonathan Hsieh made changes -
          Affects Version/s 0.92.0 [ 12314223 ]
          Affects Version/s 0.90.5 [ 12317145 ]
          Component/s hbck [ 12315702 ]
          Jonathan Hsieh made changes -
          Link This issue is related to HBASE-1621 [ HBASE-1621 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-trunk.patch [ 12517811 ]
          Jonathan Hsieh made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Affects Version/s 0.94.0 [ 12316419 ]
          Affects Version/s 0.96.0 [ 12320040 ]
          Jonathan Hsieh made changes -
          Link This issue requires HBASE-5588 [ HBASE-5588 ]
          Jonathan Hsieh made changes -
          Link This issue requires HBASE-5589 [ HBASE-5589 ]
          Jonathan Hsieh made changes -
          Link This issue requires HBASE-5563 [ HBASE-5563 ]
          Ted Yu made changes -
          Link This issue relates to HBASE-5599 [ HBASE-5599 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-trunk-v2.patch [ 12519364 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-0.94-v2.patch [ 12519381 ]
          Attachment hbase-5128-0.92-v2.patch [ 12519382 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-0.90-v2.patch [ 12519401 ]
          Jonathan Hsieh made changes -
          Fix Version/s 0.90.7 [ 12319481 ]
          Fix Version/s 0.92.2 [ 12319888 ]
          Fix Version/s 0.94.0 [ 12316419 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-0.90-v2b.patch [ 12519405 ]
          Ted Yu made changes -
          Comment [ -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12519382/hbase-5128-0.92-v2.patch
            against trunk revision .

              +1 @author. The patch does not contain any @author tags.

              +1 tests included. The patch appears to include 21 new or modified tests.

              -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1252//console

          This message is automatically generated. ]
          Ted Yu made changes -
          Comment [ -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12519401/hbase-5128-0.90-v2.patch
            against trunk revision .

              +1 @author. The patch does not contain any @author tags.

              +1 tests included. The patch appears to include 15 new or modified tests.

              -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1253//console

          This message is automatically generated. ]
          Ted Yu made changes -
          Comment [ -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12519405/hbase-5128-0.90-v2b.patch
            against trunk revision .

              +1 @author. The patch does not contain any @author tags.

              +1 tests included. The patch appears to include 15 new or modified tests.

              -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1254//console

          This message is automatically generated. ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-v3.patch [ 12519538 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-0.92-v4.patch [ 12519647 ]
          Attachment hbase-5128-0.94-v4.patch [ 12519648 ]
          Attachment hbase-5128-v4.patch [ 12519649 ]
          Jonathan Hsieh made changes -
          Attachment hbase-5128-0.90-v4.patch [ 12519754 ]
          Jonathan Hsieh made changes -
          Summary [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online. [uber hbck] Online automated repair of table integrity and region consistency problems
          Jonathan Hsieh made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Ted Yu made changes -
          Attachment 5128-trunk.addendum [ 12519778 ]
          Jonathan Hsieh made changes -
          Link This issue is depended upon by HBASE-5634 [ HBASE-5634 ]
          Jonathan Hsieh made changes -
          Release Note HBaseFsck (hbck) has been updated with new repair capabilities. hbck is a tool for checking the region consistency and the table integrity invariants of a running HBase cluster. Checking region consistency verifies that .META., region deployment on region servers and the state of data in HDFS (.regioninfo files) are all in accordance. Table integrity checks verify that all possible row keys resolve to exactly one region of a table: there are no individual degenerate or backwards regions, no holes between regions, and no overlapping regions. Previously hbck could diagnose inconsistencies but could repair only region consistency problems related to deployment. The updated version adds the ability to repair region consistency problems in .META. (by patching holes), repair overlapping regions (via merging), patch region holes (by fabricating new regions), and detect and adopt orphaned regions (by fabricating a new .regioninfo file if it is missing in a region's dir).

          Caveats:
          * The new hbck selects repairs assuming that HDFS is ground truth; the previous version treated .META. as ground truth.
          * The hbck '-fix' option is still present but is deprecated and replaced by the -fixAssignments option.
          * This tool adds APIs in 0.90.7, 0.92.2 and 0.94.0 for clean repairs. The 0.90 version of the tool is compatible with HBase 0.90+, but may require restarting the master or an individual regionserver for table enable/disable/delete commands to work properly.
          Ted Yu made changes -
          Link This issue relates to HBASE-5630 [ HBASE-5630 ]
          Jeff Hammerbacher made changes -
          Link This issue relates to HBASE-5719 [ HBASE-5719 ]
          Jonathan Hsieh made changes -
          Link This issue is related to HBASE-4379 [ HBASE-4379 ]
          Jonathan Hsieh made changes -
          Link This issue duplicates HBASE-4094 [ HBASE-4094 ]
          Jonathan Hsieh made changes -
          Link This issue is depended upon by HBASE-5781 [ HBASE-5781 ]
          Jeff Hammerbacher made changes -
          Link This issue relates to HBASE-5628 [ HBASE-5628 ]
          Lars Hofhansl made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          stack made changes -
          Fix Version/s 0.95.0 [ 12324094 ]
          Fix Version/s 0.94.0 [ 12316419 ]
          Fix Version/s 0.90.7 [ 12319481 ]
          Fix Version/s 0.92.2 [ 12319888 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          Lars Hofhansl made changes -
          Fix Version/s 0.94.0 [ 12316419 ]

            People

            • Assignee: Jonathan Hsieh
            • Reporter: Jonathan Hsieh
            • Votes: 0
            • Watchers: 10
