Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 1.1.0
    • Component/s: None
    • Labels:
      None

      Description

      HBCK should be updated to have a check for whether the replicas are assigned to the right machines (default and non-default replicas ideally should not be in the same server if there is more than one server in the cluster and such scenarios). Jonathan Hsieh suggested this in HBASE-10362.

      1. 10674-1.txt
        16 kB
        Devaraj Das
      2. 10674-1.2.txt
        34 kB
        Devaraj Das
      3. 10674-1.3.txt
        34 kB
        Devaraj Das
      4. 10674.addendum
        1.0 kB
        Ted Yu

        Activity

        Hide
        Nick Dimiduk added a comment -

        Closing issues released in 1.1.0.

        Show
        Nick Dimiduk added a comment - Closing issues released in 1.1.0.
        Hide
        Hudson added a comment -

        FAILURE: Integrated in HBase-1.1 #86 (See https://builds.apache.org/job/HBase-1.1/86/)
        HBASE-10674. HBCK should be updated to do replica related checks (ddas: rev 895e9228a0f5036a9e9c4c481e796fa1fbc816de)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
        • hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
        • hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
        • hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
        Show
        Hudson added a comment - FAILURE: Integrated in HBase-1.1 #86 (See https://builds.apache.org/job/HBase-1.1/86/ ) HBASE-10674 . HBCK should be updated to do replica related checks (ddas: rev 895e9228a0f5036a9e9c4c481e796fa1fbc816de) hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
        Hide
        Devaraj Das added a comment -

        Committed to branch-1.

        Show
        Devaraj Das added a comment - Committed to branch-1.
        Hide
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5369 (See https://builds.apache.org/job/HBase-TRUNK/5369/)
        HBASE-10674 HBCK should be updated to do replica related checks Addendum (tedyu: rev ab757169a19be668e7a7d2e52c1da01d4810e0c2)

        • hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
        Show
        Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5369 (See https://builds.apache.org/job/HBase-TRUNK/5369/ ) HBASE-10674 HBCK should be updated to do replica related checks Addendum (tedyu: rev ab757169a19be668e7a7d2e52c1da01d4810e0c2) hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
        Hide
        Ted Yu added a comment -

        Addendum fixes compilation.

        Show
        Ted Yu added a comment - Addendum fixes compilation.
        Hide
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5368 (See https://builds.apache.org/job/HBase-TRUNK/5368/)
        HBASE-10674. HBCK should be updated to do replica related checks (ddas: rev 8562752143f57744ac522ace1a2b196c45b428d4)

        • hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
        • hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
        • hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
        • hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
        Show
        Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5368 (See https://builds.apache.org/job/HBase-TRUNK/5368/ ) HBASE-10674 . HBCK should be updated to do replica related checks (ddas: rev 8562752143f57744ac522ace1a2b196c45b428d4) hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
        Hide
        Devaraj Das added a comment -

        What I committed with the review comments incorporated.

        Show
        Devaraj Das added a comment - What I committed with the review comments incorporated.
        Hide
        Devaraj Das added a comment -

        Committed. Thanks Alex Newman & Nick Dimiduk for reviews.

        Show
        Devaraj Das added a comment - Committed. Thanks Alex Newman & Nick Dimiduk for reviews.
        Hide
        Alex Newman added a comment -

        Oooh one other thing

        in HBaseFsck.java
        /**

        • Puts the specified HRegionInfo into META.
          */
          public static void fixMetaHoleOnline(Configuration conf,
          HRegionInfo hri) throws IOException { HTable meta = new HTable(conf, TableName.META_TABLE_NAME); MetaTableAccessor.addRegionToMeta(meta, hri); meta.close(); }

        no longer seems to be used

        Show
        Alex Newman added a comment - Oooh one other thing in HBaseFsck.java /** Puts the specified HRegionInfo into META. */ public static void fixMetaHoleOnline(Configuration conf, HRegionInfo hri) throws IOException { HTable meta = new HTable(conf, TableName.META_TABLE_NAME); MetaTableAccessor.addRegionToMeta(meta, hri); meta.close(); } no longer seems to be used
        Hide
        Alex Newman added a comment -

        Agreed, my vote does't count but +1

        Show
        Alex Newman added a comment - Agreed, my vote does't count but +1
        Hide
        Nick Dimiduk added a comment -
        +    Result r = meta.get(get);
        +    RegionLocations rl = MetaTableAccessor.getRegionLocations(r);
        +    if (rl == null) return;
        

        Should this be an error instead of silently returning?

        +          for (HRegionLocation h : rl.getRegionLocations()) {
        +            if (h == null || h.getRegionInfo() == null) {
        +              continue;
        +            }
        

        And this should be an error as well, especially if h.getRegionInfo() is null?

        Nice tests. Just when I was thinking "what about this case?" I find a test for it.

        Questions aside, LGTM, +1.

        Show
        Nick Dimiduk added a comment - + Result r = meta.get(get); + RegionLocations rl = MetaTableAccessor.getRegionLocations(r); + if (rl == null) return; Should this be an error instead of silently returning? + for (HRegionLocation h : rl.getRegionLocations()) { + if (h == null || h.getRegionInfo() == null) { + continue; + } And this should be an error as well, especially if h.getRegionInfo() is null? Nice tests. Just when I was thinking "what about this case?" I find a test for it. Questions aside, LGTM, +1.
        Hide
        Alex Newman added a comment -

        Also I noticed A commented out line in HBaseFsck

        @@ -3514,6 +3517,7 @@ public class HBaseFsck extends Configured {
        // check to see if the existence of this region matches the region in M
        for (HRegionInfo r:regions)

        { HbckInfo hbi = hbck.getOrCreateInfo(r.getEncodedName()); //if (!RegionReplicaUtil.isDefaultReplica(r)) hbi.setSkipChecks(true) hbi.addServer(r, rsinfo); }
        Show
        Alex Newman added a comment - Also I noticed A commented out line in HBaseFsck @@ -3514,6 +3517,7 @@ public class HBaseFsck extends Configured { // check to see if the existence of this region matches the region in M for (HRegionInfo r:regions) { HbckInfo hbi = hbck.getOrCreateInfo(r.getEncodedName()); //if (!RegionReplicaUtil.isDefaultReplica(r)) hbi.setSkipChecks(true) hbi.addServer(r, rsinfo); }
        Hide
        Alex Newman added a comment -

        Woops accidently hit add

        ....
        if (numReplicas > 1) {
        Random r = new Random();

        Shouldn't this be?

        • ServerName[] serversArr = servers.toArray(new ServerName[0]);
          + ServerName[] serversArr = servers.toArray(new ServerName[servers.size()])

        You have a spare todo around
        ---- a/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
        +++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
        @@ -301,6 +301,7 @@ public class TestHBaseFsck {

        • @param metaRow if true remove region's row from META
        • @param hdfs if true remove region's dir in HDFS
        • @param regionInfoOnly if true remove a region dir's .regioninfo file
        • @param replicaId TODO
          */
          private void deleteRegion(Configuration conf, final HTableDescriptor htd,
        Show
        Alex Newman added a comment - Woops accidently hit add .... if (numReplicas > 1) { Random r = new Random(); Shouldn't this be? ServerName[] serversArr = servers.toArray(new ServerName [0] ); + ServerName[] serversArr = servers.toArray(new ServerName [servers.size()] ) You have a spare todo around ---- a/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java +++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java @@ -301,6 +301,7 @@ public class TestHBaseFsck { @param metaRow if true remove region's row from META @param hdfs if true remove region's dir in HDFS @param regionInfoOnly if true remove a region dir's .regioninfo file @param replicaId TODO */ private void deleteRegion(Configuration conf, final HTableDescriptor htd,
        Hide
        Alex Newman added a comment -

        OK I have no commit bit, that being said.....

        What's going on here, can we make it into a function? This has gotten pretty hairy
        +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
        @@ -2949,6 +2949,7 @@ public class HBaseFsck extends Configured {
        } else

        { m = new MetaEntry(hri, sn, ts, null, null); }

        HbckInfo previous = regionInfoMap.get(hri.getEncodedName());
        if (previous == null)

        { regionInfoMap.put(hri.getEncodedName(), new HbckInfo(m)); .... /TODO What does first online hri mean? Maybe a javadoc? return FSUtils.getTableName(tableDir); }

        else {
        // return the info from the first online hri
        for (OnlineEntry e : deployedEntries)

        { return e.hri.getTable(); }

        This is weirdly formatted. Maybe run an autoformat
        ++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.ja
        @@ -186,12 +186,14 @@ public class HBaseFsckRepair {

        • Puts the specified HRegionInfo into META with replica related columns
          */
          public static void fixMetaHoleOnlineAndAddReplicas(Configuration conf,
        • HRegionInfo hri, Collection<ServerName> servers, int numReplicas) throws
        Show
        Alex Newman added a comment - OK I have no commit bit, that being said..... What's going on here, can we make it into a function? This has gotten pretty hairy +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java @@ -2949,6 +2949,7 @@ public class HBaseFsck extends Configured { } else { m = new MetaEntry(hri, sn, ts, null, null); } HbckInfo previous = regionInfoMap.get(hri.getEncodedName()); if (previous == null) { regionInfoMap.put(hri.getEncodedName(), new HbckInfo(m)); .... /TODO What does first online hri mean? Maybe a javadoc? return FSUtils.getTableName(tableDir); } else { // return the info from the first online hri for (OnlineEntry e : deployedEntries) { return e.hri.getTable(); } This is weirdly formatted. Maybe run an autoformat ++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.ja @@ -186,12 +186,14 @@ public class HBaseFsckRepair { Puts the specified HRegionInfo into META with replica related columns */ public static void fixMetaHoleOnlineAndAddReplicas(Configuration conf, HRegionInfo hri, Collection<ServerName> servers, int numReplicas) throws
        Hide
        Devaraj Das added a comment -

        Yes,Alex Newman ready to review.

        Show
        Devaraj Das added a comment - Yes, Alex Newman ready to review.
        Hide
        Alex Newman added a comment -

        Is this still work in progress or ready to review?

        Show
        Alex Newman added a comment - Is this still work in progress or ready to review?
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12658715/10674-1.2.txt
        against trunk revision .
        ATTACHMENT ID: 12658715

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
        org.apache.hadoop.hbase.migration.TestNamespaceUpgrade
        org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas
        org.apache.hadoop.hbase.regionserver.TestRegionReplicas
        org.apache.hadoop.hbase.client.TestReplicasClient
        org.apache.hadoop.hbase.master.TestRestartCluster
        org.apache.hadoop.hbase.TestRegionRebalancing
        org.apache.hadoop.hbase.TestIOFencing

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658715/10674-1.2.txt against trunk revision . ATTACHMENT ID: 12658715 +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 +1 site . The mvn site goal succeeds with this patch. -1 core tests . The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction org.apache.hadoop.hbase.migration.TestNamespaceUpgrade org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas org.apache.hadoop.hbase.regionserver.TestRegionReplicas org.apache.hadoop.hbase.client.TestReplicasClient org.apache.hadoop.hbase.master.TestRestartCluster org.apache.hadoop.hbase.TestRegionRebalancing org.apache.hadoop.hbase.TestIOFencing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10229//console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        This patch addresses the cases where some replicas are not deployed in the cluster, or excess replicas are deployed in the cluster. It also deals with plugging meta holes with the appropriate replica qualifiers.

        Show
        Devaraj Das added a comment - This patch addresses the cases where some replicas are not deployed in the cluster, or excess replicas are deployed in the cluster. It also deals with plugging meta holes with the appropriate replica qualifiers.
        Hide
        Devaraj Das added a comment -

        This is a WIP patch. It addresses the issue to do with plugging hdfs holes (when a hdfs hole is plugged, the old region is closed and a new region is created; this patch adds closing/opening for the corresponding replicas).

        Show
        Devaraj Das added a comment - This is a WIP patch. It addresses the issue to do with plugging hdfs holes (when a hdfs hole is plugged, the old region is closed and a new region is created; this patch adds closing/opening for the corresponding replicas).

          People

          • Assignee:
            Devaraj Das
            Reporter:
            Devaraj Das
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development