HBASE-10153: Improve VerifyReplication to compute BADROWS more accurately

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      VerifyReplication reports the following counters in addition to the existing ones:

      ONLY_IN_SOURCE_TABLE_ROWS: number of rows found only in the source table
      ONLY_IN_PEER_TABLE_ROWS: number of rows found only in the peer table
      CONTENT_DIFFERENT_ROWS: number of rows whose contents differ between the source and peer tables

      Description

      VerifyReplication can compare the source table with its peer table and compute BADROWS. However, the current method of computing BADROWS might not be accurate enough. For example, if the source table contains the rows

      {r1, r2, r3, r4}

      and the peer table contains the rows

      {r1, r3, r4}

      BADROWS will be 3, because the extra row 'r2' in the source table makes every later row comparison fail. Would it be better for BADROWS to be 1 in this situation? Perhaps we could compute BADROWS more accurately with a merge comparison.
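      To make the arithmetic above concrete, here is a minimal sketch of a lockstep comparison. It is a simplified model and not the actual VerifyReplication code: row keys are plain strings, and the class and method names (`LockstepBadRows`, `countBadRows`) are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of a lockstep comparison; not HBase code.
final class LockstepBadRows {

    // For each source row, take the next peer row (if any) and compare the keys.
    // A single missing peer row shifts every later pairing.
    static int countBadRows(List<String> source, List<String> peer) {
        Iterator<String> peerRows = peer.iterator();
        int badRows = 0;
        for (String sourceRow : source) {
            String peerRow = peerRows.hasNext() ? peerRows.next() : null;
            if (!sourceRow.equals(peerRow)) {
                badRows++;
            }
        }
        return badRows;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("r1", "r2", "r3", "r4");
        List<String> peer = Arrays.asList("r1", "r3", "r4");
        // r1/r1 match; r2/r3, r3/r4 and r4/<none> do not.
        System.out.println(countBadRows(source, peer)); // prints 3
    }
}
```

      Only 'r2' is actually missing from the peer, yet three comparisons fail because the pairing never realigns.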

      1. 10153-0.98.txt (5 kB, Ted Yu)
      2. 10153-v2-trunk.txt (5 kB, Ted Yu)
      3. HBASE-10153-0.94-v1.patch (4 kB, cuijianwei)
      4. HBASE-10153-trunk.patch (5 kB, cuijianwei)

        Activity

        cuijianwei added a comment -

        This patch tries to improve the BADROWS computation as follows:
        1. BADROWS is refined into ONLY_IN_SOURCE_TABLE_ROWS, ONLY_IN_PEER_TABLE_ROWS and CONTENT_DIFFERENT_ROWS.
        2. These counters are computed by a merge comparison between the source and peer tables.
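        The merge comparison described in the two points above can be sketched as follows. This is an illustration under stated assumptions, not the code in the attached patches: rows are modeled as a sorted map from a string row key to a string value, both maps are assumed to use natural key ordering with non-null values, and the names `MergeCompare`, `Counters` and `compare` are hypothetical.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of a merge comparison over two sorted row sets; not HBase code.
final class MergeCompare {

    // Mirrors the three counters named in the release note.
    static final class Counters {
        long onlyInSource;
        long onlyInPeer;
        long contentDifferent;
    }

    // Assumes both maps use natural String ordering and contain no null values.
    static Counters compare(SortedMap<String, String> source, SortedMap<String, String> peer) {
        Counters c = new Counters();
        Iterator<Map.Entry<String, String>> s = source.entrySet().iterator();
        Iterator<Map.Entry<String, String>> p = peer.entrySet().iterator();
        Map.Entry<String, String> sRow = next(s);
        Map.Entry<String, String> pRow = next(p);
        while (sRow != null || pRow != null) {
            int cmp;
            if (sRow == null) {
                cmp = 1;   // source exhausted: remaining peer rows are peer-only
            } else if (pRow == null) {
                cmp = -1;  // peer exhausted: remaining source rows are source-only
            } else {
                cmp = sRow.getKey().compareTo(pRow.getKey());
            }
            if (cmp < 0) {
                c.onlyInSource++;
                sRow = next(s);
            } else if (cmp > 0) {
                c.onlyInPeer++;
                pRow = next(p);
            } else {
                // Same row key on both sides: compare the contents.
                if (!sRow.getValue().equals(pRow.getValue())) {
                    c.contentDifferent++;
                }
                sRow = next(s);
                pRow = next(p);
            }
        }
        return c;
    }

    // Returns the next element, or null once the iterator is exhausted.
    private static <T> T next(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        SortedMap<String, String> source = new TreeMap<>();
        SortedMap<String, String> peer = new TreeMap<>();
        for (String row : new String[] {"r1", "r2", "r3", "r4"}) {
            source.put(row, "v");
        }
        for (String row : new String[] {"r1", "r3", "r4"}) {
            peer.put(row, "v");
        }
        Counters c = compare(source, peer);
        System.out.println(c.onlyInSource + " " + c.onlyInPeer + " " + c.contentDifferent); // prints 1 0 0
    }
}
```

        Because only the side with the smaller row key advances on a mismatch, the two sides realign after a missing row, so the example from the description counts one source-only row instead of three bad rows.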

        chendihao added a comment -

        Without this improvement, many BADROWS are reported even though the two tables are consistent. It's a little misleading.

        Please have a look, stack and Lars Hofhansl.

        Ted Yu added a comment -

        Can you provide a patch for trunk?

        cuijianwei added a comment -

        Ted Yu, thanks for your attention. I have added a patch for trunk; please have a look.

        stack added a comment -

        This looks like a nice utility.

        This part looks like a 'fix'. Is it an important change?

        - scan.setStartRow(value.getRow());
        + scan.setStartRow(tableSplit.getStartRow());
        + scan.setStopRow(tableSplit.getEndRow());

        Patch LGTM.

        Anyone more familiar with this tool want to chime in?

        Andrew Purtell added a comment -

        There was no additional comment. Will commit shortly.

        Andrew Purtell added a comment -

        The current patch is stale. Any chance of an update, cuijianwei? I promise a quick commit after.

        Ted Yu added a comment -

        Rebased patch

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12672577/10153-v2-trunk.txt
        against trunk revision .
        ATTACHMENT ID: 12672577

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.


        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.master.TestDistributedLogSplitting

        -1 core zombie tests. There are 2 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery.testSplitWhileBulkLoadPhase(TestLoadIncrementalHFilesSplitRecovery.java:339)
        at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testSimpleLoad(TestLoadIncrementalHFiles.java:100)

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//console

        This message is automatically generated.

        Ted Yu added a comment -

        Patch for 0.98
        Andrew Purtell:
        Do you want to take one more look?

        Andrew Purtell added a comment -

        Skimmed it, lgtm

        Ted Yu added a comment -

        Thanks for the patch, Jianwei

        Thanks for the reviews.

        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #537 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/537/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev a3cfd5233dfbfdd57ac445acd0886df2f8bae895)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98 #565 (See https://builds.apache.org/job/HBase-0.98/565/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev a3cfd5233dfbfdd57ac445acd0886df2f8bae895)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-1.0 #266 (See https://builds.apache.org/job/HBase-1.0/266/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev a2fe4d6700c83a467b053f2d04115c69a27f3c79)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5613 (See https://builds.apache.org/job/HBase-TRUNK/5613/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev 8dbf7b22381dab18f9af13318c16181c42824d46)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Enis Soztutar added a comment -

        Closing this issue after 0.99.1 release.


          People

          • Assignee: cuijianwei
          • Reporter: cuijianwei