HBASE-10153

improve VerifyReplication to compute BADROWS more accurately

    Details

    • Hadoop Flags: Reviewed
    • Release Note:
      VerifyReplication reports the following counters in addition to the existing ones:

      ONLY_IN_SOURCE_TABLE_ROWS: number of rows found only in source
      ONLY_IN_PEER_TABLE_ROWS: number of rows found only in peer
      CONTENT_DIFFERENT_ROWS: number of rows whose contents are different between source and peer

      Description

      VerifyReplication compares the source table with its peer table and computes BADROWS. However, the current method of computing BADROWS may not be accurate enough. For example, if the source table contains the rows

      {r1, r2, r3, r4}

      and the peer table contains the rows

      {r1, r3, r4}

      BADROWS will be 3, because 'r2' in the source table makes all later row comparisons fail. Would it be better if BADROWS were computed as 1 in this situation? Perhaps we can compute BADROWS more accurately with a merge comparison.
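
      The miscount described above can be reproduced with a small standalone model. This is only a sketch under the assumption that the existing comparison advances the peer cursor once per source row regardless of whether the keys line up; the class and method names (LockStepSketch, badRows) are hypothetical and are not taken from VerifyReplication.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a position-by-position comparison; not HBase code.
class LockStepSketch {

  /** Counts source rows whose key differs from the peer row at the same position. */
  static int badRows(List<String> source, List<String> peer) {
    Iterator<String> peerRows = peer.iterator();
    int bad = 0;
    for (String sourceRow : source) {
      // The peer cursor moves forward once per source row, even after a mismatch,
      // so a single missing row shifts every later comparison.
      String peerRow = peerRows.hasNext() ? peerRows.next() : null;
      if (!sourceRow.equals(peerRow)) {
        bad++;
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    List<String> source = Arrays.asList("r1", "r2", "r3", "r4");
    List<String> peer = Arrays.asList("r1", "r3", "r4");
    // r1/r1 match; r2/r3, r3/r4 and r4/<end> all mismatch.
    System.out.println(badRows(source, peer)); // prints 3
  }
}
```

      Only 'r2' is actually missing from the peer, yet three rows are flagged, which is the inaccuracy this issue asks to remove.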

      1. 10153-0.98.txt (5 kB, Ted Yu)
      2. 10153-v2-trunk.txt (5 kB, Ted Yu)
      3. HBASE-10153-0.94-v1.patch (4 kB, cuijianwei)
      4. HBASE-10153-trunk.patch (5 kB, cuijianwei)

        Activity

        Transition                  Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available     293d 12h 47m            1                 Ted Yu          02/Oct/14 17:41
        Patch Available -> Open     3h 52m                  1                 Ted Yu          02/Oct/14 21:34
        Open -> Resolved            6h 57m                  1                 Ted Yu          03/Oct/14 04:31
        Resolved -> Closed          141d 20h 9m             1                 Enis Soztutar   21/Feb/15 23:41
        Enis Soztutar made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Enis Soztutar added a comment -

        Closing this issue after 0.99.1 release.

        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #5613 (See https://builds.apache.org/job/HBase-TRUNK/5613/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev 8dbf7b22381dab18f9af13318c16181c42824d46)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-1.0 #266 (See https://builds.apache.org/job/HBase-1.0/266/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev a2fe4d6700c83a467b053f2d04115c69a27f3c79)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98 #565 (See https://builds.apache.org/job/HBase-0.98/565/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev a3cfd5233dfbfdd57ac445acd0886df2f8bae895)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #537 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/537/)
        HBASE-10153 improve VerifyReplication to compute BADROWS more accurately (Jianwei) (tedyu: rev a3cfd5233dfbfdd57ac445acd0886df2f8bae895)

        • hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
        Ted Yu made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Ted Yu added a comment -

        Thanks for the patch, Jianwei

        Thanks for the reviews.

        Ted Yu made changes -
        Release Note VerifyReplication reports the following counters in addition to the existing ones:

        ONLY_IN_SOURCE_TABLE_ROWS: number of rows found only in source
        ONLY_IN_PEER_TABLE_ROWS: number of rows found only in peer
        CONTENT_DIFFERENT_ROWS: number of rows whose contents are different between source and peer
        Andrew Purtell added a comment -

        Skimmed it, lgtm

        Ted Yu made changes -
        Attachment 10153-0.98.txt [ 12672627 ]
        Ted Yu added a comment -

        Patch for 0.98
        Andrew Purtell: Do you want to take one more look?

        Ted Yu made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12672577/10153-v2-trunk.txt
        against trunk revision .
        ATTACHMENT ID: 12672577

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.master.TestDistributedLogSplitting

        -1 core zombie tests. There are 2 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery.testSplitWhileBulkLoadPhase(TestLoadIncrementalHFilesSplitRecovery.java:339)
        at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testSimpleLoad(TestLoadIncrementalHFiles.java:100)

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11190//console

        This message is automatically generated.

        Ted Yu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop Flags Reviewed [ 10343 ]
        Ted Yu made changes -
        Attachment 10153-v2-trunk.txt [ 12672577 ]
        Ted Yu added a comment -

        Rebased patch

        Andrew Purtell added a comment -

        The current patch is stale. Any chance of an update, cuijianwei? I promise a quick commit after.

        Andrew Purtell added a comment -

        There was no additional comment. Will commit shortly.

        stack made changes -
        Component/s Operability [ 12321806 ]
        stack added a comment -

        This looks like a nice utility.

        This looks like a 'fix', an important change?

          - scan.setStartRow(value.getRow());
          + scan.setStartRow(tableSplit.getStartRow());
          + scan.setStopRow(tableSplit.getEndRow());

        Patch LGTM.

        Anyone more familiar with this tool want to chime in?
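
        One plausible reading of why the quoted scan change matters, which nobody in this thread confirms, is that a merge comparison needs the peer scan to cover exactly the key range of the current split. The sketch below models that with a TreeMap standing in for the peer table; scanFrom and scanSplit are hypothetical helpers, and the half-open range [startRow, endRow) mirrors the inclusive start row and exclusive stop row of an HBase Scan.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the peer scan; not HBase code.
class SplitBoundedScanSketch {

  /** Models a scan that starts at the first source row seen and never stops. */
  static NavigableMap<String, String> scanFrom(TreeMap<String, String> peer, String firstSourceRow) {
    return peer.tailMap(firstSourceRow, true);
  }

  /** Models a scan limited to the split's range: start inclusive, end exclusive. */
  static NavigableMap<String, String> scanSplit(
      TreeMap<String, String> peer, String startRow, String endRow) {
    return peer.subMap(startRow, true, endRow, false);
  }

  public static void main(String[] args) {
    TreeMap<String, String> peer = new TreeMap<>();
    peer.put("r1", "a"); // inside the split, but before the first source row
    peer.put("r2", "b");
    peer.put("r3", "c");
    peer.put("r7", "d"); // belongs to a later split

    // Assume the split covers [r1, r5) and the first source row in it is r2.
    System.out.println(scanFrom(peer, "r2").keySet());        // prints [r2, r3, r7]
    System.out.println(scanSplit(peer, "r1", "r5").keySet()); // prints [r1, r2, r3]
  }
}
```

        In this model the unbounded scan misses 'r1', which a merge comparison should report as present only in the peer, and it runs on into 'r7', which another split would be expected to check.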

        Enis Soztutar made changes -
        Fix Version/s 0.99.1 [ 12328551 ]
        Fix Version/s 0.99.0 [ 12325675 ]
        cuijianwei added a comment -

        Ted Yu, thanks for your attention. I have added a patch for trunk; please have a look.

        cuijianwei made changes -
        Attachment HBASE-10153-trunk.patch [ 12664593 ]
        Ted Yu made changes -
        Fix Version/s 0.98.7 [ 12327560 ]
        Fix Version/s 0.98.6 [ 12327376 ]
        Ted Yu made changes -
        Fix Version/s 0.99.0 [ 12325675 ]
        Fix Version/s 2.0.0 [ 12327188 ]
        Fix Version/s 0.98.6 [ 12327376 ]
        Ted Yu made changes -
        Assignee cuijianwei [ cuijianwei ]
        Ted Yu added a comment -

        Can you provide a patch for trunk?

        chendihao added a comment -

        Without this improvement, many BADROWS are reported even though the two tables are consistent. This is a little misleading.

        Please have a look, stack and Lars Hofhansl.

        cuijianwei made changes -
        Description
        Original Value: VerifyReplication could compare the source table with its peer table and compute BADROWS. However, the current BADROWS computing method might not be accurate enough. For example, if source table contains rows as {r1, r2, r3, r4} and peer table contains rows as {r1, r3, r4}, the BADROWS counter will be 3 because 'r2' in source table will make all the later comparisons fail. Will it be better if the BADROWS is computed to 1 in this situation? Maybe, we can compute the BADROWS more accurately in merge comparison?
        New Value: VerifyReplication could compare the source table with its peer table and compute BADROWS. However, the current BADROWS computing method might not be accurate enough. For example, if source table contains rows as {r1, r2, r3, r4} and peer table contains rows as {r1, r3, r4} BADROWS will be 3 because 'r2' in source table will make all the later row comparisons fail. Will it be better if the BADROWS is computed to 1 in this situation? Maybe, we can compute the BADROWS more accurately in merge comparison?
        cuijianwei made changes -
        Field Original Value New Value
        Attachment HBASE-10153-0.94-v1.patch [ 12618537 ]
        cuijianwei added a comment -

        This patch tries to improve the BADROWS computation as follows:
        1. BADROWS is refined into ONLY_IN_SOURCE_TABLE_ROWS, ONLY_IN_PEER_TABLE_ROWS and CONTENT_DIFFERENT_ROWS.
        2. These counters are computed by a merge comparison between the source and peer tables.
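
        The two points above can be sketched as a two-cursor merge over sorted rows. This is a minimal standalone illustration, not the patch itself: TreeMap stands in for a table scan, String.compareTo stands in for HBase's byte-wise row key ordering, a single String value stands in for a row's cells, and MergeCompareSketch, compare and the GOODROWS label for matching rows are names chosen for this example.

```java
import java.util.EnumMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a merge comparison between two sorted tables; not HBase code.
class MergeCompareSketch {

  enum Counter {
    GOODROWS, ONLY_IN_SOURCE_TABLE_ROWS, ONLY_IN_PEER_TABLE_ROWS, CONTENT_DIFFERENT_ROWS
  }

  static Map<Counter, Integer> compare(TreeMap<String, String> source, TreeMap<String, String> peer) {
    Map<Counter, Integer> counts = new EnumMap<>(Counter.class);
    for (Counter c : Counter.values()) {
      counts.put(c, 0);
    }
    Iterator<Map.Entry<String, String>> s = source.entrySet().iterator();
    Iterator<Map.Entry<String, String>> p = peer.entrySet().iterator();
    Map.Entry<String, String> sRow = s.hasNext() ? s.next() : null;
    Map.Entry<String, String> pRow = p.hasNext() ? p.next() : null;

    while (sRow != null || pRow != null) {
      int cmp;
      if (sRow == null) {
        cmp = 1;   // source exhausted: the remaining peer rows exist only in the peer
      } else if (pRow == null) {
        cmp = -1;  // peer exhausted: the remaining source rows exist only in the source
      } else {
        cmp = sRow.getKey().compareTo(pRow.getKey());
      }

      if (cmp == 0) {
        // Same row key on both sides: compare contents, then advance both cursors.
        boolean same = sRow.getValue().equals(pRow.getValue());
        counts.merge(same ? Counter.GOODROWS : Counter.CONTENT_DIFFERENT_ROWS, 1, Integer::sum);
        sRow = s.hasNext() ? s.next() : null;
        pRow = p.hasNext() ? p.next() : null;
      } else if (cmp < 0) {
        // The source key sorts first, so the peer has no such row: advance the source only.
        counts.merge(Counter.ONLY_IN_SOURCE_TABLE_ROWS, 1, Integer::sum);
        sRow = s.hasNext() ? s.next() : null;
      } else {
        // The peer key sorts first, so the source has no such row: advance the peer only.
        counts.merge(Counter.ONLY_IN_PEER_TABLE_ROWS, 1, Integer::sum);
        pRow = p.hasNext() ? p.next() : null;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    TreeMap<String, String> source = new TreeMap<>();
    TreeMap<String, String> peer = new TreeMap<>();
    for (String row : new String[] {"r1", "r2", "r3", "r4"}) {
      source.put(row, "v");
    }
    for (String row : new String[] {"r1", "r3", "r4"}) {
      peer.put(row, "v");
    }
    System.out.println(compare(source, peer));
  }
}
```

        For the example from the description, this reports one row in ONLY_IN_SOURCE_TABLE_ROWS and three matching rows, because a mismatch advances only the cursor that is behind.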

        cuijianwei created issue -

          People

          • Assignee: cuijianwei
          • Reporter: cuijianwei
          • Votes: 0
          • Watchers: 12