HBASE-10249: TestReplicationSyncUpTool fails because failover takes too long

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.98.0, 0.96.2, 0.99.0, 0.94.17
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      This change also fixes a potential data loss issue when using ZK multi actions: region servers could try to fail over themselves (the replication sync-up tool acts as an RS too).

      Description

      New issue to keep track of this.

      1. HBASE-10249-0.94-v0.patch
        2 kB
        Jean-Daniel Cryans
      2. HBASE-10249-0.94-v1.patch
        2 kB
        Jean-Daniel Cryans
      3. HBASE-10249-trunk-v0.patch
        5 kB
        Demai Ni
      4. HBASE-10249-trunk-v1.patch
        3 kB
        Jean-Daniel Cryans

          Activity

          nidmhbase Demai Ni added a comment -

          Lars Hofhansl, thanks. I am just back and will work on it.

          nidmhbase Demai Ni added a comment -

          Lars Hofhansl, I still couldn't recreate the problem locally. The patch adds a loop that runs syncup a few times, in the hope that multiple runs will avoid the timing issue. If it still fails, debug info is added to check the source.

          lhofhansl Lars Hofhansl added a comment -

          Looks reasonable to me. Will commit soon.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12620874/HBASE-10249-trunk-v0.patch
          against trunk revision .
          ATTACHMENT ID: 12620874

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8306//console

          This message is automatically generated.

          nidmhbase Demai Ni added a comment -

          Lars, thanks. Hopefully, it works this time.... Demai

          yuzhihong@gmail.com Ted Yu added a comment -

          Integrated to trunk.

          Thanks for the patch, Demai.

          Let's see how the test goes in the next builds.

          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4772 (See https://builds.apache.org/job/HBase-TRUNK/4772/)
          HBASE-10249 Intermittent TestReplicationSyncUpTool failure (tedyu: rev 1554367)

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #30 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/30/)
          HBASE-10249 Intermittent TestReplicationSyncUpTool failure (tedyu: rev 1554367)

          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          yuzhihong@gmail.com Ted Yu added a comment -

          Andrew Purtell:
          TestReplicationSyncUpTool has been passing lately on HBase-TRUNK-on-Hadoop-1.1 and HBase-TRUNK.

          Do you want this in 0.98 ?

          lhofhansl Lars Hofhansl added a comment -

          This is a test fix, I'll commit this to 0.94 in any case.

          nidmhbase Demai Ni added a comment -

          Lars Hofhansl, Ted Yu, thanks a lot, and happy New Year! … Demai

          apurtell Andrew Purtell added a comment -

          Do you want this in 0.98 ?

          +1

          apurtell Andrew Purtell added a comment - - edited

          Committed to 0.98 as r1554867.

          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #46 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/46/)
          HBASE-10249. Intermittent TestReplicationSyncUpTool failure (Demai Ni) (apurtell: rev 1554867)

          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.98 #49 (See https://builds.apache.org/job/HBase-0.98/49/)
          HBASE-10249. Intermittent TestReplicationSyncUpTool failure (Demai Ni) (apurtell: rev 1554867)

          • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          lhofhansl Lars Hofhansl added a comment -

          stack, I assume you want this in 0.96?

          lhofhansl Lars Hofhansl added a comment -

          Committed to 0.94.

          stack stack added a comment -

          +1 for 0.96. Thanks Lars.

          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94-security #377 (See https://builds.apache.org/job/HBase-0.94-security/377/)
          HBASE-10249 Intermittent TestReplicationSyncUpTool failure (Demai Ni) (larsh: rev 1554937)

          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          lhofhansl Lars Hofhansl added a comment -

          Cool. Committed to 0.96 as well.

          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-JDK7 #7 (See https://builds.apache.org/job/HBase-0.94-JDK7/7/)
          HBASE-10249 Intermittent TestReplicationSyncUpTool failure (Demai Ni) (larsh: rev 1554937)

          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          hudson Hudson added a comment -

          FAILURE: Integrated in hbase-0.96 #248 (See https://builds.apache.org/job/hbase-0.96/248/)
          HBASE-10249 Intermittent TestReplicationSyncUpTool failure (Demai Ni) (larsh: rev 1554964)

          • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          hudson Hudson added a comment -

          ABORTED: Integrated in HBase-0.94 #1245 (See https://builds.apache.org/job/HBase-0.94/1245/)
          HBASE-10249 Intermittent TestReplicationSyncUpTool failure (Demai Ni) (larsh: rev 1554937)

          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          hudson Hudson added a comment -

          FAILURE: Integrated in hbase-0.96-hadoop2 #168 (See https://builds.apache.org/job/hbase-0.96-hadoop2/168/)
          HBASE-10249 Intermittent TestReplicationSyncUpTool failure (Demai Ni) (larsh: rev 1554964)

          • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSyncUpTool.java
          yuzhihong@gmail.com Ted Yu added a comment -

          Oops, it failed again: https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/50/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationSyncUpTool/testSyncUpTool/

          nidmhbase Demai Ni added a comment -

          Too bad... this is a tough one. The debug info shows that the data at the source is correct. I need to re-examine the logic of both the test case and the syncup tool.

          Sorry about all the trouble.

          apurtell Andrew Purtell added a comment -

          Let's reopen the issue.

          lhofhansl Lars Hofhansl added a comment -

          sorry about all the troubles

          That's what the tests are for.
          Replication is in principle asynchronous, so it might still just be an issue with the test.

          jmhsieh Jonathan Hsieh added a comment -

          FWIW, I found it fails consistently if we turn zk multi on.

          In hbase-default.xml/site.xml

           </property>
             <property>
               <name>hbase.zookeeper.useMulti</name>
          -    <value>false</value>
          +    <value>true</value>
               <description>Instructs HBase to make use of ZooKeeper's multi-update functionality.
               This allows certain ZooKeeper operations to complete more quickly and prevents some issues
               with rare ZooKeeper failure scenarios (see the release note of HBASE-6710 for an example).
               IMPORTANT: only set this to true if all ZooKeeper servers in the cluster are on version 3.4+
               and will not be downgraded.  ZooKeeper versions before 3.4 do not support multi-update and will
               not fail gracefully if multi-update is invoked (see ZOOKEEPER-1495).
          
          nidmhbase Demai Ni added a comment -

          Jonathan,

          Thank you so much. I had actually run out of ideas on this one and was never able to reproduce the error. Now there is a clue I can follow instead of shooting in the dark. Appreciate it.

          Demai on the run

          jdcryans Jean-Daniel Cryans added a comment -

          Two things I've noticed that I'm fixing in the attached patch for 0.94:

          • The multi path doesn't check if the znode that we're moving is ours, so we end up deleting our own queue (!!!).
          • Looking at the link for the latest failure, we do check that in the non-multi path, but the check takes a few hundred milliseconds. It seems that those checks all end up counting towards the 10-second limit that we have for clearing all the queues. I moved the checking of the path before the sleeping in NodeFailoverWorker.run so that we don't waste time on ourselves.
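
The reordering described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class `FailoverWorkerSketch`, the method `shouldTransfer`, and the `pretendSleep` counter are hypothetical names, and a counter stands in for `Thread.sleep` so the example runs instantly. The point is only that the ownership test happens before any waiting, so a worker asked to fail over its own znode gives up without spending any of the time budget.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FailoverWorkerSketch {
    /** Total milliseconds "slept"; a counter stands in for Thread.sleep. */
    static final AtomicLong sleptMillis = new AtomicLong();

    static void pretendSleep(long millis) {
        sleptMillis.addAndGet(millis);
    }

    /**
     * Decides whether to take over the queues under deadZnode (hypothetical model).
     * The ownership check runs first, so a worker pointed at its own znode
     * returns immediately instead of waiting and then discovering it is itself.
     */
    static boolean shouldTransfer(String ownZnode, String deadZnode, long sleepBeforeFailoverMs) {
        if (ownZnode.equals(deadZnode)) {
            return false; // never claim (and later delete) our own queue
        }
        pretendSleep(sleepBeforeFailoverMs);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldTransfer("rs1", "rs1", 2000)); // prints false
        System.out.println(shouldTransfer("rs1", "rs2", 2000)); // prints true
        System.out.println(sleptMillis.get());                  // prints 2000
    }
}
```

Only the second call contributes to the slept time, which is the effect the reordering is after.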

          Regardless, this code is racy:

              int numberOfOldSource = 1; // default wait once
              while (numberOfOldSource > 0) {
                Thread.sleep(SLEEP_TIME);
                numberOfOldSource = manager.getOldSources().size();
              }
          

          We basically say "let's wait 10 seconds and see if we can transfer all the queues during that time". If some queues are still being transferred, and the others we did transfer are already done, they won't count as an oldSource, and so we can miss them. The most extreme case is moving 1 queue with enough znodes that it takes more than 10 seconds to move (I've seen that), in which case the sync tool will stop even though there might be many more queues to transfer.
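
To make the race concrete, here is a small deterministic model of the exit test. It is a sketch under assumptions, not HBase code and not the fix in the attached patches: `SyncUpWaitSketch`, `Snapshot`, and the notion of counting in-flight transfers are all hypothetical. Each `Snapshot` represents what one poll of the loop would observe. Counting only old sources exits on the first poll, while a slow queue is still being moved; an exit test that also counts in-flight transfers keeps waiting until everything has drained.

```java
public class SyncUpWaitSketch {
    /** State observed at one polling instant (hypothetical model, not HBase code). */
    static final class Snapshot {
        final int queuesStillTransferring; // queues whose znodes are still being moved
        final int oldSources;              // recovered queues currently replicating

        Snapshot(int queuesStillTransferring, int oldSources) {
            this.queuesStillTransferring = queuesStillTransferring;
            this.oldSources = oldSources;
        }
    }

    /** Exit test equivalent to the quoted loop: only old sources are counted. */
    static boolean doneOldSourcesOnly(Snapshot s) {
        return s.oldSources == 0;
    }

    /** Assumed alternative: also require that no transfer is in flight. */
    static boolean doneIncludingTransfers(Snapshot s) {
        return s.oldSources == 0 && s.queuesStillTransferring == 0;
    }

    /** Index of the first poll at which the loop would exit, or -1 if it never does. */
    static int firstPollThatStops(Snapshot[] timeline, boolean includeTransfers) {
        for (int i = 0; i < timeline.length; i++) {
            boolean done = includeTransfers
                ? doneIncludingTransfers(timeline[i])
                : doneOldSourcesOnly(timeline[i]);
            if (done) {
                return i;
            }
        }
        return -1;
    }

    static Snapshot[] exampleTimeline() {
        return new Snapshot[] {
            new Snapshot(1, 0), // fast queue already drained, slow queue still moving
            new Snapshot(0, 1), // slow queue arrived and is now replicating
            new Snapshot(0, 0)  // everything drained
        };
    }

    public static void main(String[] args) {
        Snapshot[] timeline = exampleTimeline();
        System.out.println(firstPollThatStops(timeline, false)); // prints 0
        System.out.println(firstPollThatStops(timeline, true));  // prints 2
    }
}
```

In the first case the tool would stop at poll 0 and never replicate the slow queue, which is the missed-queue scenario described above.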

          jdcryans Jean-Daniel Cryans added a comment -

          Actually, the new method in the last patch wasn't named correctly; the new patch includes this cosmetic change.

          jmhsieh Jonathan Hsieh added a comment - - edited

          Is the patch relevant to trunk/0.98/0.96, and does it apply there as well?

          jdcryans Jean-Daniel Cryans added a comment -

          In trunk, 0.98, and 0.96 the check is in place, but it's done after we sleep, so hitting the race is what makes it fail.

          jdcryans Jean-Daniel Cryans added a comment -

          Patch for trunk, kind of the same thing. Doing it, I also saw that I missed something in isThisOurZnode in the v1 0.94 patch: it shouldn't talk about parents (not a functional change though).

          apurtell Andrew Purtell added a comment -

          I skimmed the trunk patch, +1 for 0.98

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12623538/HBASE-10249-trunk-v1.patch
          against trunk revision .
          ATTACHMENT ID: 12623538

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

          +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8451//console

          This message is automatically generated.

          nidmhbase Demai Ni added a comment -

          Jean-Daniel Cryans, Jonathan Hsieh, many thanks to both of you. You guys are great and fast. I was occupied by my 'day-time' job, just came back to this, and found that the patch is there and testing is done. Appreciate it.

          nidmhbase Demai Ni added a comment -

          Assigned to Jean-Daniel Cryans, who fixed the bug and deserves the credit; I had run out of ideas.

          jdcryans Jean-Daniel Cryans added a comment -

          Demai Ni No problem, credits to Jon for asking me to look at it

          Lars Hofhansl you good with this for 0.94 sir?

          lhofhansl Lars Hofhansl added a comment -

          Yeah, scary stuff, especially since we have multi enabled here. +1

          Should change the title to something more descriptive since this is an actual bug in the replication code.

          The condition here is always false, so removing has no effect, right?

                if (parent.equals(rsServerNameZnode)) {
                  LOG.warn("Won't lock because this is us, we're dead!");
                  return false;
                }
          
          lhofhansl Lars Hofhansl added a comment -

          you gonna commit Jean-Daniel Cryans?

          stack stack added a comment -

          Lars Hofhansl Jean-Daniel Cryans, as per usual, is off in 'exotic locations' till Tuesday at least: Cancun this time.

          lhofhansl Lars Hofhansl added a comment -

          He's living the life.
          Alright. Looks good to me, I'll do some more tests and commit if all looks good.

          jdcryans Jean-Daniel Cryans added a comment -

          > He's living the life.

          Are you not?

          > Should change the title to something more descriptive since this is an actual bug in the replication code.

          Well, the tool is racy. It can still fail, but it's much, much less likely. Agree the title needs to be changed.

          > The condition here is always false, so removing has no effect, right?

          The check just happens sooner now.

          > Looks good to me, I'll do some more tests and commit if all looks good.

          Since I'm back from $exotic_location, you mind if I commit? Your testing came back ok?

          lhofhansl Lars Hofhansl added a comment -

          Didn't get to test this. But it looks good. +1 on commit.

          jdcryans Jean-Daniel Cryans added a comment -

          Committed everywhere, thanks for the reviews guys and sorry I was off drinking tequila for a few days.

          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-on-Hadoop-2 #2 (See https://builds.apache.org/job/HBase-0.94-on-Hadoop-2/2/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560198)

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.94-security #391 (See https://builds.apache.org/job/HBase-0.94-security/391/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560198)

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94-JDK7 #31 (See https://builds.apache.org/job/HBase-0.94-JDK7/31/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560198)

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-TRUNK #4844 (See https://builds.apache.org/job/HBase-TRUNK/4844/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560201)

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.94 #1264 (See https://builds.apache.org/job/HBase-0.94/1264/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560198)

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          FAILURE: Integrated in hbase-0.96 #266 (See https://builds.apache.org/job/hbase-0.96/266/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560199)

          • /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
          • /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          FAILURE: Integrated in hbase-0.96-hadoop2 #183 (See https://builds.apache.org/job/hbase-0.96-hadoop2/183/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560199)

          • /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
          • /hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
          • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-0.98 #100 (See https://builds.apache.org/job/HBase-0.98/100/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560200)

          • /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
          • /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #61 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/61/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560201)

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #94 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/94/)
          HBASE-10249 TestReplicationSyncUpTool fails because failover takes too long (jdcryans: rev 1560200)

          • /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueues.java
          • /hbase/branches/0.98/hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationQueuesZKImpl.java
          • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
          yuzhihong@gmail.com Ted Yu added a comment -

          Pardon me, it failed lately: https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationSyncUpTool/testSyncUpTool/

            People

            • Assignee:
              jdcryans Jean-Daniel Cryans
              Reporter:
              lhofhansl Lars Hofhansl
            • Votes:
              0
              Watchers:
              8


                Development