Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-1506

Re-try DNS hostname -> IP resolution if node connection fails

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.5, 3.4.6
    • Fix Version/s: 3.4.7, 3.5.0, 3.6.0
    • Component/s: server
    • Labels:
    • Environment:

      Ubuntu 11.04 64-bit

    • Release Note:
      Hide
      Tests pass with this patch.
      This patch is for the branch-3.4 branch ONLY.
      Show
      Tests pass with this patch. This patch is for the branch-3.4 branch ONLY.

      Description

      In our zoo.cfg we use hostnames to identify the ZK servers that are part of an ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP address they map to can and does change. Our procedure for replacing/upgrading a ZK node is to boot an entirely new instance and remap the hostname to the new instance's IP address. Our expectation is that when the original ZK node is terminated/shutdown, the remaining nodes in the ensemble would reconnect to the new instance.

      However, what we are noticing is that the remaining ZK nodes do not attempt to re-resolve the hostname->IP mapping for the new server. Once the original ZK node is terminated, the existing servers continue to attempt contacting it at the old IP address. It would be great if the ZK servers could try to re-resolve the hostname when attempting to connect to a lost ZK server, instead of caching the lookup indefinitely. Currently we must do a rolling restart of the ZK ensemble after swapping a node – which at three nodes means we periodically lose quorum.

      The exact method we are following is to boot new instances in EC2 and attach one, of a set of three, Elastic IP address. External to EC2 this IP address remains the same and maps to whatever instance it is attached to. Internal to EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped to the internal (10.x.y.z) address of the instance it is attached to. Therefore, in our case we would like ZK to pickup the new 10.x.y.z address that the elastic IP hostname gets mapped to and reconnect appropriately.

      1. zk-dns-caching-refresh.patch
        7 kB
        Michael Lasevich
      2. ZOOKEEPER-1506.patch
        2 kB
        Michi Mutsuzaki
      3. ZOOKEEPER-1506.patch
        3 kB
        Michi Mutsuzaki
      4. ZOOKEEPER-1506.patch
        3 kB
        Michi Mutsuzaki
      5. ZOOKEEPER-1506.patch
        6 kB
        Michi Mutsuzaki
      6. ZOOKEEPER-1506.patch
        6 kB
        Michi Mutsuzaki
      7. ZOOKEEPER-1506.patch
        7 kB
        Michi Mutsuzaki
      8. ZOOKEEPER-1506.patch
        5 kB
        Michi Mutsuzaki
      9. ZOOKEEPER-1506-fix.patch
        0.7 kB
        Michi Mutsuzaki
      10. Zookeeper-1506.patch
        40 kB
        Robert P. Thille
      11. ZOOKEEPER-1506.patch
        39 kB
        Robert P. Thille
      12. ZOOKEEPER-1506.patch
        36 kB
        Robert P. Thille
      13. ZOOKEEPER-1506.patch
        35 kB
        Robert P. Thille
      14. ZOOKEEPER-1506.patch
        35 kB
        Robert P. Thille

        Issue Links

          Activity

          Hide
          rgs Raul Gutierrez Segales added a comment -

          Great works, thanks Robert P. Thille and Flavio Junqueira!

          Show
          rgs Raul Gutierrez Segales added a comment - Great works, thanks Robert P. Thille and Flavio Junqueira !
          Hide
          fpj Flavio Junqueira added a comment -

          +1, thanks Robert P. Thille.

          Committed revision 1704903. (branch 3.4)

          Show
          fpj Flavio Junqueira added a comment - +1, thanks Robert P. Thille . Committed revision 1704903. (branch 3.4)
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12761513/ZOOKEEPER-1506.patch
          against trunk revision 1702378.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 57 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2884//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12761513/ZOOKEEPER-1506.patch against trunk revision 1702378. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 57 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2884//console This message is automatically generated.
          Hide
          rthille Robert P. Thille added a comment -

          I may have screwed up the 'Fix Version/s' on the issue. I thought it was just applying to the patch I was submitting... Also, I didn't see any way to attach the patch with "submit patch", so I just used the 'More->Attache-Files' item I've been using.

          Show
          rthille Robert P. Thille added a comment - I may have screwed up the 'Fix Version/s' on the issue. I thought it was just applying to the patch I was submitting... Also, I didn't see any way to attach the patch with "submit patch", so I just used the 'More->Attache-Files' item I've been using.
          Hide
          rthille Robert P. Thille added a comment -

          Updated patch to apply against branch-3.4@4277404

          Show
          rthille Robert P. Thille added a comment - Updated patch to apply against branch-3.4@4277404
          Hide
          fpj Flavio Junqueira added a comment -

          There is still one conflict in BaseSysTest. For the next patch, make sure to hit the "Submit patch" button at the top, please. This way QA runs and tell you if the patch applies fine, although it will only check trunk.

          Show
          fpj Flavio Junqueira added a comment - There is still one conflict in BaseSysTest. For the next patch, make sure to hit the "Submit patch" button at the top, please. This way QA runs and tell you if the patch applies fine, although it will only check trunk.
          Hide
          rthille Robert P. Thille added a comment -

          New patch version with whitespace issues fixed.

          Show
          rthille Robert P. Thille added a comment - New patch version with whitespace issues fixed.
          Hide
          rthille Robert P. Thille added a comment -

          Ah, I see what I'm doing wrong. I have my editor strip trailing whitespace, but the How to Contribute page ( http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute ) asks that you not "reformat code unrelated to the bug being fixed", so I was generating the patch with 'git diff --no-prefix -b' go ignore the whitespace changes, but that seems to generate a patch that will not apply. I'll manually fixup the patch and resubmit.

          Show
          rthille Robert P. Thille added a comment - Ah, I see what I'm doing wrong. I have my editor strip trailing whitespace, but the How to Contribute page ( http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute ) asks that you not "reformat code unrelated to the bug being fixed", so I was generating the patch with 'git diff --no-prefix -b' go ignore the whitespace changes, but that seems to generate a patch that will not apply. I'll manually fixup the patch and resubmit.
          Hide
          fpj Flavio Junqueira added a comment - - edited

          The patch still doesn't apply cleanly for me, here is the output of patch:

          patching file .gitignore
          patching file CHANGES.txt
          patching file src/java/main/org/apache/zookeeper/server/quorum/Learner.java
          patching file src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
          patching file src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
          Hunk #2 FAILED at 95.
          Hunk #3 FAILED at 116.
          2 out of 3 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java.rej
          patching file src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java
          patching file src/java/systest/org/apache/zookeeper/test/system/BaseSysTest.java
          patching file src/java/systest/org/apache/zookeeper/test/system/QuorumPeerInstance.java
          patching file src/java/test/org/apache/zookeeper/server/quorum/CnxManagerTest.java
          Hunk #2 FAILED at 164.
          1 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/server/quorum/CnxManagerTest.java.rej
          patching file src/java/test/org/apache/zookeeper/server/quorum/FLEBackwardElectionRoundTest.java
          patching file src/java/test/org/apache/zookeeper/server/quorum/FLECompatibilityTest.java
          patching file src/java/test/org/apache/zookeeper/server/quorum/FLEDontCareTest.java
          patching file src/java/test/org/apache/zookeeper/server/quorum/FLELostMessageTest.java
          patching file src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
          patching file src/java/test/org/apache/zookeeper/test/FLENewEpochTest.java
          patching file src/java/test/org/apache/zookeeper/test/FLEPredicateTest.java
          patching file src/java/test/org/apache/zookeeper/test/FLERestartTest.java
          patching file src/java/test/org/apache/zookeeper/test/FLETest.java
          patching file src/java/test/org/apache/zookeeper/test/FLEZeroWeightTest.java
          patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
          Hunk #1 FAILED at 128.
          1 out of 1 hunk FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
          patching file src/java/test/org/apache/zookeeper/test/LENonTerminateTest.java
          patching file src/java/test/org/apache/zookeeper/test/LETest.java
          patching file src/java/test/org/apache/zookeeper/test/QuorumBase.java
          Hunk #1 FAILED at 118.
          Hunk #2 FAILED at 227.
          2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/QuorumBase.java.rej
          patching file src/java/test/org/apache/zookeeper/test/QuorumUtil.java
          patching file src/java/test/org/apache/zookeeper/test/TruncateTest.java
          Hunk #1 succeeded at 201 with fuzz 1.
          

          btw, you don't have to change CHANGES.txt yourself, in fact, it is probably best that you don't.

          Show
          fpj Flavio Junqueira added a comment - - edited The patch still doesn't apply cleanly for me, here is the output of patch: patching file .gitignore patching file CHANGES.txt patching file src/java/main/org/apache/zookeeper/server/quorum/Learner.java patching file src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java patching file src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java Hunk #2 FAILED at 95. Hunk #3 FAILED at 116. 2 out of 3 hunks FAILED -- saving rejects to file src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java.rej patching file src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java patching file src/java/systest/org/apache/zookeeper/test/system/BaseSysTest.java patching file src/java/systest/org/apache/zookeeper/test/system/QuorumPeerInstance.java patching file src/java/test/org/apache/zookeeper/server/quorum/CnxManagerTest.java Hunk #2 FAILED at 164. 1 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/server/quorum/CnxManagerTest.java.rej patching file src/java/test/org/apache/zookeeper/server/quorum/FLEBackwardElectionRoundTest.java patching file src/java/test/org/apache/zookeeper/server/quorum/FLECompatibilityTest.java patching file src/java/test/org/apache/zookeeper/server/quorum/FLEDontCareTest.java patching file src/java/test/org/apache/zookeeper/server/quorum/FLELostMessageTest.java patching file src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java patching file src/java/test/org/apache/zookeeper/test/FLENewEpochTest.java patching file src/java/test/org/apache/zookeeper/test/FLEPredicateTest.java patching file src/java/test/org/apache/zookeeper/test/FLERestartTest.java patching file src/java/test/org/apache/zookeeper/test/FLETest.java patching file src/java/test/org/apache/zookeeper/test/FLEZeroWeightTest.java patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java Hunk #1 FAILED at 128. 1 out of 1 hunk FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej patching file src/java/test/org/apache/zookeeper/test/LENonTerminateTest.java patching file src/java/test/org/apache/zookeeper/test/LETest.java patching file src/java/test/org/apache/zookeeper/test/QuorumBase.java Hunk #1 FAILED at 118. Hunk #2 FAILED at 227. 2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/QuorumBase.java.rej patching file src/java/test/org/apache/zookeeper/test/QuorumUtil.java patching file src/java/test/org/apache/zookeeper/test/TruncateTest.java Hunk #1 succeeded at 201 with fuzz 1. btw, you don't have to change CHANGES.txt yourself, in fact, it is probably best that you don't.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12755248/ZOOKEEPER-1506.patch
          against trunk revision 1702378.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 57 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2871//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12755248/ZOOKEEPER-1506.patch against trunk revision 1702378. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 57 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2871//console This message is automatically generated.
          Hide
          rthille Robert P. Thille added a comment -

          New version of the ZOOKEEPER-1506.patch against branch-3.4 which fixes style issues.

          Show
          rthille Robert P. Thille added a comment - New version of the ZOOKEEPER-1506 .patch against branch-3.4 which fixes style issues.
          Hide
          rthille Robert P. Thille added a comment -

          The call to the QuorumServer constructor sets 'type' to null, but the QuorumServer constructor checks for null and doesn't set the instance's type variable in that case, so it still gets the default of PARTICIPANT.

          Show
          rthille Robert P. Thille added a comment - The call to the QuorumServer constructor sets 'type' to null, but the QuorumServer constructor checks for null and doesn't set the instance's type variable in that case, so it still gets the default of PARTICIPANT.
          Hide
          fpj Flavio Junqueira added a comment -

          don't think it matters if type is null there. I'm not dereferencing type, just comparing it to the constant

          Indeed, but there is still the issue that the default was PARTICIPANT and you're changing to null. I haven't tried to determine the implications of the change, but it sounds better to just keep what it is currently.

          10.1.1.x isn't necessarily a dead address (and isn't in our network), so the tests would actually hit running Zookeeper servers, so I switched it to the standard ( https://tools.ietf.org/html/rfc5735 ) TEST-NET-1 address range.

          Sounds good.

          Show
          fpj Flavio Junqueira added a comment - don't think it matters if type is null there. I'm not dereferencing type, just comparing it to the constant Indeed, but there is still the issue that the default was PARTICIPANT and you're changing to null. I haven't tried to determine the implications of the change, but it sounds better to just keep what it is currently. 10.1.1.x isn't necessarily a dead address (and isn't in our network), so the tests would actually hit running Zookeeper servers, so I switched it to the standard ( https://tools.ietf.org/html/rfc5735 ) TEST- NET-1 address range. Sounds good.
          Hide
          rthille Robert P. Thille added a comment -

          0. Sorry, I'll regen the patch
          1. The style stuff is largely from the patches I applied, but I'll clean that up and re-submit the patch
          2. I'm no Java expert, but don't think it matters if type is null there. I'm not dereferencing type, just comparing it to the constant...
          3. 10.1.1.x isn't necessarily a dead address (and isn't in our network), so the tests would actually hit running Zookeeper servers, so I switched it to the standard ( https://tools.ietf.org/html/rfc5735 ) TEST-NET-1 address range.

          Show
          rthille Robert P. Thille added a comment - 0. Sorry, I'll regen the patch 1. The style stuff is largely from the patches I applied, but I'll clean that up and re-submit the patch 2. I'm no Java expert, but don't think it matters if type is null there. I'm not dereferencing type, just comparing it to the constant... 3. 10.1.1.x isn't necessarily a dead address (and isn't in our network), so the tests would actually hit running Zookeeper servers, so I switched it to the standard ( https://tools.ietf.org/html/rfc5735 ) TEST- NET-1 address range.
          Hide
          fpj Flavio Junqueira added a comment -

          A few issues I observed with this patch:

          1. Please follow our coding style and add spaces accordingly, e.g., `if (parts.length>2)` should be `if (parts.length > 2)`
          2. type can be null here `if (type == LearnerType.OBSERVER)` in the case of parts.length being less or equal to 3 and will throw an NPE
          3. Why have you changed the address range here `String deadAddress = new String("192.0.2." + finalOctet);`?
          Show
          fpj Flavio Junqueira added a comment - A few issues I observed with this patch: Please follow our coding style and add spaces accordingly, e.g., `if (parts.length>2)` should be `if (parts.length > 2)` type can be null here `if (type == LearnerType.OBSERVER)` in the case of parts.length being less or equal to 3 and will throw an NPE Why have you changed the address range here `String deadAddress = new String("192.0.2." + finalOctet);`?
          Hide
          fpj Flavio Junqueira added a comment -

          Robert P. Thille could you generate the patch with --no-prefix, please?

          Show
          fpj Flavio Junqueira added a comment - Robert P. Thille could you generate the patch with --no-prefix, please?
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12752610/ZOOKEEPER-1506.patch
          against trunk revision 1697551.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 70 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2844//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752610/ZOOKEEPER-1506.patch against trunk revision 1697551. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 70 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2844//console This message is automatically generated.
          Hide
          rthille Robert P. Thille added a comment -

          Rebased my zookeeper-1506 patch against branch-3.4's HEAD

          Show
          rthille Robert P. Thille added a comment - Rebased my zookeeper-1506 patch against branch-3.4's HEAD
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12752535/Zookeeper-1506.patch
          against trunk revision 1697551.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 70 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2842//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752535/Zookeeper-1506.patch against trunk revision 1697551. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 70 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2842//console This message is automatically generated.
          Hide
          rthille Robert P. Thille added a comment -

          Here's the patch I mentioned in a previous comment. Fixes 1506 and the tests pass. However, it's based on release-3.4.6, not the 3.4 branch.

          Show
          rthille Robert P. Thille added a comment - Here's the patch I mentioned in a previous comment. Fixes 1506 and the tests pass. However, it's based on release-3.4.6, not the 3.4 branch.
          Hide
          rthille Robert P. Thille added a comment -

          I've got a patch (against release-3.4.6) which we're using in-house which includes fixes to the tests. Not sure how applicable it'd be to the 3.4 branch (we wanted minimal changes to the stable release). I had to add one more call to s.recreateSocketAddresses() in Learner.java to get it to function properly with my (not-included, too dependent on our test environment) integration tests. I'm sending a request to Legal to get the release approval (likely a rubber-stamp).

          Show
          rthille Robert P. Thille added a comment - I've got a patch (against release-3.4.6) which we're using in-house which includes fixes to the tests. Not sure how applicable it'd be to the 3.4 branch (we wanted minimal changes to the stable release). I had to add one more call to s.recreateSocketAddresses() in Learner.java to get it to function properly with my (not-included, too dependent on our test environment) integration tests. I'm sending a request to Legal to get the release approval (likely a rubber-stamp).
          Hide
          rgs Raul Gutierrez Segales added a comment -

          André Cruz: sorry, I dropped the ball here. I'll prepare a patch for 3.4 later today, and we'll include it on the 3.4.7 release. Thanks!

          Show
          rgs Raul Gutierrez Segales added a comment - André Cruz : sorry, I dropped the ball here. I'll prepare a patch for 3.4 later today, and we'll include it on the 3.4.7 release. Thanks!
          Hide
          edevil André Cruz added a comment -

          Any news regarding the inclusion of this fix in a stable zookeeper version?

          Show
          edevil André Cruz added a comment - Any news regarding the inclusion of this fix in a stable zookeeper version?
          Hide
          rgs Raul Gutierrez Segales added a comment -

          Looks like we closed this one by accident, there hasn't been a backport to 3.4 yet. I'll prepare a patch.

          cc: Michi Mutsuzaki

          Show
          rgs Raul Gutierrez Segales added a comment - Looks like we closed this one by accident, there hasn't been a backport to 3.4 yet. I'll prepare a patch. cc: Michi Mutsuzaki
          Hide
          rgs Raul Gutierrez Segales added a comment -

          I'll go ahead and close this again Michi Mutsuzaki.

          Show
          rgs Raul Gutierrez Segales added a comment - I'll go ahead and close this again Michi Mutsuzaki .
          Hide
          rgs Raul Gutierrez Segales added a comment -

          So I created a build with ZOOKEEPER-1506 removed and I still get the problem.

          It's probably due the getHostName() calls that you pointed out. These calls can actually generate reverse lookups according to:

          http://download.java.net/jdk7/archive/b123/docs/api/java/net/InetSocketAddress.html#getHostName%28%29

          However, these calls have been introduced by ZOOKEEPER-107 (according to git-blame). I think we should avoid them, though lets do that in another ticket.

          In conclusion, if you have a bad resolver or bogus reverse lookups (as is the case in my test scenario): you'll have issues because of these calls.

          Show
          rgs Raul Gutierrez Segales added a comment - So I created a build with ZOOKEEPER-1506 removed and I still get the problem. It's probably due the getHostName() calls that you pointed out. These calls can actually generate reverse lookups according to: http://download.java.net/jdk7/archive/b123/docs/api/java/net/InetSocketAddress.html#getHostName%28%29 However, these calls have been introduced by ZOOKEEPER-107 (according to git-blame). I think we should avoid them, though lets do that in another ticket. In conclusion, if you have a bad resolver or bogus reverse lookups (as is the case in my test scenario): you'll have issues because of these calls.
          Hide
          michim Michi Mutsuzaki added a comment -

          Thanks Raul for testing this. I'd try replacing calls to getHostName to getHostString. For example, I found another one in QuorumCnxManager.java:

          org/apache/zookeeper/server/quorum/QuorumCnxManager.java: String addr = self.getElectionAddress().getHostName() + ":" + self.getElectionAddress().getPort();

          Show
          michim Michi Mutsuzaki added a comment - Thanks Raul for testing this. I'd try replacing calls to getHostName to getHostString. For example, I found another one in QuorumCnxManager.java: org/apache/zookeeper/server/quorum/QuorumCnxManager.java: String addr = self.getElectionAddress().getHostName() + ":" + self.getElectionAddress().getPort();
          Hide
          rgs Raul Gutierrez Segales added a comment -

          It doesn't. I got a consistent repro by first firewalling the participant with id 0, to force that code path.

          I'll try reverting the patch entirely and see if that helps.

          Show
          rgs Raul Gutierrez Segales added a comment - It doesn't. I got a consistent repro by first firewalling the participant with id 0, to force that code path. I'll try reverting the patch entirely and see if that helps.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12726674/ZOOKEEPER-1506-fix.patch
          against trunk revision 1672934.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726674/ZOOKEEPER-1506-fix.patch against trunk revision 1672934. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//console This message is automatically generated.
          Hide
          rgs Raul Gutierrez Segales added a comment -

          actually, maybe it does. not sure my first try was clean. couldn't get a repro after the 2nd try.

          Show
          rgs Raul Gutierrez Segales added a comment - actually, maybe it does. not sure my first try was clean. couldn't get a repro after the 2nd try.
          Hide
          rgs Raul Gutierrez Segales added a comment -

          hmmm, it does not help Michi Mutsuzaki

          Show
          rgs Raul Gutierrez Segales added a comment - hmmm, it does not help Michi Mutsuzaki
          Hide
          rgs Raul Gutierrez Segales added a comment -

          testing right now

          Show
          rgs Raul Gutierrez Segales added a comment - testing right now
          Hide
          michim Michi Mutsuzaki added a comment -

          Raul, could you try ZOOKEEPER-1506-fix.patch and see if it fixes the problem? Thanks!

          Show
          michim Michi Mutsuzaki added a comment - Raul, could you try ZOOKEEPER-1506 -fix.patch and see if it fixes the problem? Thanks!
          Hide
          michim Michi Mutsuzaki added a comment -

          The patch uses HostNameUtils.getHostString(), which supposedly avoid reverse lookup. Maybe there is a bug in HostNameUtils.getHostString()?

          We can replace HostNameUtils with InetSocketAddress.getHostString since we now require java7.

          Show
          michim Michi Mutsuzaki added a comment - The patch uses HostNameUtils.getHostString(), which supposedly avoid reverse lookup. Maybe there is a bug in HostNameUtils.getHostString()? We can replace HostNameUtils with InetSocketAddress.getHostString since we now require java7.
          Hide
          rgs Raul Gutierrez Segales added a comment -

          I am running elections (in a 5 participants + 1 observer cluster) as part of validating the 3.5.1 alpha rc proposed by Michi Mutsuzaki. I am getting this from time to time:

          https://gist.github.com/rgs1/d11822799fdbbfa5d5f2

          I only have IP addresses in zoo.cfg and this patch seems to be triggering a reverse lookup (IP-> hostname). Given that in my current setup (a test setup, with systemd-nspawn containers) hostnames don't necessarily resolve back (i.e.: hostname -> IP doesn't work), participants might end up unable to connect to the leader if it's initially unavailable.

          Is the reverse lookup (IP -> hostname) something expected with this patch or a side effect? I don't see why we'd ever want/need that reverse lookup given that it could be problematic in some setups.

          Thoughts?

          p.s.: will post my entire, reproducible, setup a bit later.

          Show
          rgs Raul Gutierrez Segales added a comment - I am running elections (in a 5 participants + 1 observer cluster) as part of validating the 3.5.1 alpha rc proposed by Michi Mutsuzaki . I am getting this from time to time: https://gist.github.com/rgs1/d11822799fdbbfa5d5f2 I only have IP addresses in zoo.cfg and this patch seems to be triggering a reverse lookup (IP-> hostname). Given that in my current setup (a test setup, with systemd-nspawn containers) hostnames don't necessarily resolve back (i.e.: hostname -> IP doesn't work), participants might end up unable to connect to the leader if it's initially unavailable. Is the reverse lookup (IP -> hostname) something expected with this patch or a side effect? I don't see why we'd ever want/need that reverse lookup given that it could be problematic in some setups. Thoughts? p.s.: will post my entire, reproducible, setup a bit later.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in ZooKeeper-trunk #2656 (See https://builds.apache.org/job/ZooKeeper-trunk/2656/)
          ZOOKEEPER-1506 Re-try DNS hostname -> IP resolution if node connection fails(Michi Mutsuzaki via rakeshr) (rakeshr: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1672436)

          • /zookeeper/trunk/CHANGES.txt
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in ZooKeeper-trunk #2656 (See https://builds.apache.org/job/ZooKeeper-trunk/2656/ ) ZOOKEEPER-1506 Re-try DNS hostname -> IP resolution if node connection fails(Michi Mutsuzaki via rakeshr) (rakeshr: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1672436 ) /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
          Hide
          rakeshr Rakesh R added a comment -

          Committed to trunk : http://svn.apache.org/viewvc?view=revision&revision=1672436
          Committed to br3.5 : http://svn.apache.org/viewvc?view=revision&revision=1672438

          Since the patch has conflicts in 3.4 branch, am not resolving this issue now. Michi Mutsuzaki will you generate a 3.4 patch?

          Show
          rakeshr Rakesh R added a comment - Committed to trunk : http://svn.apache.org/viewvc?view=revision&revision=1672436 Committed to br3.5 : http://svn.apache.org/viewvc?view=revision&revision=1672438 Since the patch has conflicts in 3.4 branch, am not resolving this issue now. Michi Mutsuzaki will you generate a 3.4 patch?
          Hide
          rakeshr Rakesh R added a comment -

          I'll commit this shortly.

          Thanks Flavio Junqueira, Camille Fournier, all others for the reviews and Michi Mutsuzaki, Michael Lasevich for taking care this issue.

          Show
          rakeshr Rakesh R added a comment - I'll commit this shortly. Thanks Flavio Junqueira , Camille Fournier , all others for the reviews and Michi Mutsuzaki , Michael Lasevich for taking care this issue.
          Hide
          fournc Camille Fournier added a comment -

          +1

          Show
          fournc Camille Fournier added a comment - +1
          Hide
          fpj Flavio Junqueira added a comment -

          +1, lgtm. Camille Fournier, is it a go?

          Show
          fpj Flavio Junqueira added a comment - +1, lgtm. Camille Fournier , is it a go?
          Hide
          rakeshr Rakesh R added a comment -

          Thanks Michi Mutsuzaki for the fix. +1 latest patch looks good to me.

          Show
          rakeshr Rakesh R added a comment - Thanks Michi Mutsuzaki for the fix. +1 latest patch looks good to me.
          Hide
          michim Michi Mutsuzaki added a comment -

          Camille Fournier Flavio Junqueira could you take a look at the patch? Once this gets checked in, I'll create a release candidate for 3.5.1.

          Show
          michim Michi Mutsuzaki added a comment - Camille Fournier Flavio Junqueira could you take a look at the patch? Once this gets checked in, I'll create a release candidate for 3.5.1.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12707249/ZOOKEEPER-1506.patch
          against trunk revision 1669062.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707249/ZOOKEEPER-1506.patch against trunk revision 1669062. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          Yes that makes sense. I updated the patch to leave the retry logic as is. Please take a look when you get a chance. Thanks!

          Show
          michim Michi Mutsuzaki added a comment - Yes that makes sense. I updated the patch to leave the retry logic as is. Please take a look when you get a chance. Thanks!
          Hide
          fpj Flavio Junqueira added a comment -

          The change looks good, thanks. About the retry logic, the change seems gratuitous. I don´t see a big issue with removing it, but it sounds best to leave the retry logic as is because it is unrelated to the fix here. If needed, propose it in a new jira. Does it make sense?

          Show
          fpj Flavio Junqueira added a comment - The change looks good, thanks. About the retry logic, the change seems gratuitous. I don´t see a big issue with removing it, but it sounds best to leave the retry logic as is because it is unrelated to the fix here. If needed, propose it in a new jira. Does it make sense?
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12707161/ZOOKEEPER-1506.patch
          against trunk revision 1669060.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707161/ZOOKEEPER-1506.patch against trunk revision 1669060. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          Thanks Flavio Junqueira, yeah I think you are right. I'll change QuorumPeer.connectNewPeers to call connectOne(long sid) and make connectOne(long sid, InetSocketAddress electionAddr) private so that everybody goes through connectOne(long sid). The only other place we use connectOne(long sid, InetSocketAddress electionAddr) is QuorumCnxManager.receiveConnection() but this one is ok since it creates a new InetSocketAddress explicitly.

          Show
          michim Michi Mutsuzaki added a comment - Thanks Flavio Junqueira , yeah I think you are right. I'll change QuorumPeer.connectNewPeers to call connectOne(long sid) and make connectOne(long sid, InetSocketAddress electionAddr) private so that everybody goes through connectOne(long sid). The only other place we use connectOne(long sid, InetSocketAddress electionAddr) is QuorumCnxManager.receiveConnection() but this one is ok since it creates a new InetSocketAddress explicitly.
          Hide
          fpj Flavio Junqueira added a comment -

          One clarification here. Don't we have to guarantee that all code paths to connectOne (both signatures) recreate addresses? It looks like it isn't the case with this patch. For example, QuorumPeer.connectNewPeers invokes the version of connectOne that doesn't recreate.

          Show
          fpj Flavio Junqueira added a comment - One clarification here. Don't we have to guarantee that all code paths to connectOne (both signatures) recreate addresses? It looks like it isn't the case with this patch. For example, QuorumPeer.connectNewPeers invokes the version of connectOne that doesn't recreate.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12704647/ZOOKEEPER-1506.patch
          against trunk revision 1666768.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704647/ZOOKEEPER-1506.patch against trunk revision 1666768. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          The retry was added a while back in ZOOKEEPER-512. Flavio Junqueira let me know if I should leave the retry logic as is. Thanks!

          Show
          michim Michi Mutsuzaki added a comment - The retry was added a while back in ZOOKEEPER-512 . Flavio Junqueira let me know if I should leave the retry logic as is. Thanks!
          Hide
          fournc Camille Fournier added a comment -

          The only concern I have is removing the retries. I think it's probably right to do but does it imply documentation changes anywhere that we need to make? Michi Mutsuzaki?

          Show
          fournc Camille Fournier added a comment - The only concern I have is removing the retries. I think it's probably right to do but does it imply documentation changes anywhere that we need to make? Michi Mutsuzaki ?
          Hide
          michim Michi Mutsuzaki added a comment -

          Camille FournierRakesh R could you take a look at the patch? I'd like to get this in for 3.5.1. Thanks!

          Show
          michim Michi Mutsuzaki added a comment - Camille Fournier Rakesh R could you take a look at the patch? I'd like to get this in for 3.5.1. Thanks!
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch
          against trunk revision 1665315.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch against trunk revision 1665315. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//console This message is automatically generated.
          Hide
          rakeshr Rakesh R added a comment -

          Perhaps we just have to agree to not having a test case for this

          I also failed to find a better approach to write unit test for this. I agree to not having a test case.

          Show
          rakeshr Rakesh R added a comment - Perhaps we just have to agree to not having a test case for this I also failed to find a better approach to write unit test for this. I agree to not having a test case.
          Hide
          ksaritek koray sariteke added a comment -

          We also have that problem, when will it be submitted to release?

          Show
          ksaritek koray sariteke added a comment - We also have that problem, when will it be submitted to release?
          Hide
          fpj Flavio Junqueira added a comment -

          Modifying /etc/hosts from a unit test doesn't sound like an appropriate thing to do, so I'm still not sure how to write a test for this. Perhaps we just have to agree to not having a test case for this.

          Show
          fpj Flavio Junqueira added a comment - Modifying /etc/hosts from a unit test doesn't sound like an appropriate thing to do, so I'm still not sure how to write a test for this. Perhaps we just have to agree to not having a test case for this.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch
          against trunk revision 1632209.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch against trunk revision 1632209. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          I manually tested the patch by modifying /etc/hosts and verified that quorum connection manager picks up new ip addresses. As a part of this change, I removed the retry limit in the qcm listener. Otherwise the listener dies after 3 retries if it fails to resolve the hostname.

          Any volunteer for adding a unit test? If not, maybe we could check this in after manually verifying the fix.

          Show
          michim Michi Mutsuzaki added a comment - I manually tested the patch by modifying /etc/hosts and verified that quorum connection manager picks up new ip addresses. As a part of this change, I removed the retry limit in the qcm listener. Otherwise the listener dies after 3 retries if it fails to resolve the hostname. Any volunteer for adding a unit test? If not, maybe we could check this in after manually verifying the fix.
          Hide
          ramyan Ramya Bharathi Nimmagadda added a comment -

          Like the idea. There is a similar file in Windows at "%SystemRoot%\system32\drivers\etc\hosts" that could be updated.

          Here's the format :
          #The IP address should be placed in the first column followed by the corresponding host name.
          #The IP address and the host name should be separated by at least one space.

          Show
          ramyan Ramya Bharathi Nimmagadda added a comment - Like the idea. There is a similar file in Windows at "%SystemRoot%\system32\drivers\etc\hosts" that could be updated. Here's the format : #The IP address should be placed in the first column followed by the corresponding host name. #The IP address and the host name should be separated by at least one space.
          Hide
          michim Michi Mutsuzaki added a comment -

          We can modify /etc/hosts (or a local hosts file specified in HOSTALIASES environment variable) to simulate ip address change. This wont' work on windows though.

          Show
          michim Michi Mutsuzaki added a comment - We can modify /etc/hosts (or a local hosts file specified in HOSTALIASES environment variable) to simulate ip address change. This wont' work on windows though.
          Hide
          fpj Flavio Junqueira added a comment -

          I'm not sure I have any great suggestion for how to do a test case for this fix. Any suggestion?

          Show
          fpj Flavio Junqueira added a comment - I'm not sure I have any great suggestion for how to do a test case for this fix. Any suggestion?
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
          against trunk revision 1632209.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch against trunk revision 1632209. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//console This message is automatically generated.
          Hide
          ramyan Ramya Bharathi Nimmagadda added a comment -

          Thanks Flavio.
          Michi Mutsuzaki Would you be able to submit a new patch with unit tests included? Thanks

          Show
          ramyan Ramya Bharathi Nimmagadda added a comment - Thanks Flavio. Michi Mutsuzaki Would you be able to submit a new patch with unit tests included? Thanks
          Hide
          fpj Flavio Junqueira added a comment -

          Let me clarify my previous point. The description of this jira talks about bringing up a new instance to upgrade/replace a ZK node. It it starting a ZK node from scratch, essentially dropping the state of the previous incarnation of the node, that I was trying to point out that could be a problem. Resolving names again as this patch proposes is fine. We should get this one in.

          Show
          fpj Flavio Junqueira added a comment - Let me clarify my previous point. The description of this jira talks about bringing up a new instance to upgrade/replace a ZK node. It it starting a ZK node from scratch, essentially dropping the state of the previous incarnation of the node, that I was trying to point out that could be a problem. Resolving names again as this patch proposes is fine. We should get this one in.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
          against trunk revision 1623916.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch against trunk revision 1623916. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//console This message is automatically generated.
          Hide
          fpj Flavio Junqueira added a comment -

          I think there is a deeper problem here that I'm worried about. If you point a name to a different server starting from scratch, then it is like the server state has been wiped out from the perspective of the ZK replication, which can lead to state loss in some corner cases. One way around this is to use reconfiguration: remove and re-introduce the server. Is something like this doable for you?

          Show
          fpj Flavio Junqueira added a comment - I think there is a deeper problem here that I'm worried about. If you point a name to a different server starting from scratch, then it is like the server state has been wiped out from the perspective of the ZK replication, which can lead to state loss in some corner cases. One way around this is to use reconfiguration: remove and re-introduce the server. Is something like this doable for you?
          Hide
          Jbuss Joshua Buss added a comment -

          This is very, very annoying. Please get the fix into master as soon as possible. Thanks from another heavy user of cloud services (this affects any environment where you use the same hostnames for configuration purposes but dynamically re-build instances which get new IPs).

          Show
          Jbuss Joshua Buss added a comment - This is very, very annoying. Please get the fix into master as soon as possible. Thanks from another heavy user of cloud services (this affects any environment where you use the same hostnames for configuration purposes but dynamically re-build instances which get new IPs).
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
          against trunk revision 1600481.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch against trunk revision 1600481. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          Ok thanks Mufeed. I'll update the patch to handle the case you reported.

          Show
          michim Michi Mutsuzaki added a comment - Ok thanks Mufeed. I'll update the patch to handle the case you reported.
          Hide
          mufeed.usman MUFEED USMAN added a comment -

          Nope. I do not have any patch installed. Happened to hit this JIRA during my search to understand the working relationship between ZooKeeper and DNS when the service was disrupted and did not have an automatic recovery.

          Show
          mufeed.usman MUFEED USMAN added a comment - Nope. I do not have any patch installed. Happened to hit this JIRA during my search to understand the working relationship between ZooKeeper and DNS when the service was disrupted and did not have an automatic recovery.
          Hide
          michim Michi Mutsuzaki added a comment -

          Hi Mufeed,

          Good point, I can update this.addr and this.electionAddr only if the InetSocketAddress constructor actually resolved the hostname.

          By the way, are you running ZooKeeper with this patch applied? Does the patch fix the original problem?

          Show
          michim Michi Mutsuzaki added a comment - Hi Mufeed, Good point, I can update this.addr and this.electionAddr only if the InetSocketAddress constructor actually resolved the hostname. By the way, are you running ZooKeeper with this patch applied? Does the patch fix the original problem?
          Hide
          mufeed.usman MUFEED USMAN added a comment -

          In the case where no IPs change, but a mere DNS outage occurs (for some unforeseen reason); shouldn't the ZKs be able to able to make use of the cached info and restore connectivity once the DNS is back up?

          In the case I'm handling I faced the following:

          2014-02-06 20:22:54,438 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@542] - Notification: 0 (n.leader), 0x200000171 (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x2 (
          n.peerEPoch), LOOKING (my state)
          2014-02-06 20:22:54,451 [myid:0] - WARN [WorkerSender[myid=0]:QuorumCnxManager@368] - Cannot open channel to 1 at election address node02:3888
          java.net.UnknownHostException: node02

          And only ZK restarts restored services.

          Show
          mufeed.usman MUFEED USMAN added a comment - In the case where no IPs change, but a mere DNS outage occurs (for some unforeseen reason); shouldn't the ZKs be able to able to make use of the cached info and restore connectivity once the DNS is back up? In the case I'm handling I faced the following: 2014-02-06 20:22:54,438 [myid:0] - INFO [WorkerReceiver [myid=0] :FastLeaderElection@542] - Notification: 0 (n.leader), 0x200000171 (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x2 ( n.peerEPoch), LOOKING (my state) 2014-02-06 20:22:54,451 [myid:0] - WARN [WorkerSender [myid=0] :QuorumCnxManager@368] - Cannot open channel to 1 at election address node02:3888 java.net.UnknownHostException: node02 And only ZK restarts restored services.
          Hide
          fpj Flavio Junqueira added a comment -

          Diwaker Gupta, is the patch here good for you, have you had a chance to review it?

          Show
          fpj Flavio Junqueira added a comment - Diwaker Gupta , is the patch here good for you, have you had a chance to review it?
          Hide
          diwaker Diwaker Gupta added a comment -

          This is a severe issue in any environment that uses DNSMasq or relies on DHCP for DNS mappings. Would really like to see this fixed for 3.5.0, or better yet, the next bugfix release in 3.4.x if there's going to be one.

          Show
          diwaker Diwaker Gupta added a comment - This is a severe issue in any environment that uses DNSMasq or relies on DHCP for DNS mappings. Would really like to see this fixed for 3.5.0, or better yet, the next bugfix release in 3.4.x if there's going to be one.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
          against trunk revision 1591175.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch against trunk revision 1591175. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          I've been running FollowerResyncConcurrencyTest for a while, but I can't reproduce the failure. Maybe it's a transient failure.

          Show
          michim Michi Mutsuzaki added a comment - I've been running FollowerResyncConcurrencyTest for a while, but I can't reproduce the failure. Maybe it's a transient failure.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
          against trunk revision 1591175.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch against trunk revision 1591175. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          Thanks Michael, it's good to know that you've been using this patch without an issue.

          Show
          michim Michi Mutsuzaki added a comment - Thanks Michael, it's good to know that you've been using this patch without an issue.
          Hide
          mlasevich Michael Lasevich added a comment -

          Michi Mutsuzaki Thanks for taking this over, I clearly lost track of it. For what its worth, we have been running this patch in production for almost a year with no issues but it would be nice to have it properly merged. Thank you.

          Show
          mlasevich Michael Lasevich added a comment - Michi Mutsuzaki Thanks for taking this over, I clearly lost track of it. For what its worth, we have been running this patch in production for almost a year with no issues but it would be nice to have it properly merged. Thank you.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
          against trunk revision 1591175.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch against trunk revision 1591175. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
          against trunk revision 1591175.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch against trunk revision 1591175. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//console This message is automatically generated.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12642754/ZOOKEEPER-1506.patch
          against trunk revision 1591175.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12642754/ZOOKEEPER-1506.patch against trunk revision 1591175. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//console This message is automatically generated.
          Hide
          michim Michi Mutsuzaki added a comment -

          Rebased and cleaned up the patch. No test yet.

          Show
          michim Michi Mutsuzaki added a comment - Rebased and cleaned up the patch. No test yet.
          Hide
          michim Michi Mutsuzaki added a comment -

          I'd like to get this fixed in 3.5. I'll rebase the patch and address Flavio's comments. I'm still not sure how we should test this though.

          Show
          michim Michi Mutsuzaki added a comment - I'd like to get this fixed in 3.5. I'll rebase the patch and address Flavio's comments. I'm still not sure how we should test this though.
          Hide
          robert.kamphuis Robert Kamphuis added a comment -

          As this seems not to progress much, some tips for people working in AWS or similar enough environments. This is likely not the only, nor the best, but it is working for me.

          • configure the zookeeper ensemble servers to connect to the elastic-IP-address in stead of the hostname
          • on a serious failure of one of the servers, boot a replacement, and re-assign the corresponding elastic-ip to that server.
          • others will reconnect correctly
          • you will need to setup the Security group to explicitly enable the interconnect to 2888/3888(/2181) or your ports of choice for the elasticIPs to enable the connections to work.
          • downsides:
            1. traffic between zookeeper servers goes via whatever boxes doing the elastic-ip to server mapping - bigger latency. My measurements as an example: ping using private IPs vs elastic- IPs: 0.8 ms vs 1.4 msec (500 byte packets - servers in two different AZs in US-east)
            2. you will need to pay for this traffic whereas when using the names which are mapped to the internal IPs you would not.

          Also: for the clients, I am using as connect string static DNS records with names like: zookeeperN.<domain> pointing to the ec2-A-B-C-D.compute-1.amazonaws.com - thus pointing to the elastic-ip's name and not the IPs. These are mapped by EC2 to the active private IPs after assigning the elastic-ip to an instance. The clients will be recognised properly as from the correct security group(s). No need to add all the client IPs - of which I have many, and changing set; just add the clients security groups access to the the zookeeper security group.

          BTW: if someone knows of good resources running zookeeper and curator-based clients in AWS I would kindly like to know where...

          Show
          robert.kamphuis Robert Kamphuis added a comment - As this seems not to progress much, some tips for people working in AWS or similar enough environments. This is likely not the only, nor the best, but it is working for me. configure the zookeeper ensemble servers to connect to the elastic-IP-address in stead of the hostname on a serious failure of one of the servers, boot a replacement, and re-assign the corresponding elastic-ip to that server. others will reconnect correctly you will need to setup the Security group to explicitly enable the interconnect to 2888/3888(/2181) or your ports of choice for the elasticIPs to enable the connections to work. downsides: traffic between zookeeper servers goes via whatever boxes doing the elastic-ip to server mapping - bigger latency. My measurements as an example: ping using private IPs vs elastic- IPs: 0.8 ms vs 1.4 msec (500 byte packets - servers in two different AZs in US-east) you will need to pay for this traffic whereas when using the names which are mapped to the internal IPs you would not. Also: for the clients, I am using as connect string static DNS records with names like: zookeeperN.<domain> pointing to the ec2-A-B-C-D.compute-1.amazonaws.com - thus pointing to the elastic-ip's name and not the IPs. These are mapped by EC2 to the active private IPs after assigning the elastic-ip to an instance. The clients will be recognised properly as from the correct security group(s). No need to add all the client IPs - of which I have many, and changing set; just add the clients security groups access to the the zookeeper security group. BTW: if someone knows of good resources running zookeeper and curator-based clients in AWS I would kindly like to know where...
          Hide
          fpj Flavio Junqueira added a comment -

          A few comments and concerns with the current patch:

          1. What's the point of making the QuorumServer constructors private?
          2. Please check the spacing, it is not aligned with the other lines.
          3. This patch will recreate the socket addresses every time a connection in QCM fails to get established. Sometimes this happens because the server is not up yet, and in such cases, if the mapping hasn't changed, then the recreation of the socket addresses is necessary. Right now, I don't have any great idea to get around it, so I just wanted to point it out.
          4. I was also thinking about how to test this. It would be good to have a test case.
          Show
          fpj Flavio Junqueira added a comment - A few comments and concerns with the current patch: What's the point of making the QuorumServer constructors private? Please check the spacing, it is not aligned with the other lines. This patch will recreate the socket addresses every time a connection in QCM fails to get established. Sometimes this happens because the server is not up yet, and in such cases, if the mapping hasn't changed, then the recreation of the socket addresses is necessary. Right now, I don't have any great idea to get around it, so I just wanted to point it out. I was also thinking about how to test this. It would be good to have a test case.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch
          against trunk revision 1561672.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1904//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch against trunk revision 1561672. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1904//console This message is automatically generated.
          Hide
          charity Charity Majors added a comment -

          This is affecting us too. This is a pretty terrible bug for anyone trying to run zk in the cloud.

          Show
          charity Charity Majors added a comment - This is affecting us too. This is a pretty terrible bug for anyone trying to run zk in the cloud.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch
          against trunk revision 1463329.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1467//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch against trunk revision 1463329. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1467//console This message is automatically generated.
          Hide
          mlasevich Michael Lasevich added a comment -

          DNS Caching refresh in case of election connection failure. (apply with patch -p1). Will cause tests to fail, more complete patch to come.

          Show
          mlasevich Michael Lasevich added a comment - DNS Caching refresh in case of election connection failure. (apply with patch -p1). Will cause tests to fail, more complete patch to come.
          Hide
          mlasevich Michael Lasevich added a comment -

          This is a patch for the DNS resolution caching in case other server restarts with a new IP. This is an incomplete patch, since it may cause test cases to fail and only takes care of the elections. More complete patch to come.

          Show
          mlasevich Michael Lasevich added a comment - This is a patch for the DNS resolution caching in case other server restarts with a new IP. This is an incomplete patch, since it may cause test cases to fail and only takes care of the elections. More complete patch to come.
          Hide
          diranged Matt Wise added a comment -

          This is still an ongoing issue. We've tried making changes to the way Java handles DNS caching and we've been unable to solve the issue. This really seems like a simple thing to fix, and its critical when running Zookeeper in Amazons cloud where internal IP addresses change each time you boot up.

          Please... somebody fix this!

          Show
          diranged Matt Wise added a comment - This is still an ongoing issue. We've tried making changes to the way Java handles DNS caching and we've been unable to solve the issue. This really seems like a simple thing to fix, and its critical when running Zookeeper in Amazons cloud where internal IP addresses change each time you boot up. Please... somebody fix this!
          Hide
          lstnr Daniel Heidebrecht added a comment -

          I am also seeing this with a 5 node Zookeeper 3.4.5 cluster running in AWS. All nodes in the cluster must be restarted.

          Show
          lstnr Daniel Heidebrecht added a comment - I am also seeing this with a 5 node Zookeeper 3.4.5 cluster running in AWS. All nodes in the cluster must be restarted.
          Hide
          diranged Matt Wise added a comment -

          We've tried that and had no luck... It seems that the Zookeeper app does a Dns lookup before creating its connection object. Once the connection object is opened, it's never re created or updated in the event of a connection failure.

          This is an ongoing (and ugly) bug that impacts the running of Zookeeper in public clouds.

          Will someone please take a look and see how fixable this is?

          Show
          diranged Matt Wise added a comment - We've tried that and had no luck... It seems that the Zookeeper app does a Dns lookup before creating its connection object. Once the connection object is opened, it's never re created or updated in the event of a connection failure. This is an ongoing (and ugly) bug that impacts the running of Zookeeper in public clouds. Will someone please take a look and see how fixable this is?
          Hide
          thawan Thawan Kooburat added a comment -

          There is also a problem with Java DNS caching. I haven't tried this but you might want to check this out
          http://stackoverflow.com/questions/1256556/any-way-to-make-java-honor-the-dns-caching-timeout-ttl

          Show
          thawan Thawan Kooburat added a comment - There is also a problem with Java DNS caching. I haven't tried this but you might want to check this out http://stackoverflow.com/questions/1256556/any-way-to-make-java-honor-the-dns-caching-timeout-ttl
          Hide
          diranged Matt Wise added a comment -

          This is still an ongoing issue. An AWS failure last night caused us to have to restart our entire Zookeeper cluster one-node-at-a-time because of this bug. Can someone at least set a target date for it?

          Show
          diranged Matt Wise added a comment - This is still an ongoing issue. An AWS failure last night caused us to have to restart our entire Zookeeper cluster one-node-at-a-time because of this bug. Can someone at least set a target date for it?
          Hide
          diranged Matt Wise added a comment -

          This is also impacting our environment. We run our entire infrastructure in the cloud, and use a set of static (but short TTL'd) hostnames to point to our Zookeeper instances. We have the exact same concern ... that right now it requires a restart of the Zookeeper application on each system to trigger a 're-lookup' of the DNS record.

          This seems like it should be customizable with a simple option. If 'dns_lookup_on_connect' is turned on, it should do a new DNS lookup for every connection to a ensemble server. If its disabled, it can lookup the DNS once and only once. Tools like 'stunnel' have this behavior and its great in the cloud.

          Show
          diranged Matt Wise added a comment - This is also impacting our environment. We run our entire infrastructure in the cloud, and use a set of static (but short TTL'd) hostnames to point to our Zookeeper instances. We have the exact same concern ... that right now it requires a restart of the Zookeeper application on each system to trigger a 're-lookup' of the DNS record. This seems like it should be customizable with a simple option. If 'dns_lookup_on_connect' is turned on, it should do a new DNS lookup for every connection to a ensemble server. If its disabled, it can lookup the DNS once and only once. Tools like 'stunnel' have this behavior and its great in the cloud.

            People

            • Assignee:
              rthille Robert P. Thille
              Reporter:
              mheffner Mike Heffner
            • Votes:
              29 Vote for this issue
              Watchers:
              48 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development