ZooKeeper / ZOOKEEPER-1476

ipv6 reverse dns related timeouts on OSX connecting to localhost

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      We observed an intermittent problem when creating ZooKeeper client connections on OS X: sometimes the connection works and sometimes it fails, and it is randomly very slow. It turns out both symptoms have the same cause.

      My hosts file on OS X (the unmodified default) lists three entries for localhost:

      127.0.0.1 localhost
      ::1 localhost
      fe80::1%lo0 localhost

      We sometimes saw ZooKeeper trying to connect to fe80:0:0:0:0:0:0:1%1, which is not listed (about one time in four; it seems to round-robin over the addresses).

      Whenever that happens, the connection sometimes works and sometimes fails, and in both cases it is very slow. The reason: the reverse lookup for fe80:0:0:0:0:0:0:1%1 cannot be answered from the hosts file, so it falls back to DNS. Sometimes that works, but other times it fails or times out after about 5 seconds. Platform-specific DNS settings probably hide this problem on Linux.

      As a workaround, we preresolve localhost now: Inet4Address.getByName("localhost"). This always resolves to 127.0.0.1 on my machine and works fast.
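
      The workaround above could be sketched roughly as follows. Note that Inet4Address.getByName is the static InetAddress.getByName inherited through the subclass name, so by itself it does not guarantee an IPv4 result; this sketch filters for Inet4Address explicitly. The class name LocalhostPreResolve, the helper ipv4ConnectString, and port 2181 (ZooKeeper's conventional client port) are assumptions for illustration, not part of the original report.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: pre-resolve a host to an IPv4 literal before handing
// it to a client, so the client never sees the IPv6 link-local address.
final class LocalhostPreResolve {

    // Returns "a.b.c.d:port" for the first IPv4 address the host resolves to.
    static String ipv4ConnectString(String host, int port) throws UnknownHostException {
        for (InetAddress candidate : InetAddress.getAllByName(host)) {
            if (candidate instanceof Inet4Address) {
                return candidate.getHostAddress() + ":" + port;
            }
        }
        throw new UnknownHostException("no IPv4 address found for " + host);
    }

    public static void main(String[] args) throws UnknownHostException {
        // 2181 is an assumed port; adjust to the actual server configuration.
        System.out.println(ipv4ConnectString("localhost", 2181));
    }
}
```

      The resulting string can then be passed as the connect string when creating the client, instead of "localhost:2181".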

      This fixes the issue for us. We're not sure where the fe80:0:0:0:0:0:0:1%1 address comes from, though. I don't recall having this issue with other server-side software, so this might be a mix of platform setup, OS X-specific defaults, and ZooKeeper behavior.

      I've seen one ticket that relates to ipv6 in zookeeper that might be related: ZOOKEEPER-667. Perhaps the workaround for that ticket introduced this problem?

          Activity

          Jilles van Gurp created issue -
          Marshall McMullen added a comment -

          We've experienced the same problem: reverse name lookups prevented ZooKeeper leader election from ever completing successfully. In our case this failed on Linux with IPv4, not IPv6. As it turns out, a lot of code in the ZooKeeper server calls getHostName, which does a reverse DNS lookup. I've patched the code in question to use getHostString instead, which does not do a reverse name lookup. It does eventually perform a lookup, but it uses getByName to do a normal DNS lookup if necessary (that is, if the value is not already an IP address).

          I'm happy to upload the patch we use, but I can only vouch for it compiling properly on openjdk7. The function I had to use (getHostString) was wrongly private in openjdk6 and made public in openjdk7. I don't know whether that function is public or private in Sun or IBM or any other flavor of Java.
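
          As a rough illustration of the approach described in this comment (not the actual patch, which is not attached here), the sketch below reads the host with getHostString and then does a forward lookup with getAllByName. The class name HostStringDemo, the helper resolveWithoutReverseLookup, the address 10.0.0.1, and port 2888 are all assumptions chosen for the example.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical demo of avoiding a reverse lookup on an IP-only address.
final class HostStringDemo {

    // Resolve the socket address's host without asking for its hostname first.
    static InetAddress[] resolveWithoutReverseLookup(InetSocketAddress address)
            throws UnknownHostException {
        // getHostString() returns the hostname if one is already known,
        // otherwise the textual IP address; it never does a reverse lookup.
        String host = address.getHostString();
        // For an IP literal, getAllByName only validates the address format.
        return InetAddress.getAllByName(host);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Build an address from raw bytes so that no hostname is attached to it.
        InetAddress raw = InetAddress.getByAddress(new byte[] {10, 0, 0, 1});
        InetSocketAddress peer = new InetSocketAddress(raw, 2888);

        System.out.println(peer.getHostString());
        System.out.println(resolveWithoutReverseLookup(peer)[0].getHostAddress());
    }
}
```

          Calling peer.getHostName() on the same object would instead attempt a reverse lookup, which is the call that stalls when no DNS is reachable.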

          Marshall McMullen added a comment -

          I should have mentioned that in our restricted environment we do not have DNS and cannot have DNS. So we only ever use IP Addresses and never hostnames.

          Marshall McMullen added a comment -

          Update... I just compiled our patched version with Sun's java6 and it compiles just fine. It must have been a bug specific to openjdk6. If there's interest in my patch, let me know.

          Jilles van Gurp added a comment -

          Here's a small program that replicates the problem. If I execute it, I get the following output:

          /etc/hosts:
          ##
          # Host Database
          #
          # localhost is used to configure the loopback interface
          # when the system is booting. Do not change this entry.
          ##
          127.0.0.1 localhost
          255.255.255.255 broadcasthost
          ::1 localhost
          fe80::1%lo0 localhost

          localhost resolved in 1ms. to localhost
          localhost resolved in 1ms. to localhost
          fe80:0:0:0:0:0:0:1%1 resolved in 5002ms. to fe80:0:0:0:0:0:0:1%1
          java.lang.IllegalStateException: localhost resolves to fe80:0:0:0:0:0:0:1%1 but reverse dns lookup takes too long 5002 resolved back to fe80:0:0:0:0:0:0:1%1
          at com.nokia.search.test.LocalhostLookupTest.main(LocalhostLookupTest.java:28)

          If I preresolve localhost to 127.0.0.1, the exception never gets thrown.

          import java.io.BufferedReader;
          import java.io.FileReader;
          import java.net.InetAddress;

          public class LocalhostLookupTest {
              public static void main(String[] args) {
                  try {
                      InetAddress[] addresses = InetAddress.getAllByName("localhost");

                      BufferedReader br = null;
                      try {
                          // Print /etc/hosts so the output shows what the resolver had to work with.
                          br = new BufferedReader(new FileReader("/etc/hosts"));
                          StringBuilder buf = new StringBuilder();
                          String line;
                          while ((line = br.readLine()) != null) {
                              buf.append(line).append('\n');
                          }
                          String hostsFile = buf.toString();
                          System.out.println("/etc/hosts:\n" + hostsFile);

                          // Time the reverse lookup for every address that localhost resolves to.
                          for (InetAddress inetAddress : addresses) {
                              long now = System.currentTimeMillis();
                              String hostName = inetAddress.getCanonicalHostName();
                              long duration = System.currentTimeMillis() - now;
                              // Prints the reverse-lookup result on both sides, as in the output above.
                              System.out.println(hostName + " resolved in " + duration + "ms. to " + hostName);
                              if (duration > 50) {
                                  throw new IllegalStateException("localhost resolves to " + inetAddress.getHostAddress() + " but reverse dns lookup takes too long " + duration + " resolved back to " + hostName);
                              }
                          }
                      } finally {
                          // br is still null if /etc/hosts could not be opened.
                          if (br != null) {
                              br.close();
                          }
                      }
                  } catch (Exception e) {
                      e.printStackTrace();
                  } finally {
                      System.exit(0);
                  }
              }
          }

          Jilles van Gurp added a comment -

          To be clear, the problematic ZooKeeper class is org.apache.zookeeper.client.StaticHostProvider, which does: InetAddress resolvedAddresses[] = InetAddress.getAllByName(address.getHostName());

          Yan Pujante made changes -
          Field Original Value New Value
          Link This issue is related to ZOOKEEPER-1661 [ ZOOKEEPER-1661 ]
          Bill Havanki made changes -
          Link This issue is related to ZOOKEEPER-1954 [ ZOOKEEPER-1954 ]
          Flavio Junqueira added a comment -

          Hi Jilles van Gurp, it is better to upload repro code as an attachment to the JIRA issue rather than posting it in a comment; just a hint. Given that you spotted the culprit, I was wondering if you want to propose a patch.


            People

            • Assignee: Unassigned
            • Reporter: Jilles van Gurp
            • Votes: 2
            • Watchers: 4
