Commons JCS: JCS-40

InetAddress.getLocalHost() ambiguous on Linux systems

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: jcs-1.3
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      Linux and other *nix systems

      Description

      As described in JDK bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4665037, InetAddress.getLocalHost() is ambiguous on Linux systems.

      JCS uses this method. We have found that the issue breaks JCS networking on Linux/*nix systems (tested on Fedora, Red Hat and CentOS) configured with either static or DHCP-assigned IP addresses. The only workarounds we could find either do not fully fix the problem or require a non-optimal configuration of the OS loopback interface.

      Background:
      On Windows the address returned by InetAddress.getLocalHost() is fairly consistent: typically it is the server's LAN address. On Windows systems with multiple network cards (e.g. Internet, WAN, LAN, VPN, multi-homed), however, the address returned is ambiguous.

      On Linux, the address returned by InetAddress.getLocalHost() seems to depend on the order in which the OS lists network interfaces, which really should be irrelevant. Furthermore, the behaviour can vary between Linux distributions. Linux always exposes the loopback address (127.0.0.1) as a virtual network card, as if it were a physical NIC. On servers using DHCP, the method usually returns the loopback address. On servers configured with static IP addresses, depending on the OS ordering, the method sometimes returns the LAN address and sometimes returns the loopback (127.0.0.1) address.

      InetAddress.getLocalHost() makes no attempt to prioritize LAN/non-loopback addresses in its selection.
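      To see what a given machine actually reports, a small diagnostic can print the result of InetAddress.getLocalHost() next to every address the JDK finds through the NetworkInterface API. The sketch below is a hypothetical helper (the class name LocalHostDiagnostics and its labels are our own, and it targets a current JDK rather than 1.4). On an affected Linux host one would expect the first line to show 127.0.0.1 while a site-local address appears further down the interface list.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;

public class LocalHostDiagnostics {

    // Labels an address using only the predicates that InetAddress itself provides.
    static String classify(InetAddress addr) {
        if (addr.isLoopbackAddress()) return "loopback";
        if (addr.isLinkLocalAddress()) return "link-local";
        if (addr.isSiteLocalAddress()) return "site-local";
        return "other";
    }

    // Builds an IPv4 address from four octets without any DNS lookup (handy for trying classify()).
    static InetAddress ipv4(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e); // only thrown for an invalid array length
        }
    }

    public static void main(String[] args) throws SocketException {
        try {
            InetAddress jdk = InetAddress.getLocalHost();
            System.out.println("getLocalHost(): " + jdk + " [" + classify(jdk) + "]");
        } catch (UnknownHostException e) {
            System.out.println("getLocalHost() failed: " + e);
        }
        Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
        if (ifaces == null) {
            return; // some JDK versions return null when no interfaces are configured
        }
        // Print every address on every interface, in the order the OS reports them.
        for (NetworkInterface iface : Collections.list(ifaces)) {
            for (InetAddress addr : Collections.list(iface.getInetAddresses())) {
                System.out.println(iface.getName() + ": " + addr.getHostAddress()
                        + " [" + classify(addr) + "]");
            }
        }
    }
}
```

      Comparing the first line with the interface list shows directly whether getLocalHost() picked the loopback address even though a LAN address was available.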

      Impact on JCS:
      This affects JCS networking in general, e.g. the remote cache and the lateral cache. JCS can bind to the loopback interface (127.0.0.1) instead of the LAN address, or the RMI system can advertise the wrong IP address (127.0.0.1) to clients when sending events. We use the JCS remote cache server and saw both of these issues on Fedora, Red Hat and CentOS machines configured with static and DHCP-assigned addresses.

      We first tried various workarounds:
      - Setting the system property java.rmi.server.hostname=x.x.x.x
        --> JCS overrides this by explicitly supplying an invalid IP address (127.0.0.1) to the RMI subsystem
      - Changing the IP address associated with localhost in the /etc/hosts file from 127.0.0.1 to the machine's LAN address
        --> reduces the performance of inter-process (loopback) communication on the server

      In the end we modified the JCS source code, and we have been running it flawlessly for the past 8 months. The fix requires JDK 1.4. Can we therefore get it integrated into the next 1.4 release of JCS?

      JCS uses InetAddress.getLocalHost() in the following classes:
      org/apache/jcs/auxiliary/lateral/socket/tcp/discovery/UDPDiscoveryService.java
      org/apache/jcs/auxiliary/remote/server/RemoteCacheStartupServlet.java
      org/apache/jcs/utils/net/HostNameUtil.java

      We updated UDPDiscoveryService and RemoteCacheStartupServlet to not call InetAddress.getLocalHost() directly, but to call the getLocalHostAddress() method in HostNameUtil instead.

      We then changed the implementation of HostNameUtil.getLocalHostAddress() as follows:

      // Required imports in HostNameUtil.java:
      import java.net.InetAddress;
      import java.net.NetworkInterface;
      import java.net.UnknownHostException;
      import java.util.Enumeration;

      public static String getLocalHostAddress() throws UnknownHostException {
          return getLocalHostLANAddress().getHostAddress();
      }

      /**
       * Returns an <code>InetAddress</code> object encapsulating what is most likely the machine's LAN IP address.
       * <p/>
       * This method is intended for use as a replacement of JDK method <code>InetAddress.getLocalHost</code>, because
       * that method is ambiguous on Linux systems. Linux systems enumerate the loopback network interface the same
       * way as regular LAN network interfaces, but the JDK <code>InetAddress.getLocalHost</code> method does not
       * specify the algorithm used to select the address returned under such circumstances, and will often return the
       * loopback address, which is not valid for network communication. Details
       * <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4665037">here</a>.
       * <p/>
       * This method will scan all IP addresses on all network interfaces on the host machine to determine the IP address
       * most likely to be the machine's LAN address. If the machine has multiple IP addresses, this method will prefer
       * a site-local IP address (e.g. 192.168.x.x or 10.10.x.x, usually IPv4) if the machine has one (and will return the
       * first site-local address if the machine has more than one), but if the machine does not hold a site-local
       * address, this method will return simply the first non-loopback address found (IPv4 or IPv6).
       * <p/>
       * If this method cannot find a non-loopback address using this selection algorithm, it will fall back to
       * calling and returning the result of JDK method <code>InetAddress.getLocalHost</code>.
       * <p/>
       *
       * @throws UnknownHostException If the LAN address of the machine cannot be found.
       */
      private static InetAddress getLocalHostLANAddress() throws UnknownHostException {
          try {
              InetAddress candidateAddress = null;
              // Iterate all NICs (network interface cards)...
              for (Enumeration ifaces = NetworkInterface.getNetworkInterfaces(); ifaces.hasMoreElements();) {
                  NetworkInterface iface = (NetworkInterface) ifaces.nextElement();
                  // Iterate all IP addresses assigned to each card...
                  for (Enumeration inetAddrs = iface.getInetAddresses(); inetAddrs.hasMoreElements();) {
                      InetAddress inetAddr = (InetAddress) inetAddrs.nextElement();
                      if (!inetAddr.isLoopbackAddress()) {
                          if (inetAddr.isSiteLocalAddress()) {
                              // Found non-loopback site-local address. Return it immediately...
                              return inetAddr;
                          }
                          else if (candidateAddress == null) {
                              // Found non-loopback address, but not necessarily site-local.
                              // Store it as a candidate to be returned if site-local address is not subsequently found...
                              candidateAddress = inetAddr;
                              // Note that we don't repeatedly assign non-loopback non-site-local addresses as candidates,
                              // only the first. For subsequent iterations, candidate will be non-null.
                          }
                      }
                  }
              }
              if (candidateAddress != null) {
                  // We did not find a site-local address, but we found some other non-loopback address.
                  // Server might have a non-site-local address assigned to its NIC (or it might be running
                  // IPv6 which deprecates the "site-local" concept).
                  // Return this non-loopback candidate address...
                  return candidateAddress;
              }
              // At this point, we did not find a non-loopback address.
              // Fall back to returning whatever InetAddress.getLocalHost() returns...
              InetAddress jdkSuppliedAddress = InetAddress.getLocalHost();
              if (jdkSuppliedAddress == null) {
                  throw new UnknownHostException("The JDK InetAddress.getLocalHost() method unexpectedly returned null.");
              }
              return jdkSuppliedAddress;
          }
          catch (Exception e) {
              UnknownHostException unknownHostException = new UnknownHostException("Failed to determine LAN address: " + e);
              unknownHostException.initCause(e);
              throw unknownHostException;
          }
      }

          Activity

          Filip Hanik added a comment -

          java.net.InetAddress.getLocalHost() is essentially very simple.
          I found that it basically takes the output of the 'hostname' command and returns the IP address that the hostname resolves to.
          All you have to do is make sure that 'hostname' resolves to the IP address you want in /etc/hosts.

          So the workaround doesn't have to involve code changes unless you really want it to.
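          Following this description, a quick way to check whether the /etc/hosts workaround is in place is to resolve the name that getLocalHost() reports and look for loopback entries. A minimal sketch: the class HostnameResolutionCheck is hypothetical, and which sources are consulted (e.g. /etc/hosts before DNS) depends on the system resolver configuration rather than on Java.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameResolutionCheck {

    // Returns true if any of the given addresses is a loopback address.
    static boolean anyLoopback(InetAddress[] addrs) {
        for (InetAddress a : addrs) {
            if (a.isLoopbackAddress()) return true;
        }
        return false;
    }

    public static void main(String[] args) throws UnknownHostException {
        // The host name the JDK reports for this machine.
        String name = InetAddress.getLocalHost().getHostName();
        // Every address that name resolves to through the system resolver.
        InetAddress[] resolved = InetAddress.getAllByName(name);
        for (InetAddress a : resolved) {
            System.out.println(name + " -> " + a.getHostAddress());
        }
        if (anyLoopback(resolved)) {
            System.out.println("Warning: " + name + " resolves to a loopback address");
        }
    }
}
```

          If the warning is printed on a stock Red Hat style install, that matches the /etc/hosts mapping discussed in the next comment.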

          Niall Gallagher added a comment -

          I know it's possible to work around the problem without making code changes. We've investigated those options (editing /etc/hosts, for example), but we've found it's not quite that simple: the workarounds have side effects and they don't work in all environments.

          We use DHCP in our office/development and our testing environment. It's not possible to hard-code DHCP-assigned IP addresses in the /etc/hosts file. InetAddress.getLocalHost() returns "hostname/127.0.0.1" with DHCP, and there's no OS-level fix for that. Remember that this issue can affect both JCS client-side and server-side code, so a lot of organisations with DHCP development environments will be affected by this.

          In our production environment we assign static IPs using the standard tools (e.g. system-config-network) or the installation wizards. Those tools configure the /etc/hosts file automatically, and they don't put the LAN IP address in there. If you read Sun's evaluation of the bug report they acknowledge that "Some distributions (including RedHat) appear to always configure /etc/hosts to map the hostname to the loopback address (even when the address is static)".

          Logically, there's a reason Red Hat distros might do that: if two processes want to communicate with each other and they happen to be on the same machine, this will transparently route their traffic via the loopback interface instead of out onto the LAN and back in. If we edit /etc/hosts to map the host name to its external IP address, it will route traffic onto the LAN unnecessarily.

          So based on the standard configuration of a Linux server, InetAddress.getLocalHost() will return 127.0.0.1, and it's not due to a misconfiguration at OS level. By editing /etc/hosts it's possible to get InetAddress.getLocalHost() to return whatever IP address you want on Linux, but only by configuring the OS in a non-optimal way.

          Sun's evaluation acknowledges that the method would ideally be changed to return a "sensible" LAN IP address. Eventually they just close the issue and recommend that the JDK 1.4 NetworkInterface API be used instead.

          It's not ideal to have to work around these issues. Sun basically says InetAddress.getLocalHost() is broken. I think the solution I've posted above uses an algorithm similar to what Sun would have used if they had fixed the method. Logically, we want to determine the LAN IP address, and the method above does that regardless of whether the machine is running in a DHCP or static IP environment, and regardless of the OS.

          It took us many man hours to figure out why JCS didn't work out of the box on Linux servers when other Java apps did, and it's because of this issue (using the broken method). I know it's possible to work around the issue in some cases without code changes. As I mentioned we've not had to reconfigure the OS to work around this issue though, because we fixed the issue in the JCS source code for our in-house build. The full fix is above and it's been well tested for 8 months. I'm just posting what we've learnt so that it can be built into the next official release if people want that.

          Niall Gallagher added a comment -

          Hi Dieter,
          Let's reply via JIRA so the discussion gets logged there.

          In our tests we found that Socket(host, port) binds to the LAN address on eth0 more reliably than InetAddress.getLocalHost() would allow. It appears that Socket doesn't use InetAddress.getLocalHost() to figure out which address it should bind to - as you say it probably delegates to the OS.

          Although Socket binds to the public address correctly, we found that the RMI endpoint JCS advertises to clients is 127.0.0.1. This was very hard for us to track down. Basically, somewhere along the line the RMI code that sends events to clients sets 127.0.0.1 as the callback address. Presumably it gets this from InetAddress.getLocalHost() somehow, or the RMI subsystem itself relies on InetAddress.getLocalHost() for that. We saw clients getting "connection refused" exceptions when they received events, because they were trying to connect to themselves on 127.0.0.1. This wasn't caused by client-side code: the RMI callbacks from the server advertised 127.0.0.1 as the endpoint address, so on receipt of an event each client tried to connect to itself to acknowledge it.

          In the end we didn't actually fix the wrongly set callback address in code. Instead, we added a system property to the remote server startup script (actually our JBoss startup script):
          -Djava.rmi.server.hostname=`/sbin/ifconfig eth0 | grep 'inet addr:' | cut -d: -f2 | awk '{ print $1 }'`

          This parses the output of "ifconfig eth0" to find the IP address assigned to eth0, and sets it as that system property.

          -Djava.rmi.server.hostname is documented here:
          http://java.sun.com/j2se/1.4.2/docs/guide/rmi/javarmiproperties.html

          Basically, this tells the RMI subsystem which endpoint address to advertise to clients for callbacks, overriding the 127.0.0.1 address it uses otherwise.
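          The same value the ifconfig pipeline extracts can also be obtained inside the JVM with NetworkInterface.getByName(), which avoids depending on ifconfig's output format (newer tools print "inet" rather than "inet addr:"). This is a sketch under assumptions: the interface name eth0 and the class name are placeholders, and the property generally needs to be set before the first remote object is exported, since the RMI runtime typically reads it once when it sets up its local endpoint.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.List;

public class RmiHostnameFromInterface {

    // Returns the first IPv4 address in the list, or null if there is none.
    static InetAddress firstIpv4(List<InetAddress> addrs) {
        for (InetAddress a : addrs) {
            if (a instanceof Inet4Address) return a;
        }
        return null;
    }

    public static void main(String[] args) throws SocketException {
        String ifaceName = args.length > 0 ? args[0] : "eth0"; // placeholder interface name
        NetworkInterface iface = NetworkInterface.getByName(ifaceName);
        if (iface == null) {
            System.err.println("No interface named " + ifaceName);
            return;
        }
        InetAddress addr = firstIpv4(Collections.list(iface.getInetAddresses()));
        if (addr == null) {
            System.err.println("No IPv4 address on " + ifaceName);
            return;
        }
        // Set before exporting any remote object so that RMI advertises this address.
        System.setProperty("java.rmi.server.hostname", addr.getHostAddress());
        System.out.println("java.rmi.server.hostname=" + addr.getHostAddress());
    }
}
```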

          We are not using the lateral cache, we are using the remote cache. org.apache.jcs.auxiliary.remote.server.RemoteCacheStartupServlet calls the RemoteCacheServerFactory using InetAddress.getLocalHost():

          registryHost = InetAddress.getLocalHost().getHostAddress();
          RemoteCacheServerFactory.startup( registryHost, registryPort, "/" + DEFAULT_PROPS_FILE_NAME );
          .. and then in RemoteCacheServerFactory...
          Naming.rebind( "//" + host + ":" + port + "/" + serviceName, remoteCacheServer );
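          A defensive option at startup is to predict which host RMI will advertise and fail fast if it is a loopback address, so the problem surfaces immediately instead of as "connection refused" errors on clients. This is a hypothetical guard, not JCS code, and it assumes (as the comment above suggests) that the RMI runtime uses java.rmi.server.hostname when it is set and otherwise falls back to InetAddress.getLocalHost().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AdvertisedHostCheck {

    // Assumed lookup order: the java.rmi.server.hostname property if set,
    // otherwise the address returned by InetAddress.getLocalHost().
    static String expectedAdvertisedHost() throws UnknownHostException {
        String configured = System.getProperty("java.rmi.server.hostname");
        return configured != null ? configured : InetAddress.getLocalHost().getHostAddress();
    }

    // Throws if host resolves to a loopback address, which remote clients cannot use for callbacks.
    static void requireNonLoopback(String host) throws UnknownHostException {
        if (InetAddress.getByName(host).isLoopbackAddress()) {
            throw new IllegalStateException("RMI would advertise loopback address " + host
                    + "; set java.rmi.server.hostname to the LAN address");
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = expectedAdvertisedHost();
        requireNonLoopback(host);
        System.out.println("RMI endpoint host: " + host);
    }
}
```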

          On Fri, 2008-03-14 at 03:27 -0700, Dieter Laufkoetter wrote:
          > HostNameUtil.getLocalHostAddress() is only used for debug issues and not to
          > bind at one of multiple network cards.
          > The socket to send data is created in
          > org.apache.jcs.auxiliary.lateral.socket.tcp.utils.SocketOpener by Socket(
          > String host, int port ) and not with Socket( String host, int port,
          > InetAddress localAddr, int localPort ), so the operating system decides the
          > network card.
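          Dieter's distinction between the two Socket constructors can be shown directly. The four-argument java.net.Socket constructor pins the local end of the connection to a chosen address, whereas Socket(host, port) leaves that choice to the OS. In this sketch (hypothetical class and method names, Java 7 or later) a loopback listener stands in for the remote peer; in practice localAddr would be a LAN address, for example one selected by getLocalHostLANAddress(), and host would be the remote peer.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ExplicitLocalBind {

    // Connects to host:port with the client end bound to localAddr; local port 0 lets the OS pick a free port.
    // The two-argument Socket(host, port) constructor instead leaves the local address to the OS.
    static Socket connectFrom(InetAddress host, int port, InetAddress localAddr) throws IOException {
        return new Socket(host, port, localAddr, 0);
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A throwaway listener on loopback, so the example needs no external host.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = connectFrom(loopback, server.getLocalPort(), loopback)) {
            System.out.println("client local end: " + client.getLocalAddress().getHostAddress()
                    + ":" + client.getLocalPort());
        }
    }
}
```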

          Aaron Smuts added a comment -

          I implemented the fix, but I'm not sure how to unit test it properly.
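          One way to make the fix unit-testable, offered as a suggestion rather than what was committed, is to separate the selection rule from the NetworkInterface enumeration so that it can be exercised with literal addresses built by InetAddress.getByAddress(), which performs no DNS lookup. The class below is hypothetical and mirrors the preference order of getLocalHostLANAddress(): the first site-local address wins, otherwise the first non-loopback address, otherwise null (where the original falls back to InetAddress.getLocalHost()).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

public class LanAddressSelector {

    // First site-local address, else first non-loopback address, else null.
    static InetAddress select(List<InetAddress> addrs) {
        InetAddress candidate = null;
        for (InetAddress a : addrs) {
            if (a.isLoopbackAddress()) continue;
            if (a.isSiteLocalAddress()) return a;
            if (candidate == null) candidate = a;
        }
        return candidate;
    }

    // Builds an IPv4 address from four octets without a DNS lookup (test helper).
    static InetAddress ipv4(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e); // only thrown for an invalid array length
        }
    }

    public static void main(String[] args) {
        InetAddress lo = ipv4(127, 0, 0, 1);
        InetAddress pub = ipv4(203, 0, 113, 7);
        InetAddress lan = ipv4(192, 168, 0, 20);
        System.out.println(select(Arrays.asList(lo, pub, lan)).getHostAddress()); // prints 192.168.0.20
        System.out.println(select(Arrays.asList(lo, pub)).getHostAddress());      // prints 203.0.113.7
        System.out.println(select(Arrays.asList(lo)));                            // prints null
    }
}
```

          With the rule isolated like this, only the collection of addresses remains OS-specific, so that is the only part that still depends on the static/DHCP and Linux/Windows environments.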

          Aaron Smuts added a comment -

          This will be in the next temp build, 1.3.2.0-RC

          Niall Gallagher added a comment -

          Thanks Aaron.

          Regarding testing: yes, I know testing this is almost impossible unless multiple OS test environments are available (unlikely).

          I can tell you that the method works for us with:

          • static IPs on Linux
          • DHCP on Linux
          • DHCP on Windows

          We've not tested it with static IPs on Windows as our servers are Linux, but logically it should work in that environment too.


            People

            • Assignee: Aaron Smuts
            • Reporter: Niall Gallagher
            • Votes: 0
            • Watchers: 0

                Time Tracking

                Estimated: 1h (original estimate)
                Remaining: 1h (remaining estimate)
                Logged: Not specified
