Hadoop HDFS / HDFS-34

The elephant should remember names, not numbers.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The name node and the data node should not cache the resolution of host names, as doing so prevents the use of DNS CNAMEs for any sort of fail over capability.

          Activity

          Allen Wittenauer added a comment -

          We recently tried a "new kind" of fail over in our environment. Rather than having a static IP for the name node, we attempted to use DNS CNAMEs to move the name node from one node to another. We discovered that the data nodes continually attempted to contact the old machine even though DNS pointed to the new machine.

          Since we configure a host name in hadoop, I would expect that the data nodes would at some point drop their cache of the IP and re-resolve. However, this never happened.

          I'd like to see, either as an option or as the default, that when a name is given in a configuration file, Hadoop always resolves that host name before connecting. The operating system should be able to handle caching any addresses that need to be cached, either through a mechanism like nscd or through a full-blooded local DNS cache.

          steve_l added a comment -

          1. The JVM does lots and lots of DNS caching of its own. To get it to cache positive and negative DNS entries for less time than forever, you've got to start the processes with a DNS TTL property, networkaddress.cache.ttl.
          By default, Java5 caches forever: http://java.sun.com/j2se/1.5.0/docs/api/java/net/InetAddress.html
          Java6 is a bit smarter, and only caches forever when a security manager is installed:
          http://java.sun.com/javase/6/docs/technotes/guides/net/properties.html

          You'd need to decide what is a good DNS cache TTL for the datanodes and push it out to the scripts. I'm not 100% sure you can set this property inside the JVM and have it taken up; the command line is the conventional way to do it.

          2. The address of the namenode is currently set up in DataNode.startDataNode(). There are a few classes that assume that DataNode.getNameNodeAddr() is never null; they'd need to change their assumptions.

          3. DNS is slow and is one of the main delays in test runs right now.

          What may work is leaving the current init code as it is, but whenever a connection to the namenode fails, the datanode should re-read the namenode address from the configuration and redo the nslookup; the scripts would need patching to set a low TTL on the live systems. Rereading the address from the configuration would be useful if the configuration was coming from something like an LDAP server; you could change the hostname there and have it picked up without DNS caching interfering.
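
          A minimal sketch of the "re-read and re-resolve on failure" idea, under stated assumptions: `NameNodeLocator`, the `hostFromConfig` supplier, and port 8020 in the usage note are hypothetical names for illustration and are not Hadoop APIs. It relies only on the fact that constructing a `java.net.InetSocketAddress` from a host name attempts a lookup at construction time. The JVM's own DNS cache still sits underneath, so a low TTL would be needed for a changed DNS record to be seen.

```java
import java.net.InetSocketAddress;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Hadoop: tracks the namenode address. */
final class NameNodeLocator {
    // Assumption: returns the namenode host name as currently configured
    // (for example from a file or a directory service).
    private final Supplier<String> hostFromConfig;
    private final int port;
    private InetSocketAddress current;

    NameNodeLocator(Supplier<String> hostFromConfig, int port) {
        this.hostFromConfig = hostFromConfig;
        this.port = port;
        this.current = resolve();
    }

    /** Address to use for the next connection attempt. */
    synchronized InetSocketAddress address() {
        return current;
    }

    /** Call after a failed connection: re-read the name and resolve it again. */
    synchronized InetSocketAddress onConnectFailure() {
        current = resolve();
        return current;
    }

    private InetSocketAddress resolve() {
        // The constructor attempts to resolve the host name immediately.
        return new InetSocketAddress(hostFromConfig.get(), port);
    }
}
```

          A caller would construct it once, for example `new NameNodeLocator(() -> conf.get(...), 8020)` with whatever configuration lookup applies, use `address()` for normal connections, and call `onConnectFailure()` only when a connection attempt fails, which keeps DNS lookups off the common path.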

          Allen Wittenauer added a comment -

          Steve's comment reminded me of an important detail that is bound to come up from our experiment! I failed to mention that we set the TTL for the DNS CNAME to be an extremely low value (5 minutes) in the DNS zone. We knew (and tested to make sure) that on the DNS side, the address change would be handled properly.

          steve_l added a comment -

          Did I say you could set this on the command line? I was wrong:
          http://jira.smartfrog.org/jira/browse/SFOS-764

          Instead, you edit a properties file in the JVM's lib/security directory, or call:

          java.security.Security.setProperty("networkaddress.cache.ttl" , "0");

          It would be possible for server-side nodes to set this property when they start up, but the operation should be wrapped with a catch for any security exception, so running hadoop under a security manager isn't fatal.
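
          A minimal sketch of that guarded call, assuming a hypothetical helper class `DnsCacheTtl` (not part of Hadoop). `java.security.Security.setProperty` is the real JDK call and throws `SecurityException` when a security manager denies the change. As far as I know the JVM reads this value early, so the call probably needs to run before the first host name lookup to have any effect.

```java
import java.security.Security;

/** Hypothetical helper, not part of Hadoop. */
final class DnsCacheTtl {
    private DnsCacheTtl() {}

    /**
     * Tries to set the JVM-wide positive DNS cache TTL, in seconds.
     * Returns false instead of failing when a security manager forbids it.
     */
    static boolean trySet(int seconds) {
        try {
            Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
            return true;
        } catch (SecurityException e) {
            // Restrictive security manager: keep the JVM default and carry on.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("TTL set: " + trySet(60));
    }
}
```

          Returning a boolean rather than rethrowing lets a daemon log the outcome at startup and continue either way.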

          This is separate from the question of where the hostnames should be resolved, which needs to be moved into every service's offerService loop.

          Allen - I believe the Sun JVM DNS cache still ignores the TTL that comes down from above. It's to stop applets and other sandboxed things breaking out of the sandbox and talking to hosts behind the firewall, but interferes with long-lived server-side code.

          Yossi Ittach added a comment -

          We are trying to do something similar, but instead of changing the DNS scheme we're using a floating IP, so it should bypass all the caching. I'll update when I have more results.

          Raghu Angadi added a comment -

          Thanks Steve. Based on the above comments, we need to do a couple of things:

          • If a configuration variable for the ttl is set to a non-default value, the DN (and maybe clients) set an explicit ttl (through "networkaddress.cache.ttl").
          • RPC clients re-resolve RPC servers for a new connection if the last resolution was at least "ttl" ago.

          That way an admin could set the ttl to 10 minutes and RPC clients would re-resolve at most once every 10 minutes.
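
          The second point could be sketched as follows, under stated assumptions: `TtlResolver` and its injectable clock are hypothetical names for illustration, not Hadoop's RPC client. It only shows the "re-resolve if the last resolution was at least ttl ago" rule; the JVM's own DNS cache still applies beneath it.

```java
import java.net.InetSocketAddress;
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of Hadoop: re-resolves a host at most once per TTL. */
final class TtlResolver {
    private final String host;
    private final int port;
    private final long ttlMillis;
    private final LongSupplier clock; // current time in milliseconds; injectable for tests

    private InetSocketAddress cached;
    private long resolvedAt;
    private int lookups;

    TtlResolver(String host, int port, long ttlMillis, LongSupplier clock) {
        this.host = host;
        this.port = port;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached address, resolving again only if the TTL has elapsed. */
    synchronized InetSocketAddress addressForNewConnection() {
        long now = clock.getAsLong();
        if (cached == null || now - resolvedAt >= ttlMillis) {
            // The constructor attempts to resolve the host name immediately.
            cached = new InetSocketAddress(host, port);
            resolvedAt = now;
            lookups++;
        }
        return cached;
    }

    /** Number of resolutions performed so far. */
    synchronized int lookups() {
        return lookups;
    }
}
```

          In production the clock would be `System::currentTimeMillis`; with a ttl of 600,000 ms this matches the "at most once every 10 minutes" behaviour described above.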

          Raghu Angadi added a comment -

          > Allen - I believe the Sun JVM DNS cache still ignores the TTL that comes down from above. It's to stop applets and other sandboxed things breaking out of the sandbox and talking to hosts behind the firewall, but interferes with long-lived server-side code.

          Oops! A fix for the Sun JVM is a must.

          steve_l added a comment -

          If you start the JVM with a TTL property on the command line, it gets picked up. If hadoop does its own TTL code underneath that, then you probably get the minimum of the hadoop-site.xml TTL and the JVM TTL. I think.

          This is going to be painful to test.

          Raghu Angadi added a comment -

          > If you start the JVM with a TTL property on the command line, it gets picked up.

          Do you mean this method would work on the Sun JVM? Then it is probably good enough. The documentation for the hadoop config variable would clarify that, and that the admin needs to add a JVM arg for it to be effective with the Sun JVM.

          Allen Wittenauer added a comment -

          We're working around this by making sure every hostname and IP address given to clients is movable in some form or another, including IP aliases and BGP route propagation techniques. Closing as won't fix.


            People

            • Assignee: Unassigned
            • Reporter: Allen Wittenauer
            • Votes: 0
            • Watchers: 14