Hadoop Common / HADOOP-3426

Datanode does not start up if the local machine's DNS isn't working right and dfs.datanode.dns.interface==default

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Ubuntu 8.04, at home, no reverse DNS

    • Hadoop Flags:
      Reviewed

      Description

      This is the third Java project I've been involved in that doesn't work on my home network, due to implementation issues with java.net.InetAddress.getLocalHost(), issues that only show up on an unmanaged network. Fortunately my home network exists to find these problems early.

      In Hadoop, if the local hostname doesn't resolve, the datanode does not start up:

      Caused by: java.net.UnknownHostException: k2: k2
      at java.net.InetAddress.getLocalHost(InetAddress.java:1353)
      at org.apache.hadoop.net.DNS.getDefaultHost(DNS.java:185)
      at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:184)
      at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:162)
      at org.apache.hadoop.dfs.ExtDataNode.<init>(ExtDataNode.java:55)
      at org.smartfrog.services.hadoop.components.datanode.DatanodeImpl.sfStart(DatanodeImpl.java:60)

      While this is a valid option in a production (non-virtual) cluster, if you are playing with VMWare/Xen private networks or on a home network, you can't rely on DNS.

      1. In these situations, it's usually better to fall back to using "localhost" or 127.0.0.1 as a hostname if Java can't work it out for itself.
      2. It's often good to cache this if used in lots of parts of the system, otherwise the 30s timeouts can cause problems of their own.

      1. hadoop-3426.patch
        14 kB
        Steve Loughran
      2. hadoop-3426.patch
        14 kB
        Steve Loughran
      3. hadoop-3426.patch
        19 kB
        Steve Loughran
      4. hadoop-3426.patch
        19 kB
        Steve Loughran
      5. hadoop-3426.patch
        12 kB
        Steve Loughran

          Activity

          Steve Loughran added a comment -

          Ivy didn't work right on this box either.

          Steve Loughran added a comment -

          This includes a patch for HADOOP-3613 and lots of tests, including a test that formalises the current behaviour of HADOOP-3612, because that may be the desired behaviour. It does not patch HADOOP-3612.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12384410/dns-fixes.patch
          against trunk revision 669986.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2712/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2712/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2712/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2712/console

          This message is automatically generated.

          Steve Loughran added a comment -

          Let's see why the core tests are failing.

          Steve Loughran added a comment -

          The reverse DNS test is a test whose failures change with the network infrastructure. This could be trouble, as anywhere that expects the DNS.reverseDns() method to work reliably will be in for a disappointment.

          On Hudson it is failing with:

          javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '106.11.211.140.in-addr.arpa'
          at com.sun.jndi.dns.DnsClient.checkResponseCode(DnsClient.java:596)
          at com.sun.jndi.dns.DnsClient.isMatchResponse(DnsClient.java:553)
          at com.sun.jndi.dns.DnsClient.doUdpQuery(DnsClient.java:399)
          at com.sun.jndi.dns.DnsClient.query(DnsClient.java:186)
          at com.sun.jndi.dns.Resolver.query(Resolver.java:64)
          at com.sun.jndi.dns.DnsContext.c_getAttributes(DnsContext.java:413)
          at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:213)
          at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:121)
          at com.sun.jndi.toolkit.url.GenericURLDirContext.getAttributes(GenericURLDirContext.java:85)
          at javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:123)
          at org.apache.hadoop.net.DNS.reverseDns(DNS.java:67)
          at org.apache.hadoop.net.TestDNS.testRDNS(TestDNS.java:91)

          Standard Output

          Localhost IPAddr is localhost/127.0.0.1

          and on the Apache servers:
          > nslookup 106.11.211.140.in-addr.arpa
          Server: 140.211.166.130
          Address: 140.211.166.130#53

          ** server can't find 106.11.211.140.in-addr.arpa: NXDOMAIN
          Steve Loughran added a comment -

          @work, on a well-managed machine with a real IP address in the 0x10 subnet (but unreachable from the outside world):
          Testcase: testRDNS took 0.296 sec
          Caused an ERROR
          DNS name not found [response code 3]
          javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '1.1.0.127.in-addr.arpa'
          at com.sun.jndi.dns.DnsClient.checkResponseCode(DnsClient.java:596)
          at com.sun.jndi.dns.DnsClient.isMatchResponse(DnsClient.java:553)
          at com.sun.jndi.dns.DnsClient.doUdpQuery(DnsClient.java:399)
          at com.sun.jndi.dns.DnsClient.query(DnsClient.java:186)
          at com.sun.jndi.dns.Resolver.query(Resolver.java:64)
          at com.sun.jndi.dns.DnsContext.c_getAttributes(DnsContext.java:413)
          at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_getAttributes(ComponentDirContext.java:213)
          at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.getAttributes(PartialCompositeDirContext.java:121)
          at com.sun.jndi.toolkit.url.GenericURLDirContext.getAttributes(GenericURLDirContext.java:85)
          at javax.naming.directory.InitialDirContext.getAttributes(InitialDirContext.java:123)
          at org.apache.hadoop.net.DNS.reverseDns(DNS.java:67)
          at org.apache.hadoop.net.TestDNS.testRDNS(TestDNS.java:91)

          This box does have a proper IPAddr, as resolvable from people.apache.org

          > nslookup morzine.hpl.hp.com
          Server: 140.211.166.130
          Address: 140.211.166.130#53

          Non-authoritative answer:
          Name: morzine.hpl.hp.com
          Address: 16.25.171.118

          but not reverse resolvable, there or elsewhere:
          > nslookup 118.171.25.16.in-addr.arpa
          Server: 140.211.166.131
          Address: 140.211.166.131#53

          Non-authoritative answer:

          *** Can't find 118.171.25.16.in-addr.arpa: No answer

          so:
          1. why isn't the full IP address being picked up here? A regression?
          2. any code that relies on rDNS to work reliably is in trouble.
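
The reverse lookups above query names such as '1.1.0.127.in-addr.arpa', which is the PTR query name for 127.0.1.1: the four octets in reverse order, followed by the in-addr.arpa suffix. A minimal sketch of how such a name is built, for illustration only; the class and method names are hypothetical and are not taken from DNS.java:

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: builds the PTR query name for an IPv4 address.
final class ReverseLookupName {

    /** Returns the octets of the address in reverse order, followed by ".in-addr.arpa". */
    static String forAddress(Inet4Address address) {
        byte[] b = address.getAddress(); // four bytes, network order
        return (b[3] & 0xff) + "." + (b[2] & 0xff) + "."
             + (b[1] & 0xff) + "." + (b[0] & 0xff) + ".in-addr.arpa";
    }

    public static void main(String[] args) throws UnknownHostException {
        // A literal IP address is parsed directly; no DNS lookup takes place here.
        Inet4Address addr = (Inet4Address) InetAddress.getByName("16.25.171.118");
        System.out.println(forAddress(addr));
    }
}
```

Building the name is purely local and cannot fail; only the query that follows depends on the network infrastructure.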

          Steve Loughran added a comment -

          updated patch for DNS.java. Not intended for submission yet; needs testing @home

          Steve Loughran added a comment -

          TestDNS to go with the changed DNSTest; this checks out IPv6 behaviour too.

          Steve Loughran added a comment -

          Linking to HADOOP-3619. The patches for DNS.java now:

          1. throw an IllegalArgumentException in reverseDns() when an IPv6 address is hit (with the file structured for someone to add IPv6 in cleanly when required)

          2. skip this exception (logging at debug level) when enumerating all interface hostnames.

          More experimentation/understanding is needed here.
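
The two behaviours described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: the class name, method names and the resolver parameter are all hypothetical, and the real code would log the skipped address at debug level rather than silently dropping it.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: reject non-IPv4 addresses in one place, skip them when enumerating.
final class Ipv4OnlyLookup {

    /** Applies the resolver to an IPv4 address; any other address type is rejected. */
    static String lookup(InetAddress address, Function<Inet4Address, String> resolver) {
        if (!(address instanceof Inet4Address)) {
            throw new IllegalArgumentException(
                "Reverse lookup of non-IPv4 address not supported: " + address);
        }
        return resolver.apply((Inet4Address) address);
    }

    /** Looks up every address, skipping those that lookup() rejects. */
    static List<String> lookupAll(List<InetAddress> addresses,
                                  Function<Inet4Address, String> resolver) {
        List<String> names = new ArrayList<>();
        for (InetAddress address : addresses) {
            try {
                names.add(lookup(address, resolver));
            } catch (IllegalArgumentException e) {
                // Skipped here; a real implementation would log this at debug level.
            }
        }
        return names;
    }
}
```

Note that the catch block also swallows an IllegalArgumentException thrown by the resolver itself, which a production version would probably want to distinguish.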

          Steve Loughran added a comment -

          Patch for this and HADOOP-3619

          Steve Loughran added a comment -

          patch seems stable@work, though with different network configurations out there, it's hard to be sure

          Steve Loughran added a comment -

          with test case included

          Raghu Angadi added a comment -

          Regarding the cache: should we enable it all the time? When does it expire? Anyone who invokes DNS should be aware of potential delays. That's why Hadoop is very careful about DNS in many places and avoids it in most places at runtime. Though adding a cache looks like 'obviously a good thing', I am not sure it needs to be added to handle not-so-common error conditions. If it is required because the application is DNS-heavy, then it helps.

          Raghu Angadi added a comment -

          > Regd cache : Should we enable all the time?
          Oops! The patch does not add a DNS resolution cache. It just stores the local host name. In that sense, I think it's fine.

          Steve Loughran added a comment -

          As you have noted, this patch only caches the local hostname, which is asked for on occasions, and when it is unknown, things break. Caching this value makes both success and failure faster from the second time onwards. This is effectively the same code in use in other apache projects (e.g. Ivy).

          The only trouble spot would be on roaming systems, where IPAddr and hostname would change. However, I don't think Hadoop would cope well in such an environment, as JREs normally assume that the network infrastructure is stable, and don't send notifications to their apps telling them to reset all their cached DNS addresses and close and reopen all sockets to bind to the changed interfaces. Java SE and Java EE aren't Java mobile, sadly.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12385638/hadoop-3426.patch
          against trunk revision 676069.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2832/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2832/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2832/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2832/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          I'm worried that using localhost will cause problems, because it only works in single node cases.

          Can't we default instead to the unresolved IP address? As long as we make sure it isn't 127.0.0.1, it will be useful, even in multi-node clusters.
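
One way to read this suggestion is to enumerate the network interfaces and take the first address that is not a loopback address, resorting to 127.0.0.1 only when nothing else exists. A minimal sketch under that assumption; the class and method names are hypothetical, and restricting the search to IPv4 is a choice made for the example, not something the comment specifies:

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Hypothetical sketch: pick an address usable by other nodes without consulting DNS.
final class LocalAddressFallback {

    /** Returns the first IPv4 address on an interface that is up and is not loopback. */
    static Optional<InetAddress> firstNonLoopbackIpv4() {
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return Optional.empty(); // some JDKs return null when there are no interfaces
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp() || nic.isLoopback()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    if (addr instanceof Inet4Address && !addr.isLoopbackAddress()) {
                        return Optional.of(addr);
                    }
                }
            }
        } catch (SocketException e) {
            // Interface enumeration failed; treat it the same as finding no address.
        }
        return Optional.empty();
    }

    /** The unresolved IP address as a string, or 127.0.0.1 if no other address exists. */
    static String addressOrLoopback() {
        return firstNonLoopbackIpv4().map(InetAddress::getHostAddress).orElse("127.0.0.1");
    }
}
```

On a machine with several interfaces the "first" address depends on enumeration order, so a real deployment would still want a way to name the interface explicitly.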

          Steve Loughran added a comment -

          What are the use cases that really matter? And being ruthless, my home PC isn't one.

          how about

          • cluster without DNS
          • cluster on xen/vmware without DNS or internet
          • developer laptop on the move running tests/demos

          Next question: which of these matter enough to address?

          Raghu Angadi added a comment -

          > 1. In these situations, it's usually better to fall back to using "localhost" or 127.0.0.1 as a hostname if Java can't work it out for itself,

          Should this be a config option that is disabled by default? What is worse than seeing an error is not seeing an error message. Are there any alternatives (using the first available IP address, etc.)?

          > 2. It's often good to cache this if used in lots of parts of the system, otherwise the 30s timeouts can cause problems of their own.

          This part I understand.. I thought this is all that this jira did.

          I still need to go through the patch but was surprised to see the changes with Ip4, Ip6 etc.

          Raghu Angadi added a comment -

          My opinion:

          > cluster without DNS

          IP address should be used without the hostname

          > cluster on xen/vmware without DNS
          Same as above.

          > ... without internet
          You mean without network set up? If we can detect it, then it should be a warning and we should use the localhost interface if that exists. If not it should be a hard error.

          > developer laptop on the move running tests/demos
          Maybe this need not be handled now.

          Steve Loughran added a comment -

          > I still need to go through the patch but was surprised to see the changes with Ip4, Ip6 etc.

          The existing code assumed all internet addresses were IPv4. This code recognises IPv6 as different but currently declines to reverse look it up, throwing an exception instead. Someone can add that feature later.

          Steve Loughran added a comment -

          >> cluster without DNS

          >IP address should be used without the hostname

          OK. One thing we can't rely on here is any kind of rDNS to work, hostname lookup may time out, etc.

          >> cluster on xen/vmware without DNS
          >Same as above.

          OK

          >> ... without internet
          >You mean without network set up? If we can detect it, then it should be a warning and we should use the localhost interface if that exists. If not it should be a hard error.

          I was thinking a little private xen/vmware virtual cluster with no external connectivity or DNS. It's a self-contained network.

          >> developer laptop on the move running tests/demos
          >may be need not be handled now.

          Makes sense.

          One thing that is hard when handling different network setups is actually creating the relevant network configurations. Most developers work from well-managed networks where things like DNS work properly; it's out in the mass end-user world that networking becomes atrocious. I think for Hadoop, it would be important to list the supported/unsupported network configurations, and only target the supported ones. If the unsupported ones can be detected and warned about, then all is well, but some requirements, such as "you have a network and it is fairly reliable", may be things that we can just rely on, and opt not to feel guilty if the code doesn't work when that requirement isn't met.

          Raghu Angadi added a comment -

          > OK. One thing we can't rely on here is any kind of rDNS to work, hostname lookup may time out, etc.

          Right. We need to discourage depending on rDNS unless it is really required. rDNS failure should generally not be a fatal error.

          > I was thinking a little private xen/vmware virtual cluster with no external connectivity or DNS. Its a self-contained network.
          In that case using the first available IP address works, right?

          > One thing that is hard when handling different network setups is actually [...]
          Agreed. We should clarify the requirements, tolerate less perfect setups, and generally be more end-user friendly.
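
Treating an rDNS failure as non-fatal could look something like the sketch below, which performs the PTR query through the JDK's JNDI DNS provider (the same provider that appears in the stack traces earlier in this thread) and reports any failure as "no name" instead of propagating the exception. The class and method names are hypothetical, and the timeout values are arbitrary choices for the example.

```java
import java.util.Hashtable;
import java.util.Optional;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical sketch: a reverse lookup whose failure is an empty result, not an exception.
final class NonFatalReverseDns {

    /** Queries the PTR record for a name such as "118.171.25.16.in-addr.arpa". */
    static Optional<String> tryPtrLookup(String reverseName) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        // Keep a dead resolver from stalling the caller (values chosen for illustration).
        env.put("com.sun.jndi.dns.timeout.initial", "500");
        env.put("com.sun.jndi.dns.timeout.retries", "1");
        DirContext ctx = null;
        try {
            ctx = new InitialDirContext(env);
            Attribute ptr = ctx.getAttributes(reverseName, new String[] {"PTR"}).get("PTR");
            if (ptr == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(ptr.get()).map(Object::toString);
        } catch (NamingException e) {
            // NXDOMAIN, timeouts and missing resolver configuration all end up here.
            return Optional.empty();
        } finally {
            if (ctx != null) {
                try {
                    ctx.close();
                } catch (NamingException ignored) {
                    // Nothing useful can be done if closing fails.
                }
            }
        }
    }
}
```

A caller can then fall back to the plain IP address when the result is empty, for example `tryPtrLookup(name).orElse(ipAddress)`.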

          Steve Loughran added a comment -

          > rDNS failure should generally be not a fatal error.

          That's where I think we may have problems right now.

          > we should clarify the requirements, tolerate less perfect set ups, and generally be more end user friendly.

          That's very forgiving. I was thinking more of detecting bad configurations and bailing out early with an error message that could be searched on.
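
A rough sketch of that fail-fast idea: check at startup that the local hostname resolves, and if it does not, stop with a message carrying a fixed identifier that users can search for. Everything named here is hypothetical, including the identifier string and the advice in the message; none of it is taken from the patch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch: detect an unresolvable local hostname early and say so clearly.
final class HostnameCheck {

    /** Fixed token so that the error is easy to find in logs and search engines. */
    static final String ERROR_ID = "LOCAL-HOSTNAME-UNRESOLVABLE";

    /** Builds the message shown to the user; detail is the underlying exception text. */
    static String failureMessage(String detail) {
        return ERROR_ID + ": the local hostname could not be resolved (" + detail + "). "
            + "Check /etc/hosts and the DNS configuration of this machine.";
    }

    /** Returns the local address, or fails with a descriptive exception. */
    static InetAddress requireResolvableLocalHost() {
        try {
            return InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            throw new IllegalStateException(failureMessage(e.getMessage()), e);
        }
    }
}
```

Calling `HostnameCheck.requireResolvableLocalHost()` once during startup keeps the original UnknownHostException as the cause, so the stack trace is not lost.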

          Raghu Angadi added a comment -

          > I was thinking more of detecting bad configurations and bailing out early with an error message that could be searched on.
          +1. This sounds fine. Most users will be fine with a good error message.

          Steve Loughran added a comment -

          I've now tracked down the root cause of this machine's DNS issues: /etc/hosts had the fully qualified name next to 127.0.0.1 and not the shortname, and no DNS infrastructure, so nslookup of the local hostname was failing. Why this causes java to fail to determine its hostname, I do not know. But it means I may be able to recreate the problem in a vmware image.

          Steve Loughran added a comment -

          This is the latest patch; it works well on my networks and caches only the local hostname and IP address.

          Chris Douglas added a comment -

          Sorry this sat in the patch queue for so long without being reviewed.

          • DNS::reverseDns(Inet4Address,String) has some commented-out code in it that should be removed
          • Shouldn't cachedHostAddress and cachedHostName be final, rather than volatile Strings? Calling a method to initialize these is a good idea, but doing it lazily seems to offer no advantages.
              // Resolved once, during class initialization.
              private static final String cachedHostname = getLocalHostname();

              private static String getLocalHostname() {
                String localhost;
                try {
                  localhost = InetAddress.getLocalHost().getCanonicalHostName();
                } catch (UnknownHostException e) {
                  // The local hostname does not resolve: log it and use the loopback name.
                  LOG.info("Unable to determine local hostname "
                          + "-falling back to \""+LOCALHOST+"\"", e);
                  localhost = LOCALHOST;
                }
                return localhost;
              }
            

            These should also be listed at the top of the class, with the other fields. Since they're never updated, "cached" seems like the wrong name.

          • Until something productive is done with IPv6 addresses, the effort to throw from its handler method seems ill spent. The check for an Inet4Address is worthwhile, but it should be in the existing, public reverseDns method. The new, private overloads are unnecessary.
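
          The Inet4Address check suggested in the last bullet could look roughly like the sketch below. It is not the code from the patch: ReverseName and of() are hypothetical names, and the actual DNS query is left out. It only shows the guard plus the construction of the standard in-addr.arpa name that a PTR lookup for an IPv4 address would use.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch of an IPv4 guard; not the code from the patch. */
final class ReverseName {

    /**
     * Builds the in-addr.arpa name for a PTR query.
     * Anything that is not an IPv4 address is rejected up front.
     */
    static String of(InetAddress address) {
        if (!(address instanceof Inet4Address)) {
            throw new IllegalArgumentException("Not an IPv4 address: " + address);
        }
        // getAddress() returns the four octets in network order; mask to read them as unsigned.
        byte[] b = address.getAddress();
        return (b[3] & 0xFF) + "." + (b[2] & 0xFF) + "."
             + (b[1] & 0xFF) + "." + (b[0] & 0xFF) + ".in-addr.arpa";
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByAddress() builds the object from raw bytes without any DNS lookup.
        InetAddress loopback = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        System.out.println(of(loopback)); // prints 1.0.0.127.in-addr.arpa
    }
}
```

          Keeping the check in one public entry point means callers see a single, searchable failure for IPv6 input instead of a failure somewhere inside the string handling.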
          Steve Loughran added a comment -

          This update does as suggested: it creates static final cached values.

          1. Errors are logged at INFO and then, on the second attempt, at ERROR, as that means your network is really hosed.

          2. The test cases handle the situation of HADOOP-5339, where reverse DNS fails on loopback addresses.
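
          One plausible reading of the two-step logging in point 1 is sketched below. This is an illustration under assumptions, not the committed code: the class LocalAddress, the field ADDRESS, the choice of "localhost" as the second lookup, and the final "127.0.0.1" literal are all made up for the example, and java.util.logging is used only to keep the sketch self-contained (Level.SEVERE stands in for ERROR).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; class, field and method names are illustrative only. */
final class LocalAddress {
    private static final Logger LOG = Logger.getLogger(LocalAddress.class.getName());
    private static final String LOCALHOST = "localhost";

    /** Resolved once during class initialization and never updated. */
    static final String ADDRESS = resolveLocalAddress();

    private static String resolveLocalAddress() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException first) {
            // First failure: common on unmanaged networks, so INFO only.
            LOG.log(Level.INFO,
                    "Unable to determine local address, falling back to \"" + LOCALHOST + "\"",
                    first);
        }
        try {
            return InetAddress.getByName(LOCALHOST).getHostAddress();
        } catch (UnknownHostException second) {
            // Second failure: even the loopback name does not resolve.
            LOG.log(Level.SEVERE, "Unable to resolve \"" + LOCALHOST + "\"", second);
            return "127.0.0.1"; // assumed last-resort literal, not taken from the patch
        }
    }

    public static void main(String[] args) {
        System.out.println(ADDRESS);
    }
}
```

          Because the field is static final, every caller sees the same value and the lookup cost, including any DNS timeout, is paid at most once per JVM.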

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12401055/hadoop-3426.patch
          against trunk revision 748403.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/11/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/11/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/11/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/11/console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12401055/hadoop-3426.patch
          against trunk revision 748623.

          -1 @author. The patch appears to contain @author tags which the Hadoop community has agreed to not allow in code contributions.

          +1 tests included. The patch appears to include new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/12/console

          This message is automatically generated.

          Steve Loughran added a comment -

          Regenerated the patch; not sure why Hudson is playing up.

          Steve Loughran added a comment -

          Regenerating and resubmitting. I'm not sure why Hudson is unhappy with the patch: there don't appear to be any @author tags in it, and the console output suggests there are missing directories in the build.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12401170/hadoop-3426.patch
          against trunk revision 749262.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/31/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/31/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/31/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/31/console

          This message is automatically generated.

          Chris Douglas added a comment -

          I committed this. Thanks, Steve

          Hudson added a comment -

          Integrated in Hadoop-trunk #827 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/827/).
          Fix/provide handling when DNS lookup fails on the loopback address. Also cache the result of the lookup. Contributed by Steve Loughran.


            People

            • Assignee:
              Steve Loughran
              Reporter:
              Steve Loughran
            • Votes:
              0
              Watchers:
              4
