
Key: HADOOP-497
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Unassigned
Reporter: Lorenzo Thione
Votes: 0
Watchers: 0
Hadoop Common

DataNodes and TaskTrackers should be able to report hostnames and ips relative to customizable network interfaces and nameservers

Created: 31/Aug/06 02:31 AM   Updated: 08/Jul/09 04:51 PM
Component/s: util
Affects Version/s: 0.6.0
Fix Version/s: 0.6.0

Time Tracking:
Not Specified

File Attachments:
  dnsjava-2.0.2.jar (Java Archive File, 261 kB), 2006-08-31 02:31 AM, Lorenzo Thione
  net-dns.patch (Text File, 12 kB, licensed for inclusion in ASF works), 2006-09-06 11:24 PM, Lorenzo Thione
  net-dns.patch (Text File, 11 kB, licensed for inclusion in ASF works), 2006-09-01 07:28 AM, Lorenzo Thione
  nif-utils.patch (Text File, 9 kB), 2006-08-31 02:31 AM, Lorenzo Thione

Resolution Date: 07/Sep/06 08:00 PM


Description
This patch allows network configuration parameters to be added to the hadoop-site.xml file. These parameters specify a network interface name and an optional nameserver hostname, which DataNodes and TaskTrackers use to resolve their hostnames from the IP bound to the specified network interface.

This is useful when machines on different physical or logical networks need to participate in a Hadoop cluster as client nodes. The hostname and IP reported by InetAddress.getLocalHost() are not necessarily the ones through which the JobTracker and NameNode can reach the clients, nor the ones through which DFS clients can reach the DataNodes.

The configuration parameters are

  • cluster.report.nif
  • cluster.report.ns

nif: the name of a network interface, such as en0 or en1 (on Macs), eth0, etc.
ns: the hostname of a DNS server to use when resolving the IP bound to the specified nif

Both parameters default to the value "default", which preserves the current behavior of reporting InetAddress.getLocalHost().getHostName() and getHostAddress().
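
The interface half of this behavior can be sketched with java.net.NetworkInterface. This is only an illustration under stated assumptions, not the code in the attached patches: the class name InterfaceAddressSketch and the method getIPs are hypothetical, and raising UnknownHostException for a missing interface is one possible choice rather than the patch's documented behavior.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists the IPs bound to a named network interface.
final class InterfaceAddressSketch {

  static List<String> getIPs(String nif)
      throws UnknownHostException, SocketException {
    // "default" keeps the pre-patch behavior: report the local host address.
    if ("default".equals(nif)) {
      return Collections.singletonList(
          InetAddress.getLocalHost().getHostAddress());
    }
    // getByName returns null when no interface has that name.
    NetworkInterface netIf = NetworkInterface.getByName(nif);
    if (netIf == null) {
      // Assumption: fail loudly instead of silently falling back.
      throw new UnknownHostException("No such network interface: " + nif);
    }
    List<String> ips = new ArrayList<>();
    for (InetAddress addr : Collections.list(netIf.getInetAddresses())) {
      ips.add(addr.getHostAddress());
    }
    return ips;
  }
}
```

Calling InterfaceAddressSketch.getIPs("eth0") would return every address bound to eth0 on a machine that has such an interface; which of those addresses to report is a separate decision that this sketch leaves open.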

As part of the patch, a new library dnsjava was added along with its license information (BSD license). The list of affected files is:

src
  org.apache.hadoop.dfs.DataNode
  org.apache.hadoop.mapred.TaskTracker
  org.apache.hadoop.util.NetworkUtils
conf
  hadoop-default.xml
lib
  dnsjava-2.0.2.jar
  dnsjava-2.0.2.LICENSE.txt



Doug Cutting added a comment - 31/Aug/06 05:32 PM
I'm not sure why you need the dnsjava library. At a glance, it seems you could instead set sun.net.spi.nameservice.nameservers and then use the java.net.NetworkInterface to find which hostnames correspond to a particular interface. Why doesn't that work?

A few nits with the code:

  • It should probably be named net/DNS.java, instead of util/NetworkUtils.java. The util package is already a mess.
  • The descriptions of the new config properties only mention tasktracker and jobtracker, but the options are also valid for datanode and namenode. I would also name these properties something like net.dns.interface and net.interface.nameserver.
  • The new class and all its public members should have informative javadoc comments. If we add a new package, it should have a package.html with at least a one-line description of the package.

Lorenzo Thione added a comment - 01/Sep/06 07:27 AM
Just setting sun.net.spi.nameservice.nameservers worked for forward lookup but not for reverse lookup, at least when I first looked at it. This new patch does the job without an external library, using JNDI directly to query the DNS server. The new class has a new name (DNS.java) and a new location (net). A new package.html file was created as well. Finally, the network interface and nameserver are now independently configurable for datanodes and tasktrackers. The names of the new properties have been changed to

net.dns.datanode.interface
net.dns.tasktracker.interface
net.dns.datanode.nameserver
net.dns.tasktracker.nameserver
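
For readers unfamiliar with the JNDI approach mentioned above, a reverse lookup against a specific nameserver can be sketched roughly as follows. This is an illustration, not the contents of net-dns.patch: the class ReverseDnsSketch and its methods are hypothetical, only IPv4 is handled, and the sketch assumes the JDK's built-in JNDI DNS provider is available.

```java
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper: PTR lookup through JNDI's DNS URL support.
final class ReverseDnsSketch {

  // Builds the in-addr.arpa name for a dotted-quad IPv4 address,
  // e.g. "10.1.2.3" becomes "3.2.1.10.in-addr.arpa".
  static String reverseName(String ipv4) {
    String[] p = ipv4.split("\\.");
    if (p.length != 4) {
      throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
    }
    return p[3] + "." + p[2] + "." + p[1] + "." + p[0] + ".in-addr.arpa";
  }

  // Asks the given nameserver (or the system default when null) for
  // the PTR record of the address.
  static String reverseDns(String ipv4, String nameserver)
      throws NamingException {
    String url = "dns://" + (nameserver == null ? "" : nameserver)
        + "/" + reverseName(ipv4);
    DirContext ctx = new InitialDirContext();
    try {
      Attributes attrs = ctx.getAttributes(url, new String[] {"PTR"});
      Attribute ptr = attrs.get("PTR");
      if (ptr == null) {
        throw new NamingException("No PTR record for " + ipv4);
      }
      return ptr.get().toString();
    } finally {
      ctx.close();
    }
  }
}
```

The returned name may carry a trailing dot, as fully qualified DNS names often do, so a caller would likely want to normalize it before reporting it as a hostname.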


Doug Cutting added a comment - 05/Sep/06 06:08 PM
Do we really think we'll want to configure these separately for the tasktracker and datanode? If so, then these properties should probably be named something like mapred.tasktracker.dns.interface, and grouped with mapred & dfs options.

Lorenzo Thione added a comment - 06/Sep/06 07:28 AM
For us there were real reasons why this was necessary. The NameNode (which cares about the host names for the DataNodes) and the JobTracker (which cares about the hostnames of the TaskTrackers) are on different networks. It seems that configuring these independently gives the right level of flexibility. Any reason why we'd want to keep them necessarily aligned? I'd be fine with changing the names of the properties.

Doug Cutting added a comment - 06/Sep/06 05:28 PM
Okay, then, yes, let's rename these to be mapred.* config parameters.

Also, lots of the lines are longer than 80 columns. Can you please re-format those?


Lorenzo Thione added a comment - 06/Sep/06 11:24 PM
Here's a new patch, with lines re-wrapped to fit 80 columns and new names for the properties:

mapred.tasktracker.dns.interface
mapred.tasktracker.dns.nameserver
dfs.datanode.dns.interface
dfs.datanode.dns.nameserver
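
To illustrate how each daemon could read its own pair of settings independently, here is a small sketch. A plain Map stands in for Hadoop's configuration object, the class DnsConfigSketch and method settingsFor are hypothetical, and the fallback value "default" is assumed to carry over from the original description rather than confirmed for the renamed properties.

```java
import java.util.Map;

// Hypothetical helper: reads the per-daemon interface/nameserver pair.
final class DnsConfigSketch {

  // Assumed fallback, taken from the original description of the patch.
  static final String DEFAULT = "default";

  // prefix is "dfs.datanode" or "mapred.tasktracker".
  // Returns {interface, nameserver}.
  static String[] settingsFor(Map<String, String> conf, String prefix) {
    return new String[] {
      conf.getOrDefault(prefix + ".dns.interface", DEFAULT),
      conf.getOrDefault(prefix + ".dns.nameserver", DEFAULT)
    };
  }
}
```

With only dfs.datanode.dns.interface set to eth1, the DataNode would resolve against eth1 while the TaskTracker keeps both defaults, which matches the use case of a NameNode and JobTracker sitting on different networks.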


Doug Cutting added a comment - 07/Sep/06 08:00 PM
I just committed this. The patch had some spurious changes to imports, and indented four spaces-per-level rather than two. I also moved some duplicated code into the DNS utility class. Thanks, Lorenzo!