Details
-
Sub-task
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
The Datanode should register multiple interfaces with the Namenode (who then forwards them to clients). We can do this by extending the DatanodeID, which currently just contains a single interface, to contain a list of interfaces. For compatibility, the DatanodeID method to get the DN address for data transfer should remain unchanged (multiple interfaces are only used where the client explicitly takes advantage of them).
By default, if the Datanode binds on all interfaces (via using the wildcard in the dfs*address configuration) all interfaces are exposed, modulo ones like the loopback that should never be exposed. Alternatively, a new configuration parameter (dfs.datanode.available.interfaces) allows the set of interfaces can be specified explicitly in case the user only wants to expose a subset. If the new default behavior is too disruptive we could default dfs.datanode.available.interfaces to be the IP of the IPC interface which is the only interface exposed today (per HADOOP-6867, only the port from dfs.datanode.address is used today).
The interfaces can be specified by name (eg "eth0"), subinterface name (eg "eth0:0"), or IP address. The IP address can be specified by range using CIDR notation so the configuration values are portable.