HADOOP-985: Namenode should identify DataNodes as ip:port instead of hostname:port

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

      Description

      Right now the NameNode keeps track of DataNodes by "hostname:port". One proposal is to keep track of datanodes by "ip:port". Various concerns have been expressed regarding hostnames and ips. Please add your experiences here so that we have a better idea of what we should fix.

      How should we calculate the datanode ip?

      1) Just like how we currently calculate the hostname, with "dfs.datanode.dns.interface" and "dfs.datanode.dns.nameserver". If the interface is specified incorrectly, the datanode could report an ip like 127.0.0.1, which might or might not be intended.

      2) Namenode can use the remote socket address when the datanode registers. Not sure how easy it is to get this address in RPC, or whether this is desirable.

      3) Namenode could just resolve the hostname when a datanode registers. It could print a warning if the resolved ip and the reported ip don't match.

      One advantage of using ips is that DFSClient does not need to resolve them when it connects to a datanode. This could save a few milliseconds for each block. Also, DFSClient should check all of its ips to see whether a given ip is local or not (a sketch of such a check follows).
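
      For illustration, a minimal sketch of such a locality check using only standard java.net calls; the class and method names are hypothetical, not the DFSClient code:

          import java.net.InetAddress;
          import java.net.NetworkInterface;
          import java.net.SocketException;

          public class LocalAddrCheck {
              /** True if 'addr' is bound to any interface on this machine. */
              static boolean isLocalAddress(InetAddress addr) throws SocketException {
                  // getByInetAddress() returns null when no local interface
                  // carries the given address.
                  return NetworkInterface.getByInetAddress(addr) != null;
              }
          }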

      As far as I can see, the namenode does not do any DNS resolution in normal operation, since it does not actively contact datanodes. In that sense I am not sure this changes Namenode performance at all.

      Thoughts?

      1. dfshealth.html
        4 kB
        Raghu Angadi
      2. HADOOP-985-1.patch
        14 kB
        Raghu Angadi
      3. HADOOP-985-2.patch
        14 kB
        Raghu Angadi
      4. HADOOP-985-3.patch
        23 kB
        Raghu Angadi
      5. HADOOP-985-4.patch
        23 kB
        Raghu Angadi
      6. HADOOP-985-5.patch
        25 kB
        Raghu Angadi
      7. HADOOP-985-6.patch
        25 kB
        Raghu Angadi

          Activity

          Marco Nicosia added a comment -

          I support option #2 (determining remote IP from the socket). From my comment on HADOOP-685:

          I know it's not trivial, but I'd prefer that the nameNode record the IP address of a connection. That way there's no DNS involved at any level in the transaction, and we know exactly which interface/IP address is being used. Additionally, there's no worrying about /etc/hosts, or dhcp, or whatnot. It works for the entire time the dataNode is up and making network connections.

          Regarding option #1: On the dataNode's side, determining which IP address to use is even harder than determining the administrative hostname, since you don't know what route packets will take to get to the nameNode (and on some OSes (Solaris), if you have interface IPs and VIPs on that interface, you can't control which IP address will be used).

          Regarding option #3: On startup, massive clusters really pound on the nameNode, delaying startup. The nameNode is already very busy. Worse, I'd hate it if the cluster had extended difficulties coming up because DNS lookups were either slow or busted entirely.

          Raghu Angadi added a comment -

          I prefer #2 as well. This could be the default behavior, and if dfs.datanode.dns.interface is specified, then we can use the ip of that specific interface (this might be required for some special cases).

          Instead of modifying RPC so that the namenode sees the remote ip for this case, the datanode can report the ip and hostname. The datanode can open a UDP socket to the namenode and check the local ip of that socket; I think it does not even need to send any packets. In either case, it does not need the namenode to be up or have to wait for a namenode response.

          Datanode can also resolve the ip to get the hostname. This won't always match 'hostname -f'. I will check how exactly we currently get the hostname.
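
          For illustration, a minimal sketch of the UDP trick described above (the namenode host and port here are hypothetical): connect() on a UDP socket sends no packets, it only asks the kernel to pick a route, after which getLocalAddress() reveals the local ip that would be used to reach the namenode.

              import java.net.DatagramSocket;
              import java.net.InetAddress;
              import java.net.InetSocketAddress;

              public class LocalIpProbe {
                  public static void main(String[] args) throws Exception {
                      DatagramSocket sock = new DatagramSocket();
                      try {
                          // No datagram is actually sent; the namenode need not be up.
                          sock.connect(new InetSocketAddress("namenode.example.com", 8020));
                          InetAddress localIp = sock.getLocalAddress();
                          System.out.println("local ip: " + localIp.getHostAddress());
                      } finally {
                          sock.close();
                      }
                  }
              }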

          Owen O'Malley added a comment -

          I think the best way to let rpc calls find the IP address of the caller would be a static method in RPC that uses a thread-local variable to return the caller's address. The RPC framework would set the variable before calling the method on the server and clear it when done. Something like:

          /**
           * Get the host ip address of the caller. Only valid on the server
           * while running the remote procedure.
           * @return the dotted ip address of the caller or null if not in an RPC call
           */
          public static String getHostAddress() { ... }
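
          For illustration, a minimal sketch of this approach (hypothetical class and method names, not the actual Hadoop RPC code): the server stashes the caller's address in a thread local around each invocation.

              public class Server {
                  private static final ThreadLocal<String> CALLER =
                      new ThreadLocal<String>();

                  /** Dotted ip of the caller, or null when not inside an RPC call. */
                  public static String getHostAddress() {
                      return CALLER.get();
                  }

                  // Invoked by the RPC dispatch loop around each method call.
                  void dispatch(java.net.Socket connection, Runnable call) {
                      CALLER.set(connection.getInetAddress().getHostAddress());
                      try {
                          call.run();        // run the remote procedure on this thread
                      } finally {
                          CALLER.remove();   // clear so idle threads report null
                      }
                  }
              }
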
          Doug Cutting added a comment -

          I think Owen's design is good: a static method that references a thread local. I'd put the static method on Server, though, not RPC, and call it getClientAddress().

          Raghu Angadi added a comment -

          I was thinking of a thread local as well, but was not sure whether that was normal practice. Will do that.

          Regarding the hostname, should we just let the Datanode behave pretty much as it does now and not bother resolving it at the Namenode?

          Raghu Angadi added a comment -

          With this fix, what we display on the dfs front page changes. The href for a datanode will now have the ip address. See the attached dfshealth.html. The following comment in dfshealth.jsp describes what we display:

          /* Say the datanode is dn1.hadoop.apache.org with ip 192.168.0.5.
             We use:
             1) d.getHostName():d.getPort() to display.
                Domain and port are stripped if they are common across the nodes,
                i.e. "dn1".
             2) d.getHostName():d.getPort() for "title",
                i.e. "dn1.hadoop.apache.org:50010".
             3) d.getHost():d.getInfoPort() for the url,
                i.e. "http://192.168.0.5:50075/...".
             Note that "d.getHost():d.getPort()" is what DFS clients use
             to interact with datanodes.
           */

          Yes, the datanode hrefs don't look good. But one advantage is that we can easily see exactly what the namenode and clients see.

          Raghu Angadi added a comment -

          Ok, I switched (2) and (3) above: "title" (hover) now shows 192.168.0.5:50010 and the href keeps the hostname.

          Raghu Angadi added a comment -

          Attached a patch for using ips in the namenode. Added an extra field hostName to DatanodeID, but it is not serialized.

          I tested with a deliberately wrong config so that each datanode gets "localhost" as its hostname. The Namenode web page lists "localhost" for all the nodes, but the cluster just works.

          Raghu Angadi added a comment -

          2.patch : minor typo fix.

          Hairong Kuang added a comment -

          The open request takes the client host name as a parameter. Upon receiving an open request, the name node searches the datanode map to find the descriptor of the data node that runs on the client machine. Now that DatanodeDescriptor contains its ip address rather than its host name, this search always returns null.

          Raghu Angadi added a comment -

          Attached 3.patch. The updated patch removes the 'clientMachine' argument from ClientProtocol's open() and create(). This argument was part of the rack-aware patch.

          Raghu Angadi added a comment -

          Thanks Hairong. Minor change in 4.patch.

          Hairong Kuang added a comment -

          The patch looks good. I have two comments:

          1. ClientProtocolVersionNumber should be bumped, since the syntax of the open & create requests has changed.
          2. DatanodeID contains the fields that need to be saved to disk. Since the new field hostName does not need to be serialized, it might be better placed in DatanodeDescriptor (see the sketch below).
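
          For illustration, a minimal sketch of this split (hypothetical field names, not the actual Hadoop classes): DatanodeID keeps only what is persisted, while the display-only hostName lives in the in-memory DatanodeDescriptor.

              class DatanodeID {
                  protected String name;       // "ip:port" -- saved to disk
                  protected String storageID;  // saved to disk

                  DatanodeID(String name, String storageID) {
                      this.name = name;
                      this.storageID = storageID;
                  }

                  /** Host part of "name": the ip after this patch. */
                  String getHost() {
                      int colon = name.indexOf(':');
                      return colon < 0 ? name : name.substring(0, colon);
                  }
              }

              class DatanodeDescriptor extends DatanodeID {
                  private String hostName;     // in-memory only, never serialized

                  DatanodeDescriptor(String name, String storageID, String hostName) {
                      super(name, storageID);
                      this.hostName = hostName;
                  }

                  String getHostName() { return hostName; }
              }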

          Raghu Angadi added a comment -

          Thanks Hairong. I will include both in a new patch.

          This changes what DFS returns for getDatanodeHints(), which is ultimately used by mapreduce. Two options for handling this:

          a) We can modify getDatanodeHints() to return what it used to return before this patch, i.e. return descriptor.getHostName() instead of descriptor.getHost(). The advantage is that no changes are necessary in mapreduce, but it does not conform to the 'ip everywhere' policy.

          b) Make the Job and Task trackers also deal in ips. I am not sure yet how intrusive this change is.

          My preference is (a); a one-line sketch follows. Comments?
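
          For illustration, option (a) reusing the hypothetical DatanodeDescriptor sketched above: hints keep returning hostnames so mapreduce task placement is untouched, while clients still get the ip from getHost() for block transfers.

              // Option (a): hints stay hostnames; option (b) would return the ip.
              static String hintFor(DatanodeDescriptor d) {
                  return d.getHostName();   // not d.getHost()
              }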

          Hairong Kuang added a comment -

          I also prefer option (a). I would open another jira issue to investigate the use of ip in mapred.

          Raghu Angadi added a comment -

          5.patch: includes the changes Hairong suggested.

          We now send hostnames for hints. Thanks Owen; verified that the job tracker correctly assigns the jobs.

          Doug Cutting added a comment -

          This patch no longer applies to the current trunk. Can you please update it? Thanks!

          Raghu Angadi added a comment -

          Fixed conflict with HADOOP-442 in ClientProtocol.java.

          Doug Cutting added a comment -

          I just committed this. Thanks, Raghu!


            People

            • Assignee: Raghu Angadi
            • Reporter: Raghu Angadi
            • Votes: 0
            • Watchers: 3
