Hadoop HDFS / HDFS-4269

DatanodeManager#registerDatanode rejects all datanode registrations from localhost in single-node developer setup

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, trunk-win, 2.1.0-beta
    • Fix Version/s: 2.1.0-beta
    • Component/s: namenode
    • Labels: None

      Description

      HDFS-3990 is a change that optimized some redundant DNS lookups. As part of that change, DatanodeManager#registerDatanode now rejects attempts to register a datanode for which the name has not been resolved. Unfortunately, this broke single-node developer setups on Windows, because Windows does not resolve 127.0.0.1 to "localhost".

      Attachments

      1. HDFS-4269.1.patch (1 kB) - Chris Nauroth
      2. HDFS-4269.2.patch (1 kB) - Chris Nauroth
      3. HDFS-4269.3.patch (2 kB) - Chris Nauroth

        Issue Links

          • is broken by HDFS-3990
          • relates to HDFS-4702
          • is related to HDFS-5338

          Activity

           Transition                      Time In Source Status   Execution Times   Last Executer        Last Execution Date
           Patch Available -> Open         1d 2h 2m                2                 Chris Nauroth        11/Dec/12 20:20
           Open -> Patch Available         5d 14h 2m               3                 Chris Nauroth        11/Dec/12 20:45
           Patch Available -> Resolved     3h 28m                  1                 Suresh Srinivas      12/Dec/12 00:14
           Resolved -> Reopened            72d 54m                 1                 Konstantin Boudnik   22/Feb/13 01:09
           Reopened -> Resolved            22h 45m                 1                 Konstantin Boudnik   22/Feb/13 23:54
          Allen Wittenauer made changes -
          Fix Version/s 2.1.0-beta [ 12324031 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Chris Nauroth made changes -
          Link This issue is related to HDFS-5338 [ HDFS-5338 ]
          Daryn Sharp added a comment -

           Jagane Sundar, I understand your frustration, but regarding the presumed "refuctoring": as far as I can tell, it was a refactoring. I reviewed my change in HDFS-3990. I didn't alter the handling of the allow/deny lists.

           The original implementation always did a reverse lookup in DatanodeManager.checkInList – this caused massive DNS lookups for the web pages and new node registrations. checkInList, when called via inHostsList, returned false if the host was unresolvable. Within DatanodeManager#registerDatanode, the check !inHostsList(nodeReg) would then throw a DisallowedDatanodeException.

           All I did was hoist the reverse lookup from checkInList up into registerDatanode, and store the resolved hostname for later use by checkInList to prevent unnecessary reverse lookups. The localhost bug was introduced because I accommodated tests faking the hostname: I compared the IP and hostname instead of checking InetSocketAddress#isUnresolved. On non-Windows systems, 127.0.0.1 resolves to localhost; Windows wrongly claims 127.0.0.1 resolves to 127.0.0.1, hence it appeared unresolved.
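
           To make that concrete, here is a minimal sketch of the hoisted lookup (the setter names are illustrative, not the exact HDFS-3990 diff):

             InetAddress dnAddress = Server.getRemoteIp();
             String ip = dnAddress.getHostAddress();
             String hostname = dnAddress.getHostName();   // the single, hoisted reverse lookup

             // Cache the resolved name on the registration so that the later allow/deny
             // checks (checkInList) reuse it instead of issuing their own DNS lookups.
             nodeReg.setIpAddr(ip);
             nodeReg.setPeerHostName(hostname);

             // The bug: "did the lookup fail?" was approximated by comparing ip and hostname.
             // On Windows the loopback lookup just echoes the IP, so this fires spuriously.
             if (hostname.equals(ip)) {
               throw new DisallowedDatanodeException(nodeReg);
             }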

          Regarding your points:

          1. Not requiring hosts to resolve if there are only IPs in both lists may be feasible.
          2. If you have multiple A records for your datanodes that confuse your browser, I think the DFSClient will be equally confused.
          3. Long ago, the NN operated only on IPs. It was changed to support grids using public IPs, and a private intra-grid network with different IPs, where the public and private networks resolved a hostname to the correct IP.
           4. The 'does reverse lookup succeed' check cannot be only for exclusion. Decommissioning is signified by the host being in both lists (see the sketch below).
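
           To make point 4 concrete, here is a rough sketch of the list semantics being described; inHostsList and inExcludedHostsList are the existing checks mentioned above, while startDecommission is just an illustrative name, not a claim about the actual DatanodeManager code:

             if (!inHostsList(nodeReg)) {
               // Not in the include list: reject the registration outright.
               throw new DisallowedDatanodeException(nodeReg);
             } else if (inExcludedHostsList(nodeReg)) {
               // In the include list AND the exclude list: keep the node registered,
               // but begin decommissioning it.
               startDecommission(node);
             }
             // In the include list only: a normal, active registration.
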
          Jagane Sundar added a comment -

          OK. The TCP endpoint gives you an IP address, not a hostname. It seems like there are two places in the NameNode where you might want to convert these IP addresses to hostnames:

          1. If the hosts allow and deny files have hostnames, then the NameNode must do a reverse DNS lookup of the IP address and ensure that the hostname that the reverse DNS lookup returns is not in the deny file.

          2. In the web UI, the NameNode may want to display a human friendly name, instead of an IP address. (Even this could be an error, because the browser could resolve the hostname to an IP address that the DataNode is not listening on)

          My belief is that the NameNode should operate exclusively using IP addresses. Only in the above two places should there be any DNS reverse lookups.

          Would it not be better to move 'does reverse lookup succeed' check to the inExcludedHostsList method in DatanodeManager.java? This way we will be saying:
          'We need to exclude host X, and we could not reverse lookup your IP, so we do not know whether you are in the excluded list or not. Sorry. We cannot allow a connection from you.'

          The check is now in registerDatanode. What we are saying is:
          'We are unable to do a reverse lookup on your IP address. So we cannot allow any connections from you'

          The problem is over by the east, and we blew out a big giant hole of functionality in the west. And nobody will understand why we have this mysterious hole in the west....

          Is this an example of code refuctoring?
          http://www.waterfall2006.com/Refuctoring.pdf

          Chris Nauroth added a comment -

          No worries, Daryn. Thanks for clarification of the impact regarding the allow/deny lists.

          Daryn Sharp added a comment -

           Sorry, was OOO. Performing the lookups outside of the namespace lock (a very good change in and of itself!) does not alleviate the performance penalty associated with, say, lookups of 4K nodes.

           > Requiring a valid reverse DNS lookup for all datanode IP addresses seems like a drastic solution to a performance problem in an out-of-band management service.

           It's bigger than that. The allow and deny lists allow either hostnames or IP addresses to be specified. Let's say you chose to reject or decommission nodes by hostname. A DNS hiccup, a DNS outage, or packet loss that drops the UDP requests can cause a reverse lookup to erroneously fail - now that node slips back in and becomes active again!

          If you want to allow unresolvable IPs, then you essentially have to ban hostnames in the allow and deny host lists.

          Editing /etc/hosts on the NN should be sufficient. I don't think you'll encounter client problems.

          Chris Nauroth added a comment -

          I filed HDFS-4702 to investigate removing the namesystem lock from this code path.

          Chris Nauroth made changes -
          Link This issue relates to HDFS-4702 [ HDFS-4702 ]
          Chris Nauroth added a comment -

           > It looks like the performance is indeed in the metadata path, since a read lock is held on the Namespace while the reverse lookup is performed.

          There was supposed to be a follow-up jira to HDFS-3990 for investigating if this lock can be removed. Daryn Sharp, do you know if we have that jira? I can't find one, so I'll plan on creating a new one later today unless I hear that it already exists somewhere.

          The lock happens in DatanodeManager#fetchDatanodes, but it's not immediately obvious to me why we need the namesystem lock here. Perhaps it's related to the edit log ops OP_DATANODE_ADD and OP_DATANODE_REMOVE, which are deprecated and now appear to be unused? Maybe it's safe to remove the lock now, but this will need more investigation.

           > I think a config param that turns off this check still makes sense.

          There is existing precedent for undocumented testing configuration flags in the codebase. Examples: dfs.namenode.delegation.token.always-use and setting dfs.datanode.scan.period.hours to a negative number to disable the block scanner thread. Jagane Sundar, I recommend that you file a separate jira for this as a new feature, and the whole community can weigh in. Hopefully, we can get this to a state where it's more convenient for you.
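
           For illustration only, such an opt-out might be read along these lines; the key name below is hypothetical and not something this patch defines, so treat it purely as a sketch of the escape hatch being discussed:

             Configuration conf = new HdfsConfiguration();
             // Hypothetical key, used here only to illustrate an opt-out flag.
             boolean requireResolvedHostname = conf.getBoolean(
                 "dfs.namenode.datanode.registration.ip-hostname-check", true);
             if (requireResolvedHostname && hostname.equals(ip)) {
               throw new DisallowedDatanodeException(nodeReg);
             }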

          Jagane Sundar added a comment -

          I apologize - I should have read the bug completely. It looks like the performance is indeed in the metadata path, since a read lock is held on the Namespace while the reverse lookup is performed.

          I think a config param that turns off this check still makes sense.

          Jagane Sundar added a comment -

          Thanks for your explanation, Chris.

          The performance problem you mention is not in the data path or the metadata path of HDFS, right?
          Requiring a valid reverse DNS lookup for all datanode IP addresses seems like a drastic solution to a performance problem in an out-of-band management service.

           I am going to sound like the crusty curmudgeon that I am when I say this: since time immemorial, TCP services have worked with raw IP addresses that do not have valid reverse lookups. I think that HDFS should work with raw IP addresses.

          Editing /etc/hosts or creating a new DNS server are not solutions. The client machines now need to know of this new DNS mapping. Forcing users to add the new DNS server to their TCP stack's configuration, or asking them to edit their /etc/hosts or their C:\windows\system32\drivers\etc\hosts (editable only by programs running as administrator) is not practical.

          I see one possible solution to this: Enhance the web UI to use some AJAX magic and display IP addresses first, then switch to Hostnames as the reverse lookups in a server thread succeed.

          In lieu of this, can we at least add a hdfs-site.xml config option that turns off this check?

          Chris Nauroth added a comment -

          Hello, Jagane Sundar. HDFS-3990 made a code change to reject registration of datanodes for which the hostname corresponding to the IP address could not be resolved. The reason for this change is that allowing registration of a datanode with an unresolved hostname caused performance problems later. The namenode web UI's list of live datanodes would attempt a reverse DNS lookup for every unresolved data node. For a cluster of thousands of nodes, the massive number of reverse DNS lookups would cause a performance problem.

          This issue, HDFS-4269, specifically covers some fallout of the HDFS-3990 patch that we observed on Windows. For Windows, a reverse DNS lookup of 127.0.0.1 does not resolve to localhost. AFAIK, Windows is the only OS that has this behavior. Since usage of localhost is common in the test suites, we added an escape hatch for 127.0.0.1 to keep the test suites passing on Windows.

          You mentioned the specific case of running a VM with a DHCP-assigned address and bridged networking. You're right that you need to arrange for the reverse DNS lookup to work. When I've run this kind of setup, I've done it in one of two ways:

           1. Edit /etc/hosts to map 192.168.1.94 (or any other DHCP-assigned address) to a specific hostname (an example entry is shown after this list).
          2. Run my own local name server to do the same.
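
           For example, the /etc/hosts entry could be a single line along these lines (the hostname is whatever you choose to call the VM):

             192.168.1.94    hadoopdev-vm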

          In both of these cases, a VM reboot may invalidate your configuration (either the /etc/hosts entry or the A record in your local name server) by getting a different DHCP-assigned address. In theory, you could write some automation to update /etc/hosts or the local name server with your current DHCP-assigned address on boot. In practice, I've personally never bothered to take it this far, because my VMs change their IP addresses relatively infrequently, and a manual update isn't too cumbersome.

          Unfortunately, I don't think there is anything we can do in the Hadoop code itself to simplify this. I can't think of a reliable way to detect the difference between an unresolvable IP address and a "testing" IP address, for which name resolution isn't important.

          Jagane Sundar added a comment -

          It would appear that there is more to this problem.

          Here is what I am trying to do:
           Create a single-VM developer environment that runs all daemons in a VM. The VM gets a DHCP IP address, but there is no hostname associated with the IP address. I configure Hadoop using the DHCP IP address (e.g. 192.168.1.94) instead of the hostname, 'localhost', or '127.0.0.1'. Datanode registration fails because of this check.

          Creating an escape hatch just for 127.0.0.1 does not solve my problem because I want to use the DHCP address 192.168.1.94. I want to use 192.168.1.94 because I want to be able to access this VM from my host OS, or from other machines in the network (if I use bridged networking in the virtual NIC configuration).

          I don't quite follow the original reasoning behind this check. Can somebody clarify that?

          Konstantin Boudnik made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Konstantin Boudnik added a comment -

          Committed to branch-2. Thanks Chris.

          Konstantin Boudnik added a comment -

          This is great Chris! I will commit it in the morning then!

          Chris Nauroth made changes -
          Affects Version/s 2.0.4-beta [ 12324031 ]
          Target Version/s 3.0.0, trunk-win [ 12320356, 12323305 ] 3.0.0, trunk-win, 2.0.4-beta [ 12320356, 12323305, 12324031 ]
          Chris Nauroth added a comment -

          Thanks, Konstantin. I've confirmed that the same patch can apply cleanly to branch-2. I applied it and ran TestHDFSFileContextMainOperations for a quick test.

          Adding 2.0.4-beta to Affects Versions and Target Versions.

          Konstantin Boudnik added a comment -

          This original "feature" also breaks Linux especially CentOS with its silly way of configuring /etc/hosts

          Konstantin Boudnik made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Konstantin Boudnik added a comment -

          Reopening to merge into branch-2

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1283 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1283/)
          HDFS-4269. Datanode rejects all datanode registrations from localhost in single-node developer setup on Windows. Contributed by Chris Nauroth. (Revision 1420492)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1420492
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1252 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1252/)
          HDFS-4269. Datanode rejects all datanode registrations from localhost in single-node developer setup on Windows. Contributed by Chris Nauroth. (Revision 1420492)

          Result = FAILURE
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1420492
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #63 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/63/)
          HDFS-4269. Datanode rejects all datanode registrations from localhost in single-node developer setup on Windows. Contributed by Chris Nauroth. (Revision 1420492)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1420492
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3114 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3114/)
          HDFS-4269. Datanode rejects all datanode registrations from localhost in single-node developer setup on Windows. Contributed by Chris Nauroth. (Revision 1420492)

          Result = SUCCESS
          suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1420492
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          Suresh Srinivas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Fix Version/s 3.0.0 [ 12320356 ]
          Resolution Fixed [ 1 ]
          Suresh Srinivas added a comment -

          I committed the patch. Thank you Chris.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12560450/HDFS-4269.3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3640//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3640//console

          This message is automatically generated.

          Suresh Srinivas added a comment -

          I will commit the patch after +1 from Jenkins.

          Suresh Srinivas added a comment -

          Chris, thanks for making the change. The code looks much better. +1.

          Chris Nauroth made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Chris Nauroth made changes -
          Attachment HDFS-4269.3.patch [ 12560450 ]
          Chris Nauroth added a comment -

          Thanks, Suresh. Here is an updated patch that incorporates your feedback.

          Chris Nauroth made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Suresh Srinivas added a comment -

           Patch looks good. Minor comments: could you move the host-to-IP check into a separate method, DatanodeManager#isNameResolved()? Add a brief comment on why this is being done where it is called. The comment related to the Windows-specific logic could be moved into the method implementation.
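
           For concreteness, the extraction being requested might look roughly like this; it is only a sketch, and the committed patch may differ in details:

             /**
              * Whether a reverse lookup produced a real host name for this address.
              * Windows reports the loopback address's host name as "127.0.0.1" itself,
              * so loopback is accepted explicitly to keep single-node setups working.
              */
             private static boolean isNameResolved(InetAddress address) {
               String hostname = address.getHostName();
               String ip = address.getHostAddress();
               return !hostname.equals(ip) || address.isLoopbackAddress();
             }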

          Chris Nauroth added a comment -

          I believe the test failure is unrelated to this patch, and instead is related to a recent patch for HDFS-4261. I just saw Nicholas comment on HDFS-4261 that he was going to revert that change.

          Suresh Srinivas added a comment -

          Chris, can you look at the test failure flagged by Jenkins?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12560287/HDFS-4269.2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3631//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3631//console

          This message is automatically generated.

          Arpit Agarwal added a comment -

          +1

          Looks good with the change. Thanks for fixing it.

          Chris Nauroth made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Chris Nauroth made changes -
          Attachment HDFS-4269.2.patch [ 12560287 ]
          Chris Nauroth added a comment -

          Thanks, Arpit. I have uploaded a new patch that switches to using InetAddress#isLoopbackAddress. I did not use InetAddress#isAnyLocalAddress, because this code is checking the source of a client connection, so I wouldn't expect to see a wildcard address here.

          Chris Nauroth made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Arpit Agarwal added a comment -

          That should be dnAddress.isLoopbackAddress().

          Arpit Agarwal added a comment -

          Hi Chris,

           Checking for ip.isLoopbackAddress() would be a more portable option if it works.

          Also I wonder if you need to check for and allow ip.isAnyLocalAddress()?

          Chris Nauroth added a comment -

          Jenkins gave -1 for no new tests, but this change is required to fix all MiniDFSCluster-based tests on Windows.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12560225/HDFS-4269.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3626//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3626//console

          This message is automatically generated.

          Chris Nauroth made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Chris Nauroth made changes -
          Affects Version/s 3.0.0 [ 12320356 ]
          Target Version/s trunk-win [ 12323305 ] 3.0.0, trunk-win [ 12320356, 12323305 ]
          Chris Nauroth added a comment -

          Adding 3.0.0 to Affects Version/s and Target Version/s. This patch can commit to trunk and then merge to branch-trunk-win.

          Chris Nauroth made changes -
          Attachment HDFS-4269.1.patch [ 12560225 ]
          Chris Nauroth added a comment -

          This patch allows "127.0.0.1" to register as a special case. The other idea mentioned was NetUtils#getStaticResolution, but this really seems like a lone special case.

          Chris Nauroth made changes -
          Field Original Value New Value
          Link This issue is broken by HDFS-3990 [ HDFS-3990 ]
          Chris Nauroth added a comment -

          We merged this change to branch-trunk-win on Friday, 11/30. Unfortunately, this had an unintended side effect of breaking on Windows, at least for single-node developer setups, because of the code change to reject registration of an unresolved data node:

          public void registerDatanode(DatanodeRegistration nodeReg)
                throws DisallowedDatanodeException {
              InetAddress dnAddress = Server.getRemoteIp();
              if (dnAddress != null) {
                // Mostly called inside an RPC, update ip and peer hostname
                String hostname = dnAddress.getHostName();
                String ip = dnAddress.getHostAddress();
                if (hostname.equals(ip)) {
                  LOG.warn("Unresolved datanode registration from " + ip);
                  throw new DisallowedDatanodeException(nodeReg);
                }
          

          On Windows, 127.0.0.1 does not resolve to localhost. It reports host name as "127.0.0.1". Therefore, on Windows, running pseudo-distributed mode or MiniDFSCluster-based tests always rejects the datanode registrations. (See HADOOP-8414 for more discussion of the particulars of resolving 127.0.0.1 on Windows.)

          Potential fixes I can think of:

          1. Add special case logic to allow registration if ip.equals("127.0.0.1"). This is the quick fix I applied to my dev environment to unblock myself last Friday.
          2. Add a check against NetUtils.getStaticResolution and register it with NetUtils.addStaticResolution("127.0.0.1", "localhost") somewhere at initialization time.

          Below is a short code sample that demonstrates the problem. This is a very rough approximation of the IPC Server/Connection and DatanodeManager logic. When I run this server on Mac, it prints "connection from hostName = localhost, hostAddress = 127.0.0.1, canonicalHostName = localhost" for any client connection. On Windows, it prints "connection from hostName = 127.0.0.1, hostAddress = 127.0.0.1, canonicalHostName = 127.0.0.1".

          package cnauroth;
          
          import java.io.PrintWriter;
          import java.net.InetAddress;
          import java.net.InetSocketAddress;
          import java.net.ServerSocket;
          import java.net.Socket;
          import java.nio.channels.ServerSocketChannel;
          
          class Main {
            public static void main(String[] args) throws Exception {
              ServerSocket ss = ServerSocketChannel.open().socket();
              ss.bind(new InetSocketAddress("localhost", 1234), 0);
              System.out.println("ss = " + ss);
              for (;;) {
                Socket s = ss.accept();
                InetAddress addr = s.getInetAddress();
                System.out.println("connection from hostName = " + addr.getHostName() + ", hostAddress = " + addr.getHostAddress() + ", canonicalHostName = " + addr.getCanonicalHostName());
                PrintWriter pw = new PrintWriter(s.getOutputStream());
                pw.println("hello");
                pw.close();
                s.close();
              }
            }
          }
          
          Chris Nauroth created issue -

             People

             • Assignee: Chris Nauroth
             • Reporter: Chris Nauroth
             • Votes: 0
             • Watchers: 10
