Hadoop Common
  1. Hadoop Common
  2. HADOOP-8414

Address problems related to localhost resolving to 127.0.0.1 on Windows

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1-win
    • Component/s: fs, test
    • Labels:
      None
    • Target Version/s:

      Description

      Localhost resolves to 127.0.0.1 on Windows and that causes the following tests to fail:

      • TestHarFileSystem
      • TestCLI
      • TestSaslRPC

      This Jira tracks fixing these tests and other possible places that have similar issue.

      1. HADOOP-8414-branch-1-win.patch
        33 kB
        Ivan Mitic
      2. HADOOP-8414-branch-1-win.patch
        34 kB
        Ivan Mitic
      3. HADOOP-8414-branch-1-win(2).patch
        32 kB
        Ivan Mitic
      4. HADOOP-8414-branch-1-win(3).patch
        31 kB
        Ivan Mitic

        Issue Links

          Activity

          Hide
          Sanjay Radia added a comment -

          Thanks Ivan, committed to branch-1 windows.

          Show
          Sanjay Radia added a comment - Thanks Ivan, committed to branch-1 windows.
          Hide
          Daryn Sharp added a comment -

          +1

          Show
          Daryn Sharp added a comment - +1
          Hide
          Ivan Mitic added a comment -

          Great to hear that the latest patch looks good.

          Are you sure there isn't an existing expansion of NAMENODE in the output? I thought I added that functionality last year...?

          I could not find this in branch-1. Just to make sure, I commented out the expansion I added and the test fails. Let me know if you have any additional comments.

          Thanks!

          Show
          Ivan Mitic added a comment - Great to hear that the latest patch looks good. Are you sure there isn't an existing expansion of NAMENODE in the output? I thought I added that functionality last year...? I could not find this in branch-1. Just to make sure, I commented out the expansion I added and the test fails. Let me know if you have any additional comments. Thanks!
          Hide
          Daryn Sharp added a comment -

          Ah, yes, there was a disconnect. I think the latest patch looks good. Are you sure there isn't an existing expansion of NAMENODE in the output? I thought I added that functionality last year...?

          Show
          Daryn Sharp added a comment - Ah, yes, there was a disconnect. I think the latest patch looks good. Are you sure there isn't an existing expansion of NAMENODE in the output? I thought I added that functionality last year...?
          Hide
          Ivan Mitic added a comment -

          Thanks for commenting Daryn, it seems that there was a disconnect then

          On Windows, TestCLI#namenode string resolves to hdfs://127.0.0.1:XXXX, and from this point on, all inputs to FsShell that have NAMENODE in it, will be expanded with hdfs://127.0.0.1:XXXX, causing the comparison to fail because the expected output is something with hdfs://localhost:XXXX. The output is actually equivalent to input on Windows.

          Attaching the new patch . Now the expected output will always be in line with what is being sent as an input. Hope it looks better now!

          Show
          Ivan Mitic added a comment - Thanks for commenting Daryn, it seems that there was a disconnect then On Windows, TestCLI#namenode string resolves to hdfs://127.0.0.1:XXXX , and from this point on, all inputs to FsShell that have NAMENODE in it, will be expanded with hdfs://127.0.0.1:XXXX , causing the comparison to fail because the expected output is something with hdfs://localhost:XXXX . The output is actually equivalent to input on Windows. Attaching the new patch . Now the expected output will always be in line with what is being sent as an input. Hope it looks better now!
          Hide
          Daryn Sharp added a comment -

          Maybe the details of this jira got paged out to faulty media... My understanding is the test is passing in "localhost" uris, but windows mangles them to contain the IP address which causes output != input? Reverse dns isn't supposed to be occurring at all, so windows appears to not "remember" the given hostname used to instantiate the InetAddress/InetSocketAddress. If true, I don't see how using the NAMENODE var would help?

          Show
          Daryn Sharp added a comment - Maybe the details of this jira got paged out to faulty media... My understanding is the test is passing in "localhost" uris, but windows mangles them to contain the IP address which causes output != input? Reverse dns isn't supposed to be occurring at all, so windows appears to not "remember" the given hostname used to instantiate the InetAddress / InetSocketAddress . If true, I don't see how using the NAMENODE var would help?
          Hide
          Ivan Mitic added a comment -

          Thanks Daryn, looking forward to your response!

          In the meantime, I was thinking about one possible way to address this, and you let me know if it makes sense. Currently, the test assumes that rDNS will resolve 127.0.0.1 to localhost, what turned out not to be always true. We can change the test result to have a NAMENODE embedded in it, and then, in test code, replace NAMENODE with a constant value that was used as input. Now, the input being localhost or 127.0.0.1, we are verifying that the output is equivalent to input and only input.

          Show
          Ivan Mitic added a comment - Thanks Daryn, looking forward to your response! In the meantime, I was thinking about one possible way to address this, and you let me know if it makes sense. Currently, the test assumes that rDNS will resolve 127.0.0.1 to localhost , what turned out not to be always true. We can change the test result to have a NAMENODE embedded in it, and then, in test code, replace NAMENODE with a constant value that was used as input. Now, the input being localhost or 127.0.0.1 , we are verifying that the output is equivalent to input and only input.
          Hide
          Daryn Sharp added a comment -

          I'm very sorry, I planned to work on this last week but a oozie/proxy-token/hftp blocker came up... I'll get to it by EOW.

          After all the time I spent getting the shell path display correct, I'm reluctant to introduce inconsistent behavior on windows whose test changes may allow a breakage in non-windows to go undetected.

          Show
          Daryn Sharp added a comment - I'm very sorry, I planned to work on this last week but a oozie/proxy-token/hftp blocker came up... I'll get to it by EOW. After all the time I spent getting the shell path display correct, I'm reluctant to introduce inconsistent behavior on windows whose test changes may allow a breakage in non-windows to go undetected.
          Hide
          Ivan Mitic added a comment -

          The patch still allows a different authority to be returned than the one actually provided. If implemented correctly, the test pattern matches won't require any changes. I'll try to take a swag at a small patch that will make unix/windows behavior equivalent.

          Daryn, did you maybe get a chance to look into this? Would it be fine to proceed with this patch as is, and file a separate Jira to track the investigation on possible improvements? Thanks!

          Show
          Ivan Mitic added a comment - The patch still allows a different authority to be returned than the one actually provided. If implemented correctly, the test pattern matches won't require any changes. I'll try to take a swag at a small patch that will make unix/windows behavior equivalent. Daryn, did you maybe get a chance to look into this? Would it be fine to proceed with this patch as is, and file a separate Jira to track the investigation on possible improvements? Thanks!
          Hide
          Daryn Sharp added a comment -

          The patch still allows a different authority to be returned than the one actually provided. If implemented correctly, the test pattern matches won't require any changes. I'll try to take a swag at a small patch that will make unix/windows behavior equivalent.

          Show
          Daryn Sharp added a comment - The patch still allows a different authority to be returned than the one actually provided. If implemented correctly, the test pattern matches won't require any changes. I'll try to take a swag at a small patch that will make unix/windows behavior equivalent.
          Hide
          Ivan Mitic added a comment -

          The FsShell or FileSystem shouldn't be using the canonical host name or even mangling the URI at all unless a change recently slipped past me...

          @Daryn: Thanks for clarifying your concerns. Commands passed in by the test have the following form:

          -fs NAMENODE -du /data15bytes

          where the NAMENODE string gets replaced with the fs.default.name. Per my previous comment, namenode default FS URI resolves to hdfs://127.0.0.1:port on Windows, so from the FsShell perspective, output is equivalent to input.

          Show
          Ivan Mitic added a comment - The FsShell or FileSystem shouldn't be using the canonical host name or even mangling the URI at all unless a change recently slipped past me... @Daryn: Thanks for clarifying your concerns. Commands passed in by the test have the following form: -fs NAMENODE -du /data15bytes where the NAMENODE string gets replaced with the fs.default.name . Per my previous comment, namenode default FS URI resolves to hdfs://127.0.0.1:port on Windows, so from the FsShell perspective, output is equivalent to input.
          Hide
          Ivan Mitic added a comment -

          Attaching updated patch.

          Show
          Ivan Mitic added a comment - Attaching updated patch.
          Hide
          Ivan Mitic added a comment -

          Thanks, I'll take a look at this.

          Btw, I separated the Har file system change into a separate patch HADOOP-8440 in case you want to take a look.

          Show
          Ivan Mitic added a comment - Thanks, I'll take a look at this. Btw, I separated the Har file system change into a separate patch HADOOP-8440 in case you want to take a look.
          Hide
          Daryn Sharp added a comment -

          The best I can think of is to modify NetUtils to do a addStaticResolution with the result of getByName(null).

          Show
          Daryn Sharp added a comment - The best I can think of is to modify NetUtils to do a addStaticResolution with the result of getByName(null) .
          Hide
          Ivan Mitic added a comment -

          Thanks for the pointers Daryn! After looking more into this, I came to the following:

          InetAddress.getByName("127.0.0.1").getHostName() == localhost (on Linux)
          InetAddress.getByName("127.0.0.1").getHostName() == 127.0.0.1 (on Windows)

          Going further, namenode's default filesystem URI is created using the above host name, and this URI is later used for DFS path resolution.

          From looking at the documentation for InetAddress.getHostName(), it is supposed to perform a reverse DNS lookup of the address. For some reason, this does not work well for 127.0.0.1 in Java under Windows. I tried a few other IP addresses, and getByName(IP).getHostName() worked fine.

          Do you maybe have other suggestions/alternatives we can use to address the problem?

          Show
          Ivan Mitic added a comment - Thanks for the pointers Daryn! After looking more into this, I came to the following: InetAddress.getByName("127.0.0.1").getHostName() == localhost (on Linux) InetAddress.getByName("127.0.0.1").getHostName() == 127.0.0.1 (on Windows) Going further, namenode's default filesystem URI is created using the above host name, and this URI is later used for DFS path resolution. From looking at the documentation for InetAddress.getHostName(), it is supposed to perform a reverse DNS lookup of the address. For some reason, this does not work well for 127.0.0.1 in Java under Windows. I tried a few other IP addresses, and getByName(IP).getHostName() worked fine. Do you maybe have other suggestions/alternatives we can use to address the problem?
          Hide
          Daryn Sharp added a comment -

          Hmm, that's odd that getCanonicalHostName is returning an IP. Ignoring that, the other cases all look perfect so there's a grue lurking somewhere. The FsShell or FileSystem shouldn't be using the canonical host name or even mangling the URI at all unless a change recently slipped past me...

          Show
          Daryn Sharp added a comment - Hmm, that's odd that getCanonicalHostName is returning an IP. Ignoring that, the other cases all look perfect so there's a grue lurking somewhere. The FsShell or FileSystem shouldn't be using the canonical host name or even mangling the URI at all unless a change recently slipped past me...
          Hide
          Ivan Mitic added a comment -

          If however #2 is also returning localhost as expected, then it means there's probably something wrong in FsShell or DistributedFileSystem's handling of the URI.

          Oh, sorry for not being clear, below is the output:

          InetAddress.getByName(null).getHostName() == localhost
          InetAddress.getByName(null).getCanonicalHostName() == 127.0.0.1
          InetAddress.getByName("localhost").getHostName() == localhost
          InetAddress.getByName("localhost").getCanonicalHostName() == 127.0.0.1
          new InetSocketAddress("localhost", 8000).getHostName() == localhost

          PS. thanks for the tip.

          Show
          Ivan Mitic added a comment - If however #2 is also returning localhost as expected, then it means there's probably something wrong in FsShell or DistributedFileSystem's handling of the URI. Oh, sorry for not being clear, below is the output: InetAddress.getByName(null).getHostName() == localhost InetAddress.getByName(null).getCanonicalHostName() == 127.0.0.1 InetAddress.getByName("localhost").getHostName() == localhost InetAddress.getByName("localhost").getCanonicalHostName() == 127.0.0.1 new InetSocketAddress("localhost", 8000).getHostName() == localhost PS. thanks for the tip.
          Hide
          Daryn Sharp added a comment -

          Could you please do the same testing with InetAddress.getByName("localhost") to ensure that something else isn't botching the hostname to 127.0.0.1?

          @Daryn: Same result as with null.

          If I understand correctly:

          1. InetAddress.getByName("localhost").getHostName() returns localhost
          2. new InetSocketAddress("localhost", port).getHostName() returns 127.0.0.1??

          If however #2 is also returning localhost as expected, then it means there's probably something wrong in FsShell or DistributedFileSystem's handling of the URI.

          As for host-based tokens, TestSecurityUtil should catch problems, but there's a jira where I have to bring over a slew of more tests from 205 that would be more definitive.

          PS. You can use "bq." or "{quote}" on prior statements. It's a bit easier to read than inline responses. If you click the , there's a lot of simple and useful formatting available.

          Show
          Daryn Sharp added a comment - Could you please do the same testing with InetAddress.getByName("localhost") to ensure that something else isn't botching the hostname to 127.0.0.1? @Daryn: Same result as with null. If I understand correctly: InetAddress.getByName("localhost").getHostName() returns localhost new InetSocketAddress("localhost", port).getHostName() returns 127.0.0.1 ?? If however #2 is also returning localhost as expected, then it means there's probably something wrong in FsShell or DistributedFileSystem 's handling of the URI. As for host-based tokens, TestSecurityUtil should catch problems, but there's a jira where I have to bring over a slew of more tests from 205 that would be more definitive. PS. You can use "bq." or "{quote}" on prior statements. It's a bit easier to read than inline responses. If you click the , there's a lot of simple and useful formatting available.
          Hide
          Ivan Mitic added a comment -

          Thanks Daryn, some answers/comments inline.

          Could you please do the same testing with InetAddress.getByName("localhost") to ensure that something else isn't botching the hostname to 127.0.0.1?
          @Daryn: Same result as with null.

          The HarFileSystem change is probably worthy of its own jira now since it's generally useful and should be able to stand alone w/o the other changes in this jira
          @Daryn: Sure.

          I forgot to mention that if the discrepancy with window's resolution of localhost is likely to break host-based tokens (hadoop.security.token.service.use_ip=false). We'll need to be sure to test host-based tokens if the discrepancy is real and can't be fixed.
          @Daryn: I see, thanks for bringing this up. Do you maybe know of a test that can catch this?

          Show
          Ivan Mitic added a comment - Thanks Daryn, some answers/comments inline. Could you please do the same testing with InetAddress.getByName("localhost") to ensure that something else isn't botching the hostname to 127.0.0.1? @Daryn: Same result as with null. The HarFileSystem change is probably worthy of its own jira now since it's generally useful and should be able to stand alone w/o the other changes in this jira @Daryn: Sure. I forgot to mention that if the discrepancy with window's resolution of localhost is likely to break host-based tokens (hadoop.security.token.service.use_ip=false). We'll need to be sure to test host-based tokens if the discrepancy is real and can't be fixed. @Daryn: I see, thanks for bringing this up. Do you maybe know of a test that can catch this?
          Hide
          Daryn Sharp added a comment -

          The HarFileSystem change is probably worthy of its own jira now since it's generally useful and should be able to stand alone w/o the other changes in this jira. This jira would then just depend on the new one since it indirectly addresses issues for this jira.

          I forgot to mention that if the discrepancy with window's resolution of localhost is likely to break host-based tokens (hadoop.security.token.service.use_ip=false). We'll need to be sure to test host-based tokens if the discrepancy is real and can't be fixed.

          Show
          Daryn Sharp added a comment - The HarFileSystem change is probably worthy of its own jira now since it's generally useful and should be able to stand alone w/o the other changes in this jira. This jira would then just depend on the new one since it indirectly addresses issues for this jira. I forgot to mention that if the discrepancy with window's resolution of localhost is likely to break host-based tokens ( hadoop.security.token.service.use_ip=false ). We'll need to be sure to test host-based tokens if the discrepancy is real and can't be fixed.
          Hide
          Daryn Sharp added a comment -

          For InetAddress.getByName(), the two actually return different values, "localhost" and "127.0.0.1", respectively.

          Good, that is exactly what I would expect! However this makes it even more odd that new InetSocketAddress("localhost", port).getHostName() on windows will return 127.0.0.1 instead of localhost... The InetSocketAddress ctor is supposed to call InetAddress.getByName(host) and then InetSocketAddress#getHostName() delegates to InetAddress#getHostName(). I would expect InetAddress.getByName(hostname) to behave similarly for localhost and null as the given hostname.

          Could you please do the same testing with InetAddress.getByName("localhost") to ensure that something else isn't botching the hostname to 127.0.0.1?

          Show
          Daryn Sharp added a comment - For InetAddress.getByName(), the two actually return different values, "localhost" and "127.0.0.1", respectively. Good, that is exactly what I would expect! However this makes it even more odd that new InetSocketAddress("localhost", port).getHostName() on windows will return 127.0.0.1 instead of localhost ... The InetSocketAddress ctor is supposed to call InetAddress.getByName(host) and then InetSocketAddress#getHostName() delegates to InetAddress#getHostName() . I would expect InetAddress.getByName(hostname) to behave similarly for localhost and null as the given hostname. Could you please do the same testing with InetAddress.getByName("localhost") to ensure that something else isn't botching the hostname to 127.0.0.1 ?
          Hide
          Ivan Mitic added a comment -

          Attaching updated patch.

          Show
          Ivan Mitic added a comment - Attaching updated patch.
          Hide
          Ivan Mitic added a comment -

          Thanks Daryn!

          For InetAddress.getByName(), the two actually return different values, "localhost" and "127.0.0.1", respectively.

          Thanks for clarifying the replaceFirst("-", "://") part. I moved to this model and indeed it is simpler. Will submit an updated patch soon.

          Show
          Ivan Mitic added a comment - Thanks Daryn! For InetAddress.getByName(), the two actually return different values, "localhost" and "127.0.0.1", respectively. Thanks for clarifying the replaceFirst("-", "://") part. I moved to this model and indeed it is simpler. Will submit an updated patch soon.
          Hide
          Daryn Sharp added a comment -

          Odd, for my known knowledge of quirks, does InetAddress.getByName(null).getHostName() and InetAddress.getByName(null).getCanonicalHostName() also return just 127.0.0.1? This is a bit dismaying, because the tests are trying to verify that the output contains the exact path(s) as given on the cmdline. I spent quite a bit of time ensuring one can parse the output for a given path. Allowing either localhost or 127.0.0.1 to match means the tests won't catch if something mangles the host for non-windows platforms. I'm not sure how to deal with that...

          The majority of the code mangling the authority into another uri, that you had to touch, is basically trying to convert scheme-host:port into scheme://host:port. Instead of all the null and -1 checks to rebuild the uri, it seems it would be simpler to just use replaceFirst. It also avoids introducing more ipv6 incompatibilities caused by a naive split on ":".

          Show
          Daryn Sharp added a comment - Odd, for my known knowledge of quirks, does InetAddress.getByName(null).getHostName() and InetAddress.getByName(null).getCanonicalHostName() also return just 127.0.0.1? This is a bit dismaying, because the tests are trying to verify that the output contains the exact path(s) as given on the cmdline. I spent quite a bit of time ensuring one can parse the output for a given path. Allowing either localhost or 127.0.0.1 to match means the tests won't catch if something mangles the host for non-windows platforms. I'm not sure how to deal with that... The majority of the code mangling the authority into another uri, that you had to touch, is basically trying to convert scheme-host:port into scheme://host:port . Instead of all the null and -1 checks to rebuild the uri, it seems it would be simpler to just use replaceFirst . It also avoids introducing more ipv6 incompatibilities caused by a naive split on ":".
          Hide
          Ivan Mitic added a comment -

          Thanks for the feedback Daryn. Answers below.

          Daryn:I'm not sure I understand the issue. Are you saying that on Windows creating an InetSocketAddress from "localhost" will return 127.0.0.1 as the hostname?
          Ivan: Correct.

          Daryn:Also, isn't it a bit redundant to check both host and authority? Why not just authority?
          Ivan: Agree, will fix this.

          Not sure I understand your proposal around replaceFirst("-", "://")). Can you please elaborate?

          Show
          Ivan Mitic added a comment - Thanks for the feedback Daryn. Answers below. Daryn:I'm not sure I understand the issue. Are you saying that on Windows creating an InetSocketAddress from "localhost" will return 127.0.0.1 as the hostname? Ivan: Correct. Daryn:Also, isn't it a bit redundant to check both host and authority? Why not just authority? Ivan: Agree, will fix this. Not sure I understand your proposal around replaceFirst("-", "://")). Can you please elaborate?
          Hide
          Daryn Sharp added a comment -

          Also, couldn't a lot of that logic be obliterated by doing something like:

          URI.create(rawUri.getAuthority().replaceFirst("-", "://"))
          Show
          Daryn Sharp added a comment - Also, couldn't a lot of that logic be obliterated by doing something like: URI.create(rawUri.getAuthority().replaceFirst( "-" , ": //" ))
          Hide
          Daryn Sharp added a comment -

          I'm not sure I understand the issue. Are you saying that on Windows creating an InetSocketAddress from "localhost" will return 127.0.0.1 as the hostname?

          Also, isn't it a bit redundant to check both host and authority? Why not just authority?

          if (uri.getHost() == null && uri.getAuthority() == null)
          Show
          Daryn Sharp added a comment - I'm not sure I understand the issue. Are you saying that on Windows creating an InetSocketAddress from "localhost" will return 127.0.0.1 as the hostname? Also, isn't it a bit redundant to check both host and authority? Why not just authority? if (uri.getHost() == null && uri.getAuthority() == null )
          Hide
          Bikas Saha added a comment -

          +1 looks good

          Show
          Bikas Saha added a comment - +1 looks good
          Hide
          Ivan Mitic added a comment -

          Attaching patch.

          Show
          Ivan Mitic added a comment - Attaching patch.

            People

            • Assignee:
              Ivan Mitic
              Reporter:
              Ivan Mitic
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development