Solr
  1. Solr
  2. SOLR-5241

SimplePostToolTest is slow on some systmes - likely due to hostname resolution of "example.com"

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, Trunk
    • Component/s: None
    • Labels:
      None

      Description

      As noted by Shai on the dev @lucene list, SimplePostToolTest is ridiculously slow when he ran from ant, but only takes 1 second in his IDE.

      problem seems to be relate to the URL class attempting to response "example.com"

      1. SOLR-5241.patch
        10 kB
        Hoss Man
      2. SOLR-5241.patch
        10 kB
        Hoss Man

        Activity

        Hide
        Hoss Man added a comment -

        patch switching all usage of example.com to a lookback IP.

        this makes the entire test class take 2 seconds on my system (as 6 minutes before this)

        Show
        Hoss Man added a comment - patch switching all usage of example.com to a lookback IP. this makes the entire test class take 2 seconds on my system (as 6 minutes before this)
        Hide
        Robert Muir added a comment -

        Won't this still be an issue for jenkins runs because even loopback addresses are blackholed?

        Show
        Robert Muir added a comment - Won't this still be an issue for jenkins runs because even loopback addresses are blackholed?
        Hide
        Dawid Weiss added a comment -

        Yeah... maybe that fake:// protocol handler is actually a sensible idea for such tests.

        Show
        Dawid Weiss added a comment - Yeah... maybe that fake:// protocol handler is actually a sensible idea for such tests.
        Hide
        Hoss Man added a comment -

        I don't think so, but maybe i don't fully understand what the FreeBSD blackhole does.

        The test never attempts to open any sockets to these URL objects – the problem so far (that i can see) is just that by nature of being java.net.URL, there is a DNS check when the equals/hashCode methods get used and that seems to be the speed problem when the urls contain "example.com" ... so i figured using a "safe" IP would prevent that...

        http://www.eishay.com/2008/04/javas-url-little-secret.html

        is there any reason the freebsd blackhole would affect dns lookups on "127.42.42.42" even if the URL class did decide to try to "resolve" that IP as a hostname (i don't think it does) ?

        Show
        Hoss Man added a comment - I don't think so, but maybe i don't fully understand what the FreeBSD blackhole does. The test never attempts to open any sockets to these URL objects – the problem so far (that i can see) is just that by nature of being java.net.URL, there is a DNS check when the equals/hashCode methods get used and that seems to be the speed problem when the urls contain "example.com" ... so i figured using a "safe" IP would prevent that... http://www.eishay.com/2008/04/javas-url-little-secret.html is there any reason the freebsd blackhole would affect dns lookups on "127.42.42.42" even if the URL class did decide to try to "resolve" that IP as a hostname (i don't think it does) ?
        Hide
        Hoss Man added a comment -

        maybe that fake:// protocol handler is actually a sensible idea for such tests.

        I'm not sure, but i think the URL class would still try to resolve the hostname portion of the URL, even if we registered our own fake protocol.

        I really don't see how the blackhole could affect this even if it did block dns lookups, since no lookup should ever happen with an ip specified, but if it does then i think the whole test just needs re-written not to use the URL class.

        Show
        Hoss Man added a comment - maybe that fake:// protocol handler is actually a sensible idea for such tests. I'm not sure, but i think the URL class would still try to resolve the hostname portion of the URL, even if we registered our own fake protocol. — I really don't see how the blackhole could affect this even if it did block dns lookups, since no lookup should ever happen with an ip specified, but if it does then i think the whole test just needs re-written not to use the URL class.
        Hide
        Dawid Weiss added a comment -

        I'd have to check, I don't remember. Looking at this it seems you could write a custom parsing routine –
        http://docs.oracle.com/javase/7/docs/api/java/net/URLStreamHandler.html#parseURL(java.net.URL, java.lang.String, int, int)

        but this may be an overkill.

        Show
        Dawid Weiss added a comment - I'd have to check, I don't remember. Looking at this it seems you could write a custom parsing routine – http://docs.oracle.com/javase/7/docs/api/java/net/URLStreamHandler.html#parseURL(java.net.URL , java.lang.String, int, int) but this may be an overkill.
        Hide
        Robert Muir added a comment -

        why is it trying to resolve the host? Is it so that it can then try to connect to it and the test expects that this will fail?

        Its the latter part that will cause the issue: use ips like [ff01::114] instead.

        Show
        Robert Muir added a comment - why is it trying to resolve the host? Is it so that it can then try to connect to it and the test expects that this will fail? Its the latter part that will cause the issue: use ips like [ff01::114] instead.
        Hide
        Robert Muir added a comment -

        and if you really just need a URL, why not use file://....

        Show
        Robert Muir added a comment - and if you really just need a URL, why not use file:// ....
        Hide
        Shai Erera added a comment -

        With this patch, the test runs for 1.8-3.2s (varies, but much faster than before).

        Show
        Shai Erera added a comment - With this patch, the test runs for 1.8-3.2s (varies, but much faster than before).
        Hide
        Hoss Man added a comment -

        FYI: today is the first time i've looked at this test before, so take all of my comments with a grain of salt...

        why is it trying to resolve the host? Is it so that it can then try to connect to it and the test expects that this will fail?

        From what i can tell, nothing in the test, that i can see, is trying to resolve the hostname or IP or connect to any of these URLs. A "MockPageFetcher" is explicitly plugged into the SimplePostTool when used in the test to mock out the HTTP communication to prevent this.

        The problem (again: as far as i can tell) is entirely because the tests (and underlying code in SimplePostTool) use the java.net.URL class, which can/will attempt hostname resolution of DNS names used in URLs under the covers in some cases – notable anytime the URL.equals() or URL.hashCode methods are called.

        use ips like [ff01::114] instead.

        I've got no problem doing that if you think it makes a diff – but just so i understand: can you explain why that is better then 127.42.42.42 ?

        if you really just need a URL, why not use file://....

        That might work, but the SimplePostTool actually supports diff options for dealing with local files vs remote web urls, and the test's MockPageFetcher actually simulates servers that response with diff HTTP status codes, so i'm not sure if using "file://" will work and/or test the same things.


        attaching an updated patch using "[ff01::114]" instead of "127.42.42.42" per rmuir's request.

        Shai Erera: does this still run "fast" for you?

        any objections from anyone to be committing this?

        Show
        Hoss Man added a comment - FYI: today is the first time i've looked at this test before, so take all of my comments with a grain of salt... why is it trying to resolve the host? Is it so that it can then try to connect to it and the test expects that this will fail? From what i can tell, nothing in the test, that i can see, is trying to resolve the hostname or IP or connect to any of these URLs. A "MockPageFetcher" is explicitly plugged into the SimplePostTool when used in the test to mock out the HTTP communication to prevent this. The problem (again: as far as i can tell) is entirely because the tests (and underlying code in SimplePostTool) use the java.net.URL class, which can/will attempt hostname resolution of DNS names used in URLs under the covers in some cases – notable anytime the URL.equals() or URL.hashCode methods are called. use ips like [ff01::114] instead. I've got no problem doing that if you think it makes a diff – but just so i understand: can you explain why that is better then 127.42.42.42 ? if you really just need a URL, why not use file:// .... That might work, but the SimplePostTool actually supports diff options for dealing with local files vs remote web urls, and the test's MockPageFetcher actually simulates servers that response with diff HTTP status codes, so i'm not sure if using "file://" will work and/or test the same things. attaching an updated patch using " [ff01::114] " instead of "127.42.42.42" per rmuir's request. Shai Erera : does this still run "fast" for you? any objections from anyone to be committing this?
        Hide
        Robert Muir added a comment -

        I've got no problem doing that if you think it makes a diff – but just so i understand: can you explain why that is better then 127.42.42.42 ?

        its specifically allocated for test purposes and wont be routed if something tries to make a connection, and will fail fast with protocol not supported, or worst case no route to host... everywhere, not just jenkins.

        in jenkins specifically, tests can never try to connect to an unbound port and expected a connection refused, it will just hang for a huge amount of time until it finally times out.

        Show
        Robert Muir added a comment - I've got no problem doing that if you think it makes a diff – but just so i understand: can you explain why that is better then 127.42.42.42 ? its specifically allocated for test purposes and wont be routed if something tries to make a connection, and will fail fast with protocol not supported, or worst case no route to host... everywhere, not just jenkins. in jenkins specifically, tests can never try to connect to an unbound port and expected a connection refused, it will just hang for a huge amount of time until it finally times out.
        Hide
        Shai Erera added a comment -

        This still runs fast, but a bit slower than with the previous patch: 5s from Ant. However, with this patch (I haven't checked previous patch), Eclipse runs faster than before: 0.1s (vs 1s). I'll try to get to the bottom of it, though let's commit this (or previous) patch, because it's already a huge improvement.

        Show
        Shai Erera added a comment - This still runs fast, but a bit slower than with the previous patch: 5s from Ant. However, with this patch (I haven't checked previous patch), Eclipse runs faster than before: 0.1s (vs 1s). I'll try to get to the bottom of it, though let's commit this (or previous) patch, because it's already a huge improvement.
        Hide
        Shai Erera added a comment -

        Hoss, see my comments on the thread. Seems to be related to our tests.policy security settings. I don't mind if you commit the patch, but maybe if there's a simple solution, we don't need to change the test.

        Show
        Shai Erera added a comment - Hoss, see my comments on the thread. Seems to be related to our tests.policy security settings. I don't mind if you commit the patch, but maybe if there's a simple solution, we don't need to change the test.
        Hide
        Robert Muir added a comment -

        if the test violates the security settings, then SecurityException would be thrown.

        So maybe there is a bug in the solr code hiding/masking exceptions.

        Show
        Robert Muir added a comment - if the test violates the security settings, then SecurityException would be thrown. So maybe there is a bug in the solr code hiding/masking exceptions.
        Hide
        Dawid Weiss added a comment -

        While I appreciate Hoss's solution I think what Robert said – we should get to the bottom of this problem, this difference in runtime is not easily explained and it may lead to a bigger can of worms.

        Show
        Dawid Weiss added a comment - While I appreciate Hoss's solution I think what Robert said – we should get to the bottom of this problem, this difference in runtime is not easily explained and it may lead to a bigger can of worms.
        Hide
        Uwe Schindler added a comment -

        Hi,
        as I said in response to Shai's e-mail: The problem may indeed be related to the security policy. To check if a connection is allowed, the security manager has to resolve dns. As this gets a not found or whatever error, it will not throw a security exception. This also explains why there is a runtime env difference: With Eclipse the security manager is not used, with ANT it is.
        The fix should be not to use hostnames for invalid URLs and use the same strategy like in Solr Cloud tests: Use a non-routeable IPv6 address. To me the problem is exactly what Robert and I thought is the reason.

        See BaseDistributedTestCase:

            this.deadServers = new String[] {"[ff01::114]:33332" + context, 
                                             "[ff01::083]:33332" + context, 
                                             "[ff01::213]:33332" + context};
        

        Please use these URLs and the problem should be gone. Real DNS names out of our control are bad to emulate failures. What happens if soebody links example.com to real IP adresses?

        Show
        Uwe Schindler added a comment - Hi, as I said in response to Shai's e-mail: The problem may indeed be related to the security policy. To check if a connection is allowed, the security manager has to resolve dns. As this gets a not found or whatever error, it will not throw a security exception. This also explains why there is a runtime env difference: With Eclipse the security manager is not used, with ANT it is. The fix should be not to use hostnames for invalid URLs and use the same strategy like in Solr Cloud tests: Use a non-routeable IPv6 address. To me the problem is exactly what Robert and I thought is the reason. See BaseDistributedTestCase: this .deadServers = new String [] { "[ff01::114]:33332" + context, "[ff01::083]:33332" + context, "[ff01::213]:33332" + context}; Please use these URLs and the problem should be gone. Real DNS names out of our control are bad to emulate failures. What happens if soebody links example.com to real IP adresses?
        Hide
        Uwe Schindler added a comment -

        From analyzing the test:
        The main probelm is just the DNS resolve. We can really use any IP (IPv6 or IPv4) here, because the stream handler never actually connects to anywhere. Theoretically we could also use 127.0.0.1, the blackhole is not related here, because it just looks up hostnames. The actual connection is prevented by the stream handler. So finally we can use any numeric IP, just no host name.

        Show
        Uwe Schindler added a comment - From analyzing the test: The main probelm is just the DNS resolve. We can really use any IP (IPv6 or IPv4) here, because the stream handler never actually connects to anywhere. Theoretically we could also use 127.0.0.1, the blackhole is not related here, because it just looks up hostnames. The actual connection is prevented by the stream handler. So finally we can use any numeric IP, just no host name.
        Hide
        Dawid Weiss added a comment -

        Thanks Uwe.

        Show
        Dawid Weiss added a comment - Thanks Uwe.
        Hide
        ASF subversion and git services added a comment -

        Commit 1523725 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1523725 ]

        SOLR-5241: Fix SimplePostToolTest performance problem - implicit DNS lookups

        Show
        ASF subversion and git services added a comment - Commit 1523725 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1523725 ] SOLR-5241 : Fix SimplePostToolTest performance problem - implicit DNS lookups
        Hide
        ASF subversion and git services added a comment -

        Commit 1523726 from hossman@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1523726 ]

        SOLR-5241: Fix SimplePostToolTest performance problem - implicit DNS lookups (merge r1523725)

        Show
        ASF subversion and git services added a comment - Commit 1523726 from hossman@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1523726 ] SOLR-5241 : Fix SimplePostToolTest performance problem - implicit DNS lookups (merge r1523725)
        Hide
        Hoss Man added a comment -

        ... Theoretically we could also use 127.0.0.1, the blackhole is not related here, because it just looks up hostnames. ...

        we could, this was my point earlier when i asked rmuir why "[ff01::114]" was better - since we're never opening a socket i didn't understand the diff.

        now that i do understand the diff however, i definitely think "[ff01::114]" is better – not because of anything in the test now, but because it helps protect us from the risk of someone working on the test in the future and accidentally changing something so that it does start trying to open sockets.

        so i've committed the most recent patch as is.

        Thanks everybody for your help.

        Show
        Hoss Man added a comment - ... Theoretically we could also use 127.0.0.1, the blackhole is not related here, because it just looks up hostnames. ... we could, this was my point earlier when i asked rmuir why " [ff01::114] " was better - since we're never opening a socket i didn't understand the diff. now that i do understand the diff however, i definitely think " [ff01::114] " is better – not because of anything in the test now, but because it helps protect us from the risk of someone working on the test in the future and accidentally changing something so that it does start trying to open sockets. so i've committed the most recent patch as is. Thanks everybody for your help.

          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development