Solr
  1. Solr
  2. SOLR-4136

SolrCloud bugs when servlet context contains "/" or "_"

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.1, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      SolrCloud does not work properly with non-trivial values for "hostContext" (ie: the servlet context path). In particular...

      • Using a hostContext containing a "/" (ie: a servlet context with a subdir path, semi-common among people who organize webapps hierarchically for lod blanacer rules) is explicitly forbidden in ZkController because of how the hostContext is used to build a ZK nodeName
      • Using a hostContext containing a "_" causes problems in OverseerCollectionProcessor where it assumes all "_" characters should be converted to "/" to reconstitute a URL from nodeName (NOTE: this code specifically has a TODO to fix this, and then has a subsequent TODO about assuming "http://" labeled "this sucks")
      1. SOLR-4136.patch
        19 kB
        Hoss Man
      2. SOLR-4136.patch
        18 kB
        Hoss Man
      3. SOLR-4136.patch
        18 kB
        Hoss Man
      4. SOLR-4136.patch
        18 kB
        Hoss Man
      5. SOLR-4136.patch
        7 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          Context...

          Mark's suggestion in that email regarding my original question (about prohibiting "/" in nodeNames) was that zkcontroller should replace them "/" with "_" – but that would cause potential collisions between contexts like "/foo/solr" and "/foo_solr", so i think using something like URLEncoding makes more sense (and shouldn't impact existing ZK cluster state data for most existing users)

          The attached patch enhances the test base classes to allow for randomized hostContext values, and then uses this URLEncoding logic in ZKController to build nodeNames – and in most cases seems to work. But thinking about "_" in paths got me paranoid about explicitly testing that which is how I discovered the crufty logic in OverseerCollectionProcessor. (NOTE: you can see the obvious OverseerCollectionProcessor errors trying to talk to the wrong URL in the test logs, and they seem to explain the subsequent test failure message, but it's also possible there is a subsequent problem i haven't noticed yet)

          I haven't dug into this part of the code/problem very much yet, but i think the right fix here is to clean this up this code so that intead of making assumptions about the node name, is uses the clusterstate to lookup the base_url from the nodeName.

          Logged error (repeated for multiple shards)...

          [junit4:junit4]   2> 204647 T33 oasc.SolrException.log SEVERE Collection createcollection of awholynewcollection_1 failed
          [junit4:junit4]   2> 204686 T31 oasc.DistributedQueue$LatchChildWatcher.process Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged
          [junit4:junit4]   2> 204688 T33 oasc.OverseerCollectionProcessor.createCollection Create collection awholynewcollection_2 on [127.0.0.1:57855_randctxmqvf%2F_ay, 127.0.0.1:37463_randctxmqvf%2F_ay]
          [junit4:junit4]   2> 204691 T33 oasc.OverseerCollectionProcessor.createCollection SEVERE Error talking to shard: 127.0.0.1:37463/randctxmqvf%2F/ay org.apache.solr.common.SolrException: Server at http://127.0.0.1:37463/randctxmqvf%2F/ay returned non ok status:404, message:Not Found
          [junit4:junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
          [junit4:junit4]   2> 	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
          [junit4:junit4]   2> 	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
          [junit4:junit4]   2> 	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
          [junit4:junit4]   2> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
          

          Final test failure message...

             <testcase classname="org.apache.solr.cloud.BasicDistributedZkTest" name="testDistribSearch" time="268.833">
                <failure message="Could not find new 2 slice collection called awholynewcollection_0" type="java.lang.AssertionError">java.lang.AssertionError: Could not find new 2 slice collection called awholynewcollection_0
                  at __randomizedtesting.SeedInfo.seed([1BD856523B97C07C:9A3ED84A4CC8A040]:0)
                  at org.junit.Assert.fail(Assert.java:93)
                  at org.apache.solr.cloud.BasicDistributedZkTest.checkForCollection(BasicDistributedZkTest.java:1053)
                  at org.apache.solr.cloud.BasicDistributedZkTest.testCollectionsAPI(BasicDistributedZkTest.java:768)
                  at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:361)
                  at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:712)
          
          Show
          Hoss Man added a comment - Context... http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3Calpine.DEB.2.02.1211292004430.2543@frisbee%3E http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201211.mbox/%3C551C5E62-0520-42A2-BF71-165FDA3608ED@gmail.com%3E Mark's suggestion in that email regarding my original question (about prohibiting "/" in nodeNames) was that zkcontroller should replace them "/" with "_" – but that would cause potential collisions between contexts like "/foo/solr" and "/foo_solr", so i think using something like URLEncoding makes more sense (and shouldn't impact existing ZK cluster state data for most existing users) The attached patch enhances the test base classes to allow for randomized hostContext values, and then uses this URLEncoding logic in ZKController to build nodeNames – and in most cases seems to work. But thinking about "_" in paths got me paranoid about explicitly testing that which is how I discovered the crufty logic in OverseerCollectionProcessor. (NOTE: you can see the obvious OverseerCollectionProcessor errors trying to talk to the wrong URL in the test logs, and they seem to explain the subsequent test failure message, but it's also possible there is a subsequent problem i haven't noticed yet) I haven't dug into this part of the code/problem very much yet, but i think the right fix here is to clean this up this code so that intead of making assumptions about the node name, is uses the clusterstate to lookup the base_url from the nodeName. Logged error (repeated for multiple shards)... [junit4:junit4] 2> 204647 T33 oasc.SolrException.log SEVERE Collection createcollection of awholynewcollection_1 failed [junit4:junit4] 2> 204686 T31 oasc.DistributedQueue$LatchChildWatcher.process Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged [junit4:junit4] 2> 204688 T33 oasc.OverseerCollectionProcessor.createCollection Create collection awholynewcollection_2 on [127.0.0.1:57855_randctxmqvf%2F_ay, 127.0.0.1:37463_randctxmqvf%2F_ay] [junit4:junit4] 2> 204691 T33 oasc.OverseerCollectionProcessor.createCollection SEVERE Error talking to shard: 127.0.0.1:37463/randctxmqvf%2F/ay org.apache.solr.common.SolrException: Server at http://127.0.0.1:37463/randctxmqvf%2F/ay returned non ok status:404, message:Not Found [junit4:junit4] 2> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) [junit4:junit4] 2> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) [junit4:junit4] 2> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166) [junit4:junit4] 2> at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133) [junit4:junit4] 2> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) Final test failure message... <testcase classname="org.apache.solr.cloud.BasicDistributedZkTest" name="testDistribSearch" time="268.833"> <failure message="Could not find new 2 slice collection called awholynewcollection_0" type="java.lang.AssertionError">java.lang.AssertionError: Could not find new 2 slice collection called awholynewcollection_0 at __randomizedtesting.SeedInfo.seed([1BD856523B97C07C:9A3ED84A4CC8A040]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.BasicDistributedZkTest.checkForCollection(BasicDistributedZkTest.java:1053) at org.apache.solr.cloud.BasicDistributedZkTest.testCollectionsAPI(BasicDistributedZkTest.java:768) at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:361) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:712)
          Hide
          Hoss Man added a comment -

          Been poking around the SolrCloud/zk code ... fun times.

          From what i can tell, we don't record anywhere in zookeeper the mapping of "nodeName" -> "baseURL" for the various solr nodes in a solr cloud cluster. We do seem evidently record the baseUrl associated with a nodeName in the info about each replica – but that information is per collection & shard, so as is it doesn't really help in the general case of the bad code in OverseerCollectionProcessor.

          Three options occur to me...

          1) We could consider adding these mappins to ZK as 1st order info. possibly by adding some data to the ephemeral "liveNodes" path for each node, so code like OverseerCollectionProcessor could just ask for the data of each liveNode to know it's baseUrl ... but i'm not sure how far down that rabithole we want to go (i don't really know the performance characteristics of ZK enough to know if it's a good idea to have code doing lots of those kinds of lookups ad-hoc)

          2) we could cheat: we could add something like this to ClusterState...

          private final Map<String,String> baseUrls;
          public String getBaseUrl(final String nodeName);
          

          ...and populate the baseUrls Map in the constructor based on the properties found when looping over every collections->slice->replica. The only question is what to do if/when two diff collections/slice/replica in the clusterstate disagree about the baseUrl? (assertion failed?)

          3) We could improve the kludge to be a bit less kludgy: OverseerCollectionProcessor (and possibly other places) currently assume that a baseUrl can be computed from a nodeName by replacing all "_" with "/" – if we change that substitution to only apply to the first "_" in the nodeName, and combine it with some URL decoding on the "hostContext" portion of the nodeName (to match my suggested improvement in theprevious patch) i think we would have a fairly safe way of bi-directinally converting nodeName<->URL regardless of what's in the hostContext - because hostnames and ports can't ever have "_" in them. (this wouldn't address the "http://" kludge, but that assumption seems to be more pervasive - we can fight that battle another day)

          Option #3 seems the invasive for now, so unless mark/yonik/sami/somebody chimes in with more encouragment to go down one of the other routes, i'll take a stab at #3 and see what other problems i encounter.

          Show
          Hoss Man added a comment - Been poking around the SolrCloud/zk code ... fun times. From what i can tell, we don't record anywhere in zookeeper the mapping of "nodeName" -> "baseURL" for the various solr nodes in a solr cloud cluster. We do seem evidently record the baseUrl associated with a nodeName in the info about each replica – but that information is per collection & shard, so as is it doesn't really help in the general case of the bad code in OverseerCollectionProcessor. Three options occur to me... 1) We could consider adding these mappins to ZK as 1st order info. possibly by adding some data to the ephemeral "liveNodes" path for each node, so code like OverseerCollectionProcessor could just ask for the data of each liveNode to know it's baseUrl ... but i'm not sure how far down that rabithole we want to go (i don't really know the performance characteristics of ZK enough to know if it's a good idea to have code doing lots of those kinds of lookups ad-hoc) 2) we could cheat: we could add something like this to ClusterState... private final Map< String , String > baseUrls; public String getBaseUrl( final String nodeName); ...and populate the baseUrls Map in the constructor based on the properties found when looping over every collections->slice->replica. The only question is what to do if/when two diff collections/slice/replica in the clusterstate disagree about the baseUrl? (assertion failed?) 3) We could improve the kludge to be a bit less kludgy: OverseerCollectionProcessor (and possibly other places) currently assume that a baseUrl can be computed from a nodeName by replacing all "_" with "/" – if we change that substitution to only apply to the first "_" in the nodeName, and combine it with some URL decoding on the "hostContext" portion of the nodeName (to match my suggested improvement in theprevious patch) i think we would have a fairly safe way of bi-directinally converting nodeName<->URL regardless of what's in the hostContext - because hostnames and ports can't ever have "_" in them. (this wouldn't address the "http://" kludge, but that assumption seems to be more pervasive - we can fight that battle another day) – Option #3 seems the invasive for now, so unless mark/yonik/sami/somebody chimes in with more encouragment to go down one of the other routes, i'll take a stab at #3 and see what other problems i encounter.
          Hide
          Mark Miller added a comment -

          Yeah, I've looked at this in the past because we have problems with _ in the url's the way things are now (there are comments to that affect). Never got to adding the right info to ZK yet though. I don't remember thinking of a silver bullet that seemed like the clear way to go.

          Take a stab, I'll try and catch up soon.

          Show
          Mark Miller added a comment - Yeah, I've looked at this in the past because we have problems with _ in the url's the way things are now (there are comments to that affect). Never got to adding the right info to ZK yet though. I don't remember thinking of a silver bullet that seemed like the clear way to go. Take a stab, I'll try and catch up soon.
          Hide
          Hoss Man added a comment -

          New patch...

          1) implements the "less kludgy-kludge" idea i mentioned above. I put the new public method for getting the baseURL of a nodeName in SolrZkClient in case we want to change it to be a live lookup from ZK at some point (and marked it @lucene.experimental in case we want completely move/remove it at some point)

          2) improved ZkController to allow both leading and trailing "/" in the hostContext – both of which get striped off after the "empty string defaults to 'solr'" logic, and before the generation of the nodeName. As as result, specifying hostContext="/" in solr.xml is now a viable way to tell solr it's running in the root context. (In general i think this makes things easier to use, but

          3) enhanced the hostContext randomization logic in BaseDistributedSearchTestCase so that it randomly uses: the root context (eg: "/"), one level paths (eg: "/as"), two level paths (eg: "/s/bf"), and, sometimes including an "" character in a path (eg: "/as", "/s_d", "/_d/bf", etc..)

          4) Changed BaseDistributedSearchTestCase to us the context path randomization logic by default. This is something i had left out of my previous patch because i wa worried about users who might have their own tests extending BaseDistributedSearchTestCase that were then making assumptions about the context being "/solr" – but i decided having more test coverage using these random context paths was more important – we can include an upgrade note along the lines of...

          BaseDistributedSearchTestCase now randomizes the servlet context it uses when 
          creating Jetty instances.  Subclasses that assume a hard coded context of 
          "/solr" should either be fixed to use the "String context" variable, or should 
          take advantage of the new BaseDistributedSearchTestCase(String) constructor
          to explicitly specify a fixed servlet context path.
          

          5) Fixed test bugs exposed by 3 & 4 above.

          Mark: would really appreciate it if you could take this bad-boy for a spin and let me know what you think

          Show
          Hoss Man added a comment - New patch... 1) implements the "less kludgy-kludge" idea i mentioned above. I put the new public method for getting the baseURL of a nodeName in SolrZkClient in case we want to change it to be a live lookup from ZK at some point (and marked it @lucene.experimental in case we want completely move/remove it at some point) 2) improved ZkController to allow both leading and trailing "/" in the hostContext – both of which get striped off after the "empty string defaults to 'solr'" logic, and before the generation of the nodeName. As as result, specifying hostContext="/" in solr.xml is now a viable way to tell solr it's running in the root context. (In general i think this makes things easier to use, but 3) enhanced the hostContext randomization logic in BaseDistributedSearchTestCase so that it randomly uses: the root context (eg: "/"), one level paths (eg: "/as"), two level paths (eg: "/s/bf"), and, sometimes including an " " character in a path (eg: "/as ", "/s_d", "/_d/bf", etc..) 4) Changed BaseDistributedSearchTestCase to us the context path randomization logic by default. This is something i had left out of my previous patch because i wa worried about users who might have their own tests extending BaseDistributedSearchTestCase that were then making assumptions about the context being "/solr" – but i decided having more test coverage using these random context paths was more important – we can include an upgrade note along the lines of... BaseDistributedSearchTestCase now randomizes the servlet context it uses when creating Jetty instances. Subclasses that assume a hard coded context of "/solr" should either be fixed to use the "String context" variable, or should take advantage of the new BaseDistributedSearchTestCase(String) constructor to explicitly specify a fixed servlet context path. 5) Fixed test bugs exposed by 3 & 4 above. – Mark: would really appreciate it if you could take this bad-boy for a spin and let me know what you think
          Hide
          Hoss Man added a comment -

          patch updated to trunk

          Show
          Hoss Man added a comment - patch updated to trunk
          Hide
          Mark Miller added a comment -

          I'm going to take a closer look at this today.

          Show
          Mark Miller added a comment - I'm going to take a closer look at this today.
          Hide
          Hoss Man added a comment -

          patch updated to trunk (again)

          Show
          Hoss Man added a comment - patch updated to trunk (again)
          Hide
          Mark Miller added a comment -

          This looks great! +1

          Show
          Mark Miller added a comment - This looks great! +1
          Hide
          Hoss Man added a comment -

          Hmmm... running the lsat patch, CloudSolrServerTest just tickled a bug when "/" got selected as the hostContext...

                <failure message="fail check for leader:http://127.0.0.1:54905//collection1 in [http://127.0.0.1:54905/collection1/, http://127.0.0.1:35690/collection1/]" type="java.lang.AssertionError">java.lang.AssertionError: fail check for leader:http://127.0.0.1:54905//collection1 in [http://127.0.0.1:54905/collection1/, http://127.0.0.1:35690/collection1/]
          

          ...i just sanity checked, and the precendent is that baseUrl never ends in a trailing slash (something i had already noticed, but somehow overlooked after my change). It seemed like an easy fix but in running the full tests again I've already seen several failures go by that i need to figure out.

          Show
          Hoss Man added a comment - Hmmm... running the lsat patch, CloudSolrServerTest just tickled a bug when "/" got selected as the hostContext... <failure message="fail check for leader:http://127.0.0.1:54905//collection1 in [http://127.0.0.1:54905/collection1/, http://127.0.0.1:35690/collection1/]" type="java.lang.AssertionError">java.lang.AssertionError: fail check for leader:http://127.0.0.1:54905//collection1 in [http://127.0.0.1:54905/collection1/, http://127.0.0.1:35690/collection1/] ...i just sanity checked, and the precendent is that baseUrl never ends in a trailing slash (something i had already noticed, but somehow overlooked after my change). It seemed like an easy fix but in running the full tests again I've already seen several failures go by that i need to figure out.
          Hide
          Hoss Man added a comment -

          ok, so it turns out the failure i was seeing scroll by all came from OverseerCollectionProcessorTest which is a new test added in SOLR-4114.

          The assertion failure(s) are fairly obtuse...

                <failure message="
            Expectation failure on verify:
              submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0
              takeCompletedOrError(): expected: 7, actual: 0
              submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0
              submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0
              submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0
              submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0
              submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0" type="java.lang.AssertionError"
          

          ...but the root issue seems to be hidden in a log message i found: the easymock stuff is evidently freaking out because a new method is being called that it doesn't expect...

          [junit4:junit4]   2> 689 T1552 oasc.SolrException.log SEVERE Collection createcollection of mycollection failed:java.lang.AssertionError: 
          [junit4:junit4]   2> 	  Unexpected method call getZkClient():
          [junit4:junit4]   2> 		at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:45)
          [junit4:junit4]   2> 		at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:73)
          [junit4:junit4]   2> 		at org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:92)
          [junit4:junit4]   2> 		at org.apache.solr.common.cloud.ZkStateReader$$EnhancerByCGLIB$$a53a0cdb.getZkClient(<generated>)
          [junit4:junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.createCollection(OverseerCollectionProcessor.java:263)
          [junit4:junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:155)
          [junit4:junit4]   2> 		at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:102)
          [junit4:junit4]   2> 		at java.lang.Thread.run(Thread.java:679)
          [junit4:junit4]   2> 	
          

          The new test also has some hardcoded assumptions about hte context being "solr" and being able to rebuild the URL by replacing "_" with "/" – but since it doesn't subclass the distrib base test, i think those will actualy be ok. i just need help udnerstanding how to tell the easy mock stuff to expect this method call.

          The updated attachment:

          1) brings things up to date with trunk
          2) incorporated my jetty context change suggestion from SOLR-4057 so it's really trivial now to run the example on any context you want
          3) fixes a small glitch in the generated the "baseUrl" for a node when context is "/" ... the trailing "/" was being left on the URL, which is inconsistent with past behavior, so i cleaned that up (this was triggering a failure in CloudSolrServerTest because it was finding that "http://foo:8983//collection1" wasn't equal to "http://foo:8983/collection1" when checking the leader URL)

          ...but it still suffers from the easy mock failure.

          Show
          Hoss Man added a comment - ok, so it turns out the failure i was seeing scroll by all came from OverseerCollectionProcessorTest which is a new test added in SOLR-4114 . The assertion failure(s) are fairly obtuse... <failure message=" Expectation failure on verify: submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0 takeCompletedOrError(): expected: 7, actual: 0 submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0 submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0 submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0 submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0 submit(capture(Nothing captured yet), capture(Nothing captured yet), capture(Nothing captured yet)): expected: 1, actual: 0" type="java.lang.AssertionError" ...but the root issue seems to be hidden in a log message i found: the easymock stuff is evidently freaking out because a new method is being called that it doesn't expect... [junit4:junit4] 2> 689 T1552 oasc.SolrException.log SEVERE Collection createcollection of mycollection failed:java.lang.AssertionError: [junit4:junit4] 2> Unexpected method call getZkClient(): [junit4:junit4] 2> at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:45) [junit4:junit4] 2> at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:73) [junit4:junit4] 2> at org.easymock.internal.ClassProxyFactory$MockMethodInterceptor.intercept(ClassProxyFactory.java:92) [junit4:junit4] 2> at org.apache.solr.common.cloud.ZkStateReader$$EnhancerByCGLIB$$a53a0cdb.getZkClient(<generated>) [junit4:junit4] 2> at org.apache.solr.cloud.OverseerCollectionProcessor.createCollection(OverseerCollectionProcessor.java:263) [junit4:junit4] 2> at org.apache.solr.cloud.OverseerCollectionProcessor.processMessage(OverseerCollectionProcessor.java:155) [junit4:junit4] 2> at org.apache.solr.cloud.OverseerCollectionProcessor.run(OverseerCollectionProcessor.java:102) [junit4:junit4] 2> at java.lang.Thread.run(Thread.java:679) [junit4:junit4] 2> The new test also has some hardcoded assumptions about hte context being "solr" and being able to rebuild the URL by replacing "_" with "/" – but since it doesn't subclass the distrib base test, i think those will actualy be ok. i just need help udnerstanding how to tell the easy mock stuff to expect this method call. — The updated attachment: 1) brings things up to date with trunk 2) incorporated my jetty context change suggestion from SOLR-4057 so it's really trivial now to run the example on any context you want 3) fixes a small glitch in the generated the "baseUrl" for a node when context is "/" ... the trailing "/" was being left on the URL, which is inconsistent with past behavior, so i cleaned that up (this was triggering a failure in CloudSolrServerTest because it was finding that "http://foo:8983//collection1" wasn't equal to "http://foo:8983/collection1" when checking the leader URL) ...but it still suffers from the easy mock failure.
          Hide
          Mark Miller added a comment -

          You need something along these lines:

          Index: solr/core/src/test/org/apache/solr/cloud/OverseerCollectionProcessorTest.java
          ===================================================================
          --- solr/core/src/test/org/apache/solr/cloud/OverseerCollectionProcessorTest.java	(revision 1420361)
          +++ solr/core/src/test/org/apache/solr/cloud/OverseerCollectionProcessorTest.java	(working copy)
          @@ -35,6 +35,7 @@
           
           import org.apache.solr.SolrTestCaseJ4;
           import org.apache.solr.common.cloud.ClusterState;
          +import org.apache.solr.common.cloud.SolrZkClient;
           import org.apache.solr.common.cloud.ZkNodeProps;
           import org.apache.solr.common.cloud.ZkStateReader;
           import org.apache.solr.common.params.CoreAdminParams;
          @@ -61,6 +62,7 @@
             private ShardHandler shardHandlerMock;
             private ZkStateReader zkStateReaderMock;
             private ClusterState clusterStateMock;
          +  private SolrZkClient solrZkClientMock;
             
             private Thread thread;
             private Queue<byte[]> queue = new BlockingArrayQueue<byte[]>();
          @@ -88,6 +90,7 @@
               shardHandlerMock = createMock(ShardHandler.class);
               zkStateReaderMock = createMock(ZkStateReader.class);
               clusterStateMock = createMock(ClusterState.class);
          +    solrZkClientMock = createMock(SolrZkClient.class);
               underTest = new OverseerCollectionProcessorToBeTested(zkStateReaderMock,
                   "1234", shardHandlerMock, ADMIN_PATH, workQueueMock);
             }
          @@ -129,6 +132,15 @@
                 }
               }).anyTimes();
               
          +    zkStateReaderMock.getZkClient();
          +    expectLastCall().andAnswer(new IAnswer<Object>() {
          +      @Override
          +      public Object answer() throws Throwable {
          +        return solrZkClientMock;
          +      }
          +    }).anyTimes();
          +    
          +    
               clusterStateMock.getCollections();
               expectLastCall().andAnswer(new IAnswer<Object>() {
                 @Override
          @@ -138,7 +150,19 @@
               }).anyTimes();
               final Set<String> liveNodes = new HashSet<String>();
               for (int i = 0; i < liveNodesCount; i++) {
          -      liveNodes.add("localhost:" + (8963 + i) + "_solr");
          +      final String address = "localhost:" + (8963 + i) + "_solr";
          +      liveNodes.add(address);
          +      
          +      solrZkClientMock.getBaseUrlForNodeName(address);
          +      expectLastCall().andAnswer(new IAnswer<Object>() {
          +        @Override
          +        public Object answer() throws Throwable {
          +          // This works as long as this test does not use a 
          +          // webapp context with an underscore in it
          +          return address.replaceAll("_", "/");
          +        }
          +      }).anyTimes();
          +      
               }
               clusterStateMock.getLiveNodes();
               expectLastCall().andAnswer(new IAnswer<Object>() {
          @@ -336,6 +360,7 @@
               }
               
               replay(workQueueMock);
          +    replay(solrZkClientMock);
               replay(zkStateReaderMock);
               replay(clusterStateMock);
               replay(shardHandlerMock);
          
          Show
          Mark Miller added a comment - You need something along these lines: Index: solr/core/src/test/org/apache/solr/cloud/OverseerCollectionProcessorTest.java =================================================================== --- solr/core/src/test/org/apache/solr/cloud/OverseerCollectionProcessorTest.java (revision 1420361) +++ solr/core/src/test/org/apache/solr/cloud/OverseerCollectionProcessorTest.java (working copy) @@ -35,6 +35,7 @@ import org.apache.solr.SolrTestCaseJ4; import org.apache.solr.common.cloud.ClusterState; +import org.apache.solr.common.cloud.SolrZkClient; import org.apache.solr.common.cloud.ZkNodeProps; import org.apache.solr.common.cloud.ZkStateReader; import org.apache.solr.common.params.CoreAdminParams; @@ -61,6 +62,7 @@ private ShardHandler shardHandlerMock; private ZkStateReader zkStateReaderMock; private ClusterState clusterStateMock; + private SolrZkClient solrZkClientMock; private Thread thread; private Queue<byte[]> queue = new BlockingArrayQueue<byte[]>(); @@ -88,6 +90,7 @@ shardHandlerMock = createMock(ShardHandler.class); zkStateReaderMock = createMock(ZkStateReader.class); clusterStateMock = createMock(ClusterState.class); + solrZkClientMock = createMock(SolrZkClient.class); underTest = new OverseerCollectionProcessorToBeTested(zkStateReaderMock, "1234", shardHandlerMock, ADMIN_PATH, workQueueMock); } @@ -129,6 +132,15 @@ } }).anyTimes(); + zkStateReaderMock.getZkClient(); + expectLastCall().andAnswer(new IAnswer<Object>() { + @Override + public Object answer() throws Throwable { + return solrZkClientMock; + } + }).anyTimes(); + + clusterStateMock.getCollections(); expectLastCall().andAnswer(new IAnswer<Object>() { @Override @@ -138,7 +150,19 @@ }).anyTimes(); final Set<String> liveNodes = new HashSet<String>(); for (int i = 0; i < liveNodesCount; i++) { - liveNodes.add("localhost:" + (8963 + i) + "_solr"); + final String address = "localhost:" + (8963 + i) + "_solr"; + liveNodes.add(address); + + solrZkClientMock.getBaseUrlForNodeName(address); + expectLastCall().andAnswer(new IAnswer<Object>() { + @Override + public Object answer() throws Throwable { + // This works as long as this test does not use a + // webapp context with an underscore in it + return address.replaceAll("_", "/"); + } + }).anyTimes(); + } clusterStateMock.getLiveNodes(); expectLastCall().andAnswer(new IAnswer<Object>() { @@ -336,6 +360,7 @@ } replay(workQueueMock); + replay(solrZkClientMock); replay(zkStateReaderMock); replay(clusterStateMock); replay(shardHandlerMock);
          Hide
          Hoss Man added a comment -

          Mark: that's all greek to me, bu it seems to work, and saved me the trouble of learning greek!

          Committed revision 1420497. - trunk

          backporting and testing on 4x now.

          Show
          Hoss Man added a comment - Mark: that's all greek to me, bu it seems to work, and saved me the trouble of learning greek! Committed revision 1420497. - trunk backporting and testing on 4x now.
          Hide
          Per Steffensen added a comment -

          Can confirm that Marks patch for OverseerCollectionProcessorTest seems to be good. In a mocked test you mock components used by the "class under test" (OverseerCollectionProcessor in this case) and basically tells those mocks how to behave when used by the "class under test". This way you test only the logic in the "class under test" and at the same time you get to verify that it communicates (calles stuff on) the components in interacts with (the mocked ones) in exactly the way you expect. I guess you have changed createCollection so that it also calls zkStateReaderMock.getZkClient().getBaseUrlForNodeName() which it didnt before, so that call wasnt "expected" by the test and you didnt tell the mocked component (ZkStateReader) how to "simulate" correct behaviour. Mark added this and the test should be fine again.

          Basically since you changed createCollection in a way so that it did not anylonger behave exactly as verified by the OverseerCollectionProcessorTest you needed to tweak the test also. This is a nice thing!

          Show
          Per Steffensen added a comment - Can confirm that Marks patch for OverseerCollectionProcessorTest seems to be good. In a mocked test you mock components used by the "class under test" (OverseerCollectionProcessor in this case) and basically tells those mocks how to behave when used by the "class under test". This way you test only the logic in the "class under test" and at the same time you get to verify that it communicates (calles stuff on) the components in interacts with (the mocked ones) in exactly the way you expect. I guess you have changed createCollection so that it also calls zkStateReaderMock.getZkClient().getBaseUrlForNodeName() which it didnt before, so that call wasnt "expected" by the test and you didnt tell the mocked component (ZkStateReader) how to "simulate" correct behaviour. Mark added this and the test should be fine again. Basically since you changed createCollection in a way so that it did not anylonger behave exactly as verified by the OverseerCollectionProcessorTest you needed to tweak the test also. This is a nice thing!
          Hide
          Hoss Man added a comment -

          FYI: staled on backporting due to other test instabilities on 4x at the moment ... i don't wnat to risk making the situation worse.

          Show
          Hoss Man added a comment - FYI: staled on backporting due to other test instabilities on 4x at the moment ... i don't wnat to risk making the situation worse.
          Hide
          Mark Miller added a comment - - edited

          Good call. Looks like Yonik is merging back some critical work in 5x that will help with those fails.

          When doing the Directory first class issue, some of the changes actually introduced/exposed a bunch of fails that i fixed on 5x.

          Somehow, some similar issues seem to have been recently exposed on 4x - but the fixes (both test and core code) are not there yet.

          I'm holding off on back porting to 4x until this merge up is complete.

          Show
          Mark Miller added a comment - - edited Good call. Looks like Yonik is merging back some critical work in 5x that will help with those fails. When doing the Directory first class issue, some of the changes actually introduced /exposed a bunch of fails that i fixed on 5x. Somehow, some similar issues seem to have been recently exposed on 4x - but the fixes (both test and core code) are not there yet. I'm holding off on back porting to 4x until this merge up is complete.
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] Chris M. Hostetter
          http://svn.apache.org/viewvc?view=revision&revision=1421034

          SOLR-4136: fix hostContext randomization to never include double slashes

          Show
          Commit Tag Bot added a comment - [trunk commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1421034 SOLR-4136 : fix hostContext randomization to never include double slashes
          Hide
          Hoss Man added a comment -

          Yonik merged the 1420497 changes to the 4x branch in r1420992...

          http://svn.apache.org/viewvc?view=revision&revision=1420992

          There is a glitch however in the hostContext randomization code that resulted in contexts with multiple consecutive slashes (eg: "//s") which caused tests to complain about 404s when trying to talk to the jetty instances...

          Committed revision 1421034. - trunk
          Committed revision 1421036. - 4x.

          Show
          Hoss Man added a comment - Yonik merged the 1420497 changes to the 4x branch in r1420992... http://svn.apache.org/viewvc?view=revision&revision=1420992 There is a glitch however in the hostContext randomization code that resulted in contexts with multiple consecutive slashes (eg: "//s") which caused tests to complain about 404s when trying to talk to the jetty instances... Committed revision 1421034. - trunk Committed revision 1421036. - 4x.
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Chris M. Hostetter
          http://svn.apache.org/viewvc?view=revision&revision=1421036

          SOLR-4136: fix hostContext randomization to never include double slashes (merge r1421034)

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1421036 SOLR-4136 : fix hostContext randomization to never include double slashes (merge r1421034)
          Hide
          Mark Miller added a comment -

          I just got this one that seems suspicious - can you take a look hossman?

          Error Message
          
          Server at http://127.0.0.1:44836//unloadcollection2 returned non ok status:404, message:Not Found
          Stacktrace
          
          org.apache.solr.common.SolrException: Server at http://127.0.0.1:44836//unloadcollection2 returned non ok status:404, message:Not Found
          	at __randomizedtesting.SeedInfo.seed([EB9D48266225215F:6A7BC63E157A4163]:0)
          	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
          	at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
          	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
          	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
          	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)
          	at org.apache.solr.cloud.BasicDistributedZkTest.testCoreUnloadAndLeaders(BasicDistributedZkTest.java:548)
          	at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:352)
          	at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:793)
          Show
          Mark Miller added a comment - I just got this one that seems suspicious - can you take a look hossman? Error Message Server at http://127.0.0.1:44836//unloadcollection2 returned non ok status:404, message:Not Found Stacktrace org.apache.solr.common.SolrException: Server at http://127.0.0.1:44836//unloadcollection2 returned non ok status:404, message:Not Found at __randomizedtesting.SeedInfo.seed([EB9D48266225215F:6A7BC63E157A4163]:0) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102) at org.apache.solr.cloud.BasicDistributedZkTest.testCoreUnloadAndLeaders(BasicDistributedZkTest.java:548) at org.apache.solr.cloud.BasicDistributedZkTest.doTest(BasicDistributedZkTest.java:352) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:793)
          Hide
          Mark Miller added a comment -

          Eg, it looks like perhaps an empty context? And some tests are not prepared to work with that if it is indeed running on the root? I dunno...first guess...pretty rare...

          Show
          Mark Miller added a comment - Eg, it looks like perhaps an empty context? And some tests are not prepared to work with that if it is indeed running on the root? I dunno...first guess...pretty rare...
          Hide
          Hoss Man added a comment -

          Hmmm, mark: i tried testing on both trunk@1422017 and 4x@1422022...

          • that seed did in fact produce the root context "/" for me
          • that tests passed with that seed and the root context
          • no where in the test log did get any mention of "//unloadcollection2" (which i would suspect if there was a bug building up a client url using the baseUrlWithTrailingSlash + "/corename" ... but there should never be a baseUrlWithTrailingSlash)

          I then modified my local working copy of trunk like so, in order to force every test to run with the root context, and even then i could not reproduce...

          Index: solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java
          ===================================================================
          --- solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java	(revision 1422017)
          +++ solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java	(working copy)
          @@ -110,7 +110,9 @@
                 }
               }
               // paranoia, we *really* don't want to ever get "//" in a path...
          -    final String hc = hostContext.toString().replaceAll("\\/+","/");
          +    // final String hc = hostContext.toString().replaceAll("\\/+","/");
          +    // :nocommit: ... test the shit out of the root context
          +    final String hc = "/";
           
               log.info("Setting hostContext system property: " + hc);
               System.setProperty("hostContext", hc);
          

          ...this smells to me like it must have run against the code prior to my r1421034 commit (to get that "//" URL)

          Show
          Hoss Man added a comment - Hmmm, mark: i tried testing on both trunk@1422017 and 4x@1422022... that seed did in fact produce the root context "/" for me that tests passed with that seed and the root context no where in the test log did get any mention of "//unloadcollection2" (which i would suspect if there was a bug building up a client url using the baseUrlWithTrailingSlash + "/corename" ... but there should never be a baseUrlWithTrailingSlash) I then modified my local working copy of trunk like so, in order to force every test to run with the root context, and even then i could not reproduce... Index: solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java =================================================================== --- solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java (revision 1422017) +++ solr/test-framework/src/java/org/apache/solr/BaseDistributedSearchTestCase.java (working copy) @@ -110,7 +110,9 @@ } } // paranoia, we *really* don't want to ever get "//" in a path... - final String hc = hostContext.toString().replaceAll("\\/+","/"); + // final String hc = hostContext.toString().replaceAll("\\/+","/"); + // :nocommit: ... test the shit out of the root context + final String hc = "/"; log.info("Setting hostContext system property: " + hc); System.setProperty("hostContext", hc); ...this smells to me like it must have run against the code prior to my r1421034 commit (to get that "//" URL)
          Hide
          Hoss Man added a comment -

          ...this smells to me like it must have run against the code prior to my r1421034 commit (to get that "//" URL)

          Hmmm, except if that were the case, this line number would be different...

          ...testDistribSearch(BaseDistributedSearchTestCase.java:793)

          I really have no idea what caused that failure you got.

          Show
          Hoss Man added a comment - ...this smells to me like it must have run against the code prior to my r1421034 commit (to get that "//" URL) Hmmm, except if that were the case, this line number would be different... ...testDistribSearch(BaseDistributedSearchTestCase.java:793) I really have no idea what caused that failure you got.
          Hide
          Mark Miller added a comment -

          Thanks for looking - I'll keep an eye for a repeat - I have the logs somewhere too. I'll take it to another issue.

          Show
          Mark Miller added a comment - Thanks for looking - I'll keep an eye for a repeat - I have the logs somewhere too. I'll take it to another issue.
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] Chris M. Hostetter
          http://svn.apache.org/viewvc?view=revision&revision=1424755

          SOLR-4136: followup: fix getBaseUrlForNodeName to never including trailing slash when using the root context, and harden generateNodeName to not trust the caller as far as leading/trailing slashes

          Show
          Commit Tag Bot added a comment - [trunk commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1424755 SOLR-4136 : followup: fix getBaseUrlForNodeName to never including trailing slash when using the root context, and harden generateNodeName to not trust the caller as far as leading/trailing slashes
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Chris M. Hostetter
          http://svn.apache.org/viewvc?view=revision&revision=1424762

          SOLR-4136: followup: fix getBaseUrlForNodeName to never including trailing slash when using the root context, and harden generateNodeName to not trust the caller as far as leading/trailing slashes (merge r1424755)

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1424762 SOLR-4136 : followup: fix getBaseUrlForNodeName to never including trailing slash when using the root context, and harden generateNodeName to not trust the caller as far as leading/trailing slashes (merge r1424755)
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Chris M. Hostetter
          http://svn.apache.org/viewvc?view=revision&revision=1424768

          SOLR-4136: revert unclean r1424762

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1424768 SOLR-4136 : revert unclean r1424762
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Chris M. Hostetter
          http://svn.apache.org/viewvc?view=revision&revision=1424771

          SOLR-4136: followup: fix getBaseUrlForNodeName to never including trailing slash when using the root context, and harden generateNodeName to not trust the caller as far as leading/trailing slashes (merge r1424755 - redo)

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1424771 SOLR-4136 : followup: fix getBaseUrlForNodeName to never including trailing slash when using the root context, and harden generateNodeName to not trust the caller as far as leading/trailing slashes (merge r1424755 - redo)

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development