Solr
  1. Solr
  2. SOLR-7171

BaseDistributedSearchTestCase.getSolrXml() not used consistently - multiple divergent ways a JettySolrRunner may be inited

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      As part of SOLR-6349, i wanted to take advantage of the new features in SOLR-7147 to inspect shard requests in TestDistributedSearch, but even though i added a proper override of getSolrXml...

        @Override
        protected String getSolrXml() {
          return "solr-trackingshardhandler.xml"; 
        }
      

      ...that value was being ignored, and i was never getting an instance of TrackingShardHandlerFactory.

      I poked around a bit and confirmed that getSolrXml() is used by "setupJettySolrHome(File)" but that method is never called by BaseDistributedSearchTestCase - it's only called in framework subclasses like AbstractDistribZkTestBase and AbstractFullDistribZkTestBase. Instead, for simple subclasses of BaseDistributedSearchTestCase the jetty instances seem to be coming from createServers(int)


      I don't really understand why there are so many diff ways for a shard instance to be inited, and presumably that should be refactored? .. but a more immediate concern is that subclasses of BaseDistributedSearchTestCase which attempt to override the solr.xml file used can't (unless they are actually a subclass of AbstractDistribZkTestBase of AbstractFullDistribZkTestBase)

      1. SOLR-7171.patch
        10 kB
        Hoss Man
      2. SOLR-7171.patch
        7 kB
        Hoss Man

        Activity

        Hide
        Hoss Man added a comment -

        here's a patch i'm currently hammering on ... doesn't completley eliminate all the code/logic duplication, but at least gets us consistently using getSolrXml() (as far as i can tell)

        Part of making this work, was to change createControlJetty() and createServers(int numShards) to clone the getSolrHome() dir ... previously (if you subclassed BaseDistributedSearchTestCase) these methods would just re-use the same getSolrHome() dir for all the jetty instances, and only override the cores & data dirs.

        Not sure if folks have any strong opinion against this? it seems like a "fix" to the previous awkward hoops, and makes these simple distrib test work more like the cloud tests .. but it would certainly be possible to add more awkwards hoops to only clone getSolrHome() in the event that we need to override the solr.xml.

        Show
        Hoss Man added a comment - here's a patch i'm currently hammering on ... doesn't completley eliminate all the code/logic duplication, but at least gets us consistently using getSolrXml() (as far as i can tell) Part of making this work, was to change createControlJetty() and createServers(int numShards) to clone the getSolrHome() dir ... previously (if you subclassed BaseDistributedSearchTestCase) these methods would just re-use the same getSolrHome() dir for all the jetty instances, and only override the cores & data dirs. Not sure if folks have any strong opinion against this? it seems like a "fix" to the previous awkward hoops, and makes these simple distrib test work more like the cloud tests .. but it would certainly be possible to add more awkwards hoops to only clone getSolrHome() in the event that we need to override the solr.xml.
        Hide
        Hoss Man added a comment -

        the previous patch broke DistributedClusteringComponentTest - which is evidently the only contrib test we have that test distributed queries.

        Once i got logs enabled for the test, it was clear thatthe problem was that in the contrib test, where getSolrHome() was overridden to point to the contrib test directory with a collection1 that had all the clustering configs enabled, Solr was failing to init because there were do directories under the coreRootDirectory that both had a core.properties claiming they were collection1 ... there was the real collection1 that the test wanted, but there was also an empty cores directory that contained nothing but a core.properties file (claiming to be name=collection1.

        I spent a while trying to figure out how the hell this was working before my patch, before finally giving up – because whatever it was actually doing, it seems pretty clear this was some kind of mistake – the core.properties file being created was directly in a ./cores/ directory that lookd like it was intended to be the coreRootDirectory – except the coreRootDirectory was always getting set to be the same as getSolrHome().

        In any case: here's an updated patch where the (previously) odd ./cores/ directory is no longer created; the (now cloned) getSolrHome() directory is still used as the coreRootDirectory; a ./collection1/core.properties file is created if and only if it does not exist.

        This fixes the problem with DistributedClusteringComponentTest in a way that also keeps my TestSimpleTrackingShardHandler working w/o modifications to either – so hopefully it should work for any end-user subclasses as well.

        Still hammering on the full test suite, but i'd really appreciate it if Alan Woodward and Erick Erickson could review this since it seems like a lot of this "./cores/ dir as collection1" weirdness came about as a result of SOLR-6840 (and perhaps SOLR-6902? ... not certain)

        Show
        Hoss Man added a comment - the previous patch broke DistributedClusteringComponentTest - which is evidently the only contrib test we have that test distributed queries. Once i got logs enabled for the test, it was clear thatthe problem was that in the contrib test, where getSolrHome() was overridden to point to the contrib test directory with a collection1 that had all the clustering configs enabled, Solr was failing to init because there were do directories under the coreRootDirectory that both had a core.properties claiming they were collection1 ... there was the real collection1 that the test wanted, but there was also an empty cores directory that contained nothing but a core.properties file (claiming to be name=collection1 . I spent a while trying to figure out how the hell this was working before my patch, before finally giving up – because whatever it was actually doing, it seems pretty clear this was some kind of mistake – the core.properties file being created was directly in a ./cores/ directory that lookd like it was intended to be the coreRootDirectory – except the coreRootDirectory was always getting set to be the same as getSolrHome() . In any case: here's an updated patch where the (previously) odd ./cores/ directory is no longer created; the (now cloned) getSolrHome() directory is still used as the coreRootDirectory ; a ./collection1/core.properties file is created if and only if it does not exist. This fixes the problem with DistributedClusteringComponentTest in a way that also keeps my TestSimpleTrackingShardHandler working w/o modifications to either – so hopefully it should work for any end-user subclasses as well. Still hammering on the full test suite, but i'd really appreciate it if Alan Woodward and Erick Erickson could review this since it seems like a lot of this " ./cores/ dir as collection1" weirdness came about as a result of SOLR-6840 (and perhaps SOLR-6902 ? ... not certain)
        Hide
        Alan Woodward added a comment -

        Oh God, I spent days trying to refactor this before I gave up and used the hacky "just create a damn core.properties file" solution. The issue with SOLR-6840 was that there were a bunch of solr.xml files that defined a core called collection1, and the tests assumed that this core would exist when they started. But the new format doesn't allow for that, so the tests now had to create core.properties files instead.

        Anyway, your fix looks good Hoss. I like that it makes it explicit that it's creating a new solr.xml and empty core.

        Show
        Alan Woodward added a comment - Oh God, I spent days trying to refactor this before I gave up and used the hacky "just create a damn core.properties file" solution. The issue with SOLR-6840 was that there were a bunch of solr.xml files that defined a core called collection1, and the tests assumed that this core would exist when they started. But the new format doesn't allow for that, so the tests now had to create core.properties files instead. Anyway, your fix looks good Hoss. I like that it makes it explicit that it's creating a new solr.xml and empty core.
        Hide
        Erick Erickson added a comment -

        What Alan said.

        It looks fine to me too. One quick note is that it's not even necessary (in non-cloud mode at least) for anything to be in the core.properties file if all the defaults are acceptable, its mere existence is all that's absolutely required. Then the core name is taken from the directory core.properties is found in, the data dir is expected to be a directory in at the same level as core.properties etc..

        On a side note, it'd be nice, sometime, to get all traces of "collection1" out of there, but that's not something for the faint of heart, nor should it be conflated with this issue...

        Show
        Erick Erickson added a comment - What Alan said. It looks fine to me too. One quick note is that it's not even necessary (in non-cloud mode at least) for anything to be in the core.properties file if all the defaults are acceptable, its mere existence is all that's absolutely required. Then the core name is taken from the directory core.properties is found in, the data dir is expected to be a directory in at the same level as core.properties etc.. On a side note, it'd be nice, sometime, to get all traces of "collection1" out of there, but that's not something for the faint of heart, nor should it be conflated with this issue...
        Hide
        ASF subversion and git services added a comment -

        Commit 1663381 from hossman@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1663381 ]

        SOLR-7171: BaseDistributedSearchTestCase now clones getSolrHome() for each subclass, and consistently uses getSolrXml()

        Show
        ASF subversion and git services added a comment - Commit 1663381 from hossman@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1663381 ] SOLR-7171 : BaseDistributedSearchTestCase now clones getSolrHome() for each subclass, and consistently uses getSolrXml()
        Hide
        ASF subversion and git services added a comment -

        Commit 1663421 from hossman@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1663421 ]

        SOLR-7171: BaseDistributedSearchTestCase now clones getSolrHome() for each subclass, and consistently uses getSolrXml() (merge r1663381)

        Show
        ASF subversion and git services added a comment - Commit 1663421 from hossman@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1663421 ] SOLR-7171 : BaseDistributedSearchTestCase now clones getSolrHome() for each subclass, and consistently uses getSolrXml() (merge r1663381)
        Hide
        Timothy Potter added a comment -

        Bulk close after 5.1 release

        Show
        Timothy Potter added a comment - Bulk close after 5.1 release

          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development