Solr
  1. Solr
  2. SOLR-4948

Tidy up CoreContainer construction logic

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4
    • Component/s: None
    • Labels:
      None

      Description

      While writing tests for SOLR-4914, I discovered that it's really difficult to create a CoreContainer. There are a bunch of constructors which initialise different things, one (but only one!) of which also loads all the cores. Then you have the Initializer object, which basically does the same thing. Sort of. And then the TestHarness doesn't actually use CoreContainer, but an anonymous subclass of CoreContainer which has it's own initialisation logic. It would be nice to clean this up!

      1. SOLR-4948.patch
        126 kB
        Alan Woodward
      2. SOLR-4948.patch
        123 kB
        Alan Woodward
      3. SOLR-4948.patch
        112 kB
        Alan Woodward
      4. SOLR-4948.patch
        108 kB
        Alan Woodward
      5. SOLR-4948.patch
        88 kB
        Alan Woodward
      6. SOLR-4948.patch
        86 kB
        Alan Woodward
      7. SOLR-4948.patch
        80 kB
        Alan Woodward

        Issue Links

          Activity

          Hide
          Alan Woodward added a comment -

          Patch that does said tidying.

          1) Rationalizes the constructor to just taking a SolrResourceLoader and ConfigSolr object (plus various defaults). None of the constructors load the cores.

          2) Remove the arguments to load() - everything necessary for loading has already been passed to the constructor.

          3) Add a convenience createAndLoad() static method.

          4) Adds some convenience methods for creating ConfigSolr objects.

          5) Removes the Initializer object entirely

          6) Refactors TestHarness to create its own ConfigSolr instance, rather than doing any funky subclassing and initializing.

          There are a couple of test fails (both ZK tests) which I will get on with fixing.

          Show
          Alan Woodward added a comment - Patch that does said tidying. 1) Rationalizes the constructor to just taking a SolrResourceLoader and ConfigSolr object (plus various defaults). None of the constructors load the cores. 2) Remove the arguments to load() - everything necessary for loading has already been passed to the constructor. 3) Add a convenience createAndLoad() static method. 4) Adds some convenience methods for creating ConfigSolr objects. 5) Removes the Initializer object entirely 6) Refactors TestHarness to create its own ConfigSolr instance, rather than doing any funky subclassing and initializing. There are a couple of test fails (both ZK tests) which I will get on with fixing.
          Hide
          Mark Miller added a comment -

          Only did a glance review, but +1 given that.

          Show
          Mark Miller added a comment - Only did a glance review, but +1 given that.
          Hide
          Erick Erickson added a comment -

          If you can fix up the nonsense about whether or not there's a cfg in core container that's null that appears to only be true in the test shell you would have my eternal gratitude. There are a series of comments around if (cfg != null)

          {do one thing}

          else

          { do another}

          . I really, really hate having the main-line code have conditions that are only there for testing, but never tried to take on fixing the test-harness sensitivites.

          Haven't reviewed quite yet, may be able to this weekend.

          Show
          Erick Erickson added a comment - If you can fix up the nonsense about whether or not there's a cfg in core container that's null that appears to only be true in the test shell you would have my eternal gratitude. There are a series of comments around if (cfg != null) {do one thing} else { do another} . I really, really hate having the main-line code have conditions that are only there for testing, but never tried to take on fixing the test-harness sensitivites. Haven't reviewed quite yet, may be able to this weekend.
          Hide
          Alan Woodward added a comment -

          The cfg is final and required to be passed in to the constructor - I should probably add something that ensures that it's not null as well, which would stop any further test bugs creeping in.

          Show
          Alan Woodward added a comment - The cfg is final and required to be passed in to the constructor - I should probably add something that ensures that it's not null as well, which would stop any further test bugs creeping in.
          Hide
          Alan Woodward added a comment -

          Latest patch. I still have two test failures in SolrCoreCheckLockOnStartupTest, which I think is due to the old TestHarness not using the normal load() logic and just creating a core on its own.

          The tests create an IndexWriter on a directory, and then try to open a new SolrCore over the same instance dir. This should fail, but it seems that the default DirectoryFactory notices that there's already an index there, and creates a new index_temp directory to hold the Core's index. Which means no exceptions get thrown, and the test passes. Will try and work out which DirectoryFactory was used before...

          Show
          Alan Woodward added a comment - Latest patch. I still have two test failures in SolrCoreCheckLockOnStartupTest, which I think is due to the old TestHarness not using the normal load() logic and just creating a core on its own. The tests create an IndexWriter on a directory, and then try to open a new SolrCore over the same instance dir. This should fail, but it seems that the default DirectoryFactory notices that there's already an index there, and creates a new index_temp directory to hold the Core's index. Which means no exceptions get thrown, and the test passes. Will try and work out which DirectoryFactory was used before...
          Hide
          Alan Woodward added a comment -

          Actually, no, ignore that last comment. It's because the test is creating a core under test-files, rather than using the tmp datadir that it should be, so the locked index is in a completely different place to the core...

          Show
          Alan Woodward added a comment - Actually, no, ignore that last comment. It's because the test is creating a core under test-files, rather than using the tmp datadir that it should be, so the locked index is in a completely different place to the core...
          Hide
          Alan Woodward added a comment -

          OK, think that's it; a combination of not setting dataDir properly on the TestHarness solr.xml, and checking for thrown exceptions rather than looking at CoreContainer.getCoreInitFailures(). Will run all tests again here, but I think this is ready to commit.

          Show
          Alan Woodward added a comment - OK, think that's it; a combination of not setting dataDir properly on the TestHarness solr.xml, and checking for thrown exceptions rather than looking at CoreContainer.getCoreInitFailures(). Will run all tests again here, but I think this is ready to commit.
          Hide
          Alan Woodward added a comment -

          Am still getting a few test failures, mainly due to the TestHarness code assuming that we want to start up a single core before doing any tests. Gradually working through them...

          Show
          Alan Woodward added a comment - Am still getting a few test failures, mainly due to the TestHarness code assuming that we want to start up a single core before doing any tests. Gradually working through them...
          Hide
          Alan Woodward added a comment -

          Still two test failures:

          TestCoreContainer.testPersist, which wants a core's instanceDir to be persisted as an absolute directory. However, if I do that, then all of Erick's TestSolrXmlPersistence tests fail, as they seem to expect relative ones... Can we decide which one should be the correct behaviour?

          ClusterStateUpdateTest.testCoreRegistration, which is failing with Zookeeper SessionExpiredExceptions. Still digging on this one.

          Show
          Alan Woodward added a comment - Still two test failures: TestCoreContainer.testPersist, which wants a core's instanceDir to be persisted as an absolute directory. However, if I do that, then all of Erick's TestSolrXmlPersistence tests fail, as they seem to expect relative ones... Can we decide which one should be the correct behaviour? ClusterStateUpdateTest.testCoreRegistration, which is failing with Zookeeper SessionExpiredExceptions. Still digging on this one.
          Hide
          Erick Erickson added a comment -

          Alan:

          first, I'm soooo glad you're tackling this and not me, but I'm a little jealous that your hard work will actually live on and mine will be obsolete in 5.0 <G>....

          Which of the various asserts in TestCoreContainer.testPersist fail? There are several.... And are these with the latest patch? I have to go out for much of today and I happen to be on California time so I won't be able to look at this until probably after you've gone to sleep.

          But here's the general rule, the instancedir that's persisted should be what was originally specified when the core was created. So if I create a core with a relative instancedir, what gets persisted should be a relative instance dir and vice-versa. I'm not actually sure how all this gets handled when creating a core from another core like happens in that test, I don't remember writing any tests to cover this in the persistence tests I wrote lately. There are tests in there to handle creating cores from the admin handler but not from a "template" core.... It's entirely possible that we need to change the behavior in this case.

          Here's what I'd say (without actually looking). The persisted core should have the same instancedir form as the template core's original definition did. But you can't really tell this from core.getInstanceDir since that might substitute sys vars and all that kind of thing. But we can just change the test based on that principle...

          So if the latest patch on this JIRA shows this behavior, I'll see if I can take a whack at it later today.

          Show
          Erick Erickson added a comment - Alan: first, I'm soooo glad you're tackling this and not me, but I'm a little jealous that your hard work will actually live on and mine will be obsolete in 5.0 <G>.... Which of the various asserts in TestCoreContainer.testPersist fail? There are several.... And are these with the latest patch? I have to go out for much of today and I happen to be on California time so I won't be able to look at this until probably after you've gone to sleep. But here's the general rule, the instancedir that's persisted should be what was originally specified when the core was created. So if I create a core with a relative instancedir, what gets persisted should be a relative instance dir and vice-versa. I'm not actually sure how all this gets handled when creating a core from another core like happens in that test, I don't remember writing any tests to cover this in the persistence tests I wrote lately. There are tests in there to handle creating cores from the admin handler but not from a "template" core.... It's entirely possible that we need to change the behavior in this case. Here's what I'd say (without actually looking). The persisted core should have the same instancedir form as the template core's original definition did. But you can't really tell this from core.getInstanceDir since that might substitute sys vars and all that kind of thing. But we can just change the test based on that principle... So if the latest patch on this JIRA shows this behavior, I'll see if I can take a whack at it later today.
          Hide
          Alan Woodward added a comment -

          OK, I think this patch fixes everything.

          The persistence problems were eventually due to the fact that CoreDescriptor sticks a slash on the end of instancedir, but the persistence logic pulls the information from the Config object, not the CoreDescriptor. I've changed the tests to always expect a '/' on the end, which isn't the nicest solution, but it at least makes them pass.

          SOLR-4914 will do everything through the CoreDescriptor, which should hopefully make all this a lot easier.

          Will run tests again, but I think this is ready to commit.

          Show
          Alan Woodward added a comment - OK, I think this patch fixes everything. The persistence problems were eventually due to the fact that CoreDescriptor sticks a slash on the end of instancedir, but the persistence logic pulls the information from the Config object, not the CoreDescriptor. I've changed the tests to always expect a '/' on the end, which isn't the nicest solution, but it at least makes them pass. SOLR-4914 will do everything through the CoreDescriptor, which should hopefully make all this a lot easier. Will run tests again, but I think this is ready to commit.
          Hide
          Erick Erickson added a comment -

          Watch out! I did the exact same thing then the tests ran fine on my Mac but failed on Windows boxes. Had to use File.separator.....

          My excuse was that since this is limited-time code I was willing to just hack in a slash <G>..

          Erick

          Show
          Erick Erickson added a comment - Watch out! I did the exact same thing then the tests ran fine on my Mac but failed on Windows boxes. Had to use File.separator..... My excuse was that since this is limited-time code I was willing to just hack in a slash <G>.. Erick
          Hide
          Alan Woodward added a comment -

          Re-opening to fix test failures - some tests are now trying to create cores outside the test sandbox.

          Show
          Alan Woodward added a comment - Re-opening to fix test failures - some tests are now trying to create cores outside the test sandbox.
          Hide
          Alan Woodward added a comment -

          Have been away for a week, now getting back round to this.

          Test failures were all under SolrJ, largely related to the default static initialization code that assumes we want a CoreContainer with a single core set up. This patch should catch them all.

          N.B., it turns out that actually running SolrJ tests is a good way to catch these before committing. I had managed to just run everything under solr-core...

          Show
          Alan Woodward added a comment - Have been away for a week, now getting back round to this. Test failures were all under SolrJ, largely related to the default static initialization code that assumes we want a CoreContainer with a single core set up. This patch should catch them all. N.B., it turns out that actually running SolrJ tests is a good way to catch these before committing. I had managed to just run everything under solr-core...
          Hide
          Alan Woodward added a comment -

          Final patch, fixing the remaining test issues. Committing shortly.

          Show
          Alan Woodward added a comment - Final patch, fixing the remaining test issues. Committing shortly.
          Hide
          ASF subversion and git services added a comment -

          Commit 1498990 from Alan Woodward
          [ https://svn.apache.org/r1498990 ]

          SOLR-4948: Tidy up CoreContainer construction logic

          Show
          ASF subversion and git services added a comment - Commit 1498990 from Alan Woodward [ https://svn.apache.org/r1498990 ] SOLR-4948 : Tidy up CoreContainer construction logic
          Hide
          Alan Woodward added a comment -

          Will let this bake in 5.0 for a bit

          Show
          Alan Woodward added a comment - Will let this bake in 5.0 for a bit
          Hide
          Uwe Schindler added a comment -

          Alan: Once you backported this, can you also backport SOLR-5009 ?

          With this commit you introduced the SOLR-5009 bug (creates multiple SolrResourceLoaders for the same config dir). I can help to backport, but I was unable to merge.

          Show
          Uwe Schindler added a comment - Alan: Once you backported this, can you also backport SOLR-5009 ? With this commit you introduced the SOLR-5009 bug (creates multiple SolrResourceLoaders for the same config dir). I can help to backport, but I was unable to merge.
          Hide
          Erick Erickson added a comment -

          Alan Woodward Uwe uncovered a problem with this patch, possibly accounting for the PermGen errors we've seen lately in the tests. Plus, I have some fixes I want to get into 4.4. So my vote is to go ahead and merge this into 4.x and we'll fix anything that crops up.

          If Uwe backports stuff then merging the changes for this JIRA and his changes get complicated. Maybe it's time to just bite the bullet?

          Show
          Erick Erickson added a comment - Alan Woodward Uwe uncovered a problem with this patch, possibly accounting for the PermGen errors we've seen lately in the tests. Plus, I have some fixes I want to get into 4.4. So my vote is to go ahead and merge this into 4.x and we'll fix anything that crops up. If Uwe backports stuff then merging the changes for this JIRA and his changes get complicated. Maybe it's time to just bite the bullet?
          Hide
          ASF subversion and git services added a comment -

          Commit 1500166 from Uwe Schindler
          [ https://svn.apache.org/r1500166 ]

          SOLR-5009: CHANGES.txt now only list SOLR-5009 as part of SOLR-4948

          Show
          ASF subversion and git services added a comment - Commit 1500166 from Uwe Schindler [ https://svn.apache.org/r1500166 ] SOLR-5009 : CHANGES.txt now only list SOLR-5009 as part of SOLR-4948
          Hide
          Uwe Schindler added a comment - - edited

          One comment: When you backport to 4.x, don't forget to preserve the assumeTrue(Constants.JRE_IS_MINIMUM_JAVA_7), otherwise windows will fail again.

          Show
          Uwe Schindler added a comment - - edited One comment: When you backport to 4.x, don't forget to preserve the assumeTrue(Constants.JRE_IS_MINIMUM_JAVA_7), otherwise windows will fail again.
          Hide
          Alan Woodward added a comment -

          Thanks Uwe, Erick. I'll backport both of these this weekend.

          Show
          Alan Woodward added a comment - Thanks Uwe, Erick. I'll backport both of these this weekend.
          Hide
          Erick Erickson added a comment -

          BTW, in case I didn't say it, this was timely. I had to, you guessed it, create a core container that didn't make all the assumptions built into TestHarness. This will make testing core-less setups waaaay easier.

          Thanks for tackling this!

          Show
          Erick Erickson added a comment - BTW, in case I didn't say it, this was timely. I had to, you guessed it, create a core container that didn't make all the assumptions built into TestHarness. This will make testing core-less setups waaaay easier. Thanks for tackling this!
          Hide
          Uwe Schindler added a comment -

          Thanks Alan, I will help you on any problems! I hope SOLR-5009 is obvious, why it is a bad idea to lazyly create SolrResourceLoaders. As Erick said, this might be the issue for the recent permgen issues on trunk!

          Currently I am investigating all other places where Solr creates SolrResourceLoader but does not close it. The only remaining part is SolrCore.reload(), I have to figure out if the old Core is correctly closed and also closes the ResourceLoader!

          Show
          Uwe Schindler added a comment - Thanks Alan, I will help you on any problems! I hope SOLR-5009 is obvious, why it is a bad idea to lazyly create SolrResourceLoaders. As Erick said, this might be the issue for the recent permgen issues on trunk! Currently I am investigating all other places where Solr creates SolrResourceLoader but does not close it. The only remaining part is SolrCore.reload(), I have to figure out if the old Core is correctly closed and also closes the ResourceLoader!
          Hide
          Uwe Schindler added a comment -

          I opened SOLR-4914 to correctly close SolrResourceLoaders for SolrCore and reloaded SolrCores. Its a mess, I have no idea how to handle resource tracking!

          Show
          Uwe Schindler added a comment - I opened SOLR-4914 to correctly close SolrResourceLoaders for SolrCore and reloaded SolrCores. Its a mess, I have no idea how to handle resource tracking!
          Hide
          ASF subversion and git services added a comment -

          Commit 1500315 from Alan Woodward
          [ https://svn.apache.org/r1500315 ]

          SOLR-4948, SOLR-5009: Tidy up CoreContainer construction logic

          Show
          ASF subversion and git services added a comment - Commit 1500315 from Alan Woodward [ https://svn.apache.org/r1500315 ] SOLR-4948 , SOLR-5009 : Tidy up CoreContainer construction logic
          Hide
          Alan Woodward added a comment -

          Thanks all!

          Show
          Alan Woodward added a comment - Thanks all!
          Hide
          ASF subversion and git services added a comment -

          Commit 1500316 from Alan Woodward
          [ https://svn.apache.org/r1500316 ]

          Move SOLR-4948 CHANGES.txt entry to 4.4

          Show
          ASF subversion and git services added a comment - Commit 1500316 from Alan Woodward [ https://svn.apache.org/r1500316 ] Move SOLR-4948 CHANGES.txt entry to 4.4
          Hide
          Uwe Schindler added a comment -

          Thanks! Commit looks fine, I think Windows tests will succeed also on 4.x

          Show
          Uwe Schindler added a comment - Thanks! Commit looks fine, I think Windows tests will succeed also on 4.x

            People

            • Assignee:
              Alan Woodward
              Reporter:
              Alan Woodward
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development