SOLR-8221: Some improvements to MiniSolrCloudCluster API

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels: None

      Description

      Spin-off from discussion on SOLR-8196.

      MiniSolrCloudCluster should create subdirectories for all its child nodes.

      1. SOLR-8221.patch
        45 kB
        Alan Woodward
      2. SOLR-8221.patch
        41 kB
        Alan Woodward

        Activity

        Alan Woodward added a comment -

        Patch.

        MiniSolrCloudCluster now takes a Path to a base directory in its constructor, gives names to its child JettySolrRunners, and creates subdirectories for them using those names.

        This also changes the API to take solr.xml as a String rather than a File, and adds default solr.xml and jetty configs. There was only a single test that was using a non-standard solr.xml here.

        You can now spin up a cluster as easily as:

        MiniSolrCloudCluster cluster = new MiniSolrCloudCluster(4, createTempDir());
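
As a rough illustration of the directory layout described above, here is a minimal sketch using only the JDK. It is not the code from the patch: the class name `NodeDirectories`, the helper `createNodeDirs`, and the `node1`, `node2`, ... naming scheme are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class NodeDirectories {

    // Hypothetical helper: creates one subdirectory per child node under baseDir.
    // The "node" + index naming is an assumption, not taken from the patch.
    static List<Path> createNodeDirs(Path baseDir, int numNodes) {
        List<Path> dirs = new ArrayList<>();
        try {
            for (int i = 1; i <= numNodes; i++) {
                Path nodeDir = baseDir.resolve("node" + i);
                // createDirectories also creates baseDir if it does not exist yet
                Files.createDirectories(nodeDir);
                dirs.add(nodeDir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("minicluster");
        for (Path dir : createNodeDirs(base, 4)) {
            System.out.println(dir);
        }
    }
}
```

Because every node gets its own directory under a base directory the cluster owns, a test no longer depends on whatever an earlier run left behind.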
        

        Giving the cluster control of its own directories has meant tweaks to a couple of tests in TestMiniSolrCloudCluster. We now always test that a collection can be deleted and then re-created, as the 'run this function twice sometimes' logic relied on some weird behaviour to do with re-use of existing directories. The test for async core loading was also not actually testing what it should have been - it passed because a newly started jetty was failing due to it trying to open up cores that were owned by other nodes, rather than because of any kind of async status. There are other tests for async loading, so I've just removed this.

        One other issue that this has turned up is that creating collections is quite slow, due to non-leader nodes going into recovery and then waiting for 7 seconds (SOLR-7141). I think this may be slowing down the entire test suite by several minutes (lots of tests create and delete collections, and 7 seconds per test adds up quickly). I'll open another ticket to investigate speeding that up.

        Mark Miller added a comment -

        That 7 seconds can most likely be lowered back down to 2 for non-chaosmonkey tests via a system property.

        It's a bummer wait in general, but we have to be kind of conservative unless something more active is used to solve the issue.
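
A minimal sketch of the kind of system-property override mentioned above. The property name `example.recovery.waitMillis` and the class `RecoveryWaitConfig` are hypothetical and chosen only for illustration; the thread does not name the actual Solr property.

```java
public class RecoveryWaitConfig {

    // Hypothetical property name, for illustration only; not taken from Solr.
    static final String WAIT_PROP = "example.recovery.waitMillis";

    // Conservative default, matching the 7 seconds discussed in this issue.
    static final int DEFAULT_WAIT_MILLIS = 7000;

    // Returns the configured wait, falling back to the default when the
    // property is unset or is not a valid integer.
    static int recoveryWaitMillis() {
        return Integer.getInteger(WAIT_PROP, DEFAULT_WAIT_MILLIS);
    }

    public static void main(String[] args) {
        System.out.println(recoveryWaitMillis());
        // A test that does not need the conservative wait could lower it:
        System.setProperty(WAIT_PROP, "2000");
        System.out.println(recoveryWaitMillis());
    }
}
```

With this shape the conservative value stays the default, and only tests that opt in get the shorter wait.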

        Mark Miller added a comment -

        "I think this may be slowing down the entire test suite by several minutes"

        Depends on whether you're running a lot of tests in parallel or not, I think. You can take out half the SolrCloud tests on my machine (I run 8 JVMs for tests) and get almost the same times for test runs.

        Alan Woodward added a comment -

        Maybe we could add a 'newcollection' parameter to the core create requests, which would mean that the core can skip recovery entirely? And change the collection-complete check to wait until all cores are up before it returns, to ensure that no docs are added until the collection is fully up.

        Alan Woodward added a comment -

        Patch changing the solrj tests as well. All tests pass.

        Mark Miller added a comment -

        "And change the collection-complete check to wait until all cores are up before it returns, to ensure that no docs are added until the collection is fully up."

        I still think it's tricky - you have to time out at some point, and what if a create call is held up in some buffer and released after the timeout .. it skips recovery? Just seems tricky to nail that. Would be better to spend that effort on making the wait itself active and not passive.

        On my setup I just ran with that wait at 7 seconds and lowered it down to 2 seconds - the total test time was about 40 seconds faster - which is almost within normal variation. Probably a bit faster though.
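
To make the active-versus-passive distinction concrete, here is a generic sketch of a polling wait with a timeout, as opposed to a fixed sleep. This is not Solr code; `ActiveWait`, `waitFor`, and its parameters are hypothetical names for the example, and the condition being polled is left abstract.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class ActiveWait {

    // Polls the condition until it holds or the timeout elapses.
    // Returns true as soon as the condition is met, false on timeout or interrupt.
    // A passive wait would instead always sleep for the full timeout.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long readyAt = System.currentTimeMillis() + 200;
        // Stand-in condition that becomes true after roughly 200 ms.
        boolean ready = waitFor(() -> System.currentTimeMillis() >= readyAt, 7000, 20);
        System.out.println("ready=" + ready);
    }
}
```

The timeout is still an upper bound, so the concern about a delayed request arriving after it expires is not addressed by this sketch; it only avoids paying the full wait when the condition is met early.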

        Alan Woodward added a comment -

        "Would be better to spend that effort on making the wait itself active and not passive."

        Yeah, I think you're right. Ah well, on to the next suspect.

        ASF subversion and git services added a comment -

        Commit 1711041 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1711041 ]

        SOLR-8221: MiniSolrCloudCluster creates subdirectories for its child nodes

        ASF subversion and git services added a comment -

        Commit 1711073 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1711073 ]

        SOLR-8221: MiniSolrCloudCluster creates subdirectories for its child nodes

        Steve Rowe added a comment - edited

        Alan Woodward, my Jenkins (and also Policeman Jenkins) has been seeing the error message "java.lang.IllegalStateException: Scheme 'http' not registered." for several tests since you committed this. Is it possible that your changes caused it?

        http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/3318/
        http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java7/3228/
        http://jenkins.sarowe.net/job/Lucene-Solr-tests-5.x-Java8/2995/
        http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/3321/
        http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/14412/

        Alan Woodward added a comment -

        Yes, looks like SSL config is not getting propagated through the cluster settings any more, am looking now...

        ASF subversion and git services added a comment -

        Commit 1711112 from Alan Woodward in branch 'dev/trunk'
        [ https://svn.apache.org/r1711112 ]

        SOLR-8221: Ensure that SSL config is passed to MiniSolrCloudCluster

        ASF subversion and git services added a comment -

        Commit 1711113 from Alan Woodward in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1711113 ]

        SOLR-8221: Ensure that SSL config is passed to MiniSolrCloudCluster


          People

          • Assignee:
            Alan Woodward
            Reporter:
            Alan Woodward