SOLR-4114

Collection API: Allow multiple shards from one collection on the same Solr server

    Details

      Description

      We should support running multiple shards from one collection on the same Solr server - e.g. run a collection with 8 shards on a 4-server cluster (each Solr server running 2 shards).

      Performance tests on our side have shown that this is a good idea, and it is also a good idea for easy elasticity later on - it is much easier to move an entire existing shard from one Solr server to another one that just joined the cluster than it is to split an existing shard among the Solr servers that used to run it and the new Solr server.

      See dev mailing list discussion "Multiple shards for one collection on the same Solr server"
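For concreteness, a request enabling this could be composed as in the sketch below. The base URL and collection name are hypothetical examples, and the `maxShardsPerNode` parameter name reflects what this issue adds - check the Collections API reference for your Solr version.

```java
// Sketch: building a Collections API CREATE request that allows more than
// one shard of the collection per node. All concrete values (base URL,
// collection name) are hypothetical examples.
public class CreateCollectionUrl {

    // Compose the CREATE request URL from the relevant parameters.
    public static String createUrl(String base, String collection,
                                   int numShards, int replicationFactor,
                                   int maxShardsPerNode) {
        return base + "/admin/collections?action=CREATE"
                + "&name=" + collection
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&maxShardsPerNode=" + maxShardsPerNode;
    }

    public static void main(String[] args) {
        // 8 shards on a 4-node cluster: allow 2 shards of this collection per node.
        System.out.println(createUrl("http://localhost:8983/solr",
                "mycollection", 8, 1, 2));
    }
}
```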

      1. SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch
        19 kB
        Per Steffensen
      2. SOLR-4114_trunk.patch
        39 kB
        Per Steffensen
      3. SOLR-4114.patch
        31 kB
        Mark Miller
      4. SOLR-4114.patch
        39 kB
        Mark Miller
      5. SOLR-4114.patch
        32 kB
        Per Steffensen
      6. SOLR-4114.patch
        22 kB
        Per Steffensen


          Activity

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1417070

          SOLR-4114: Allow creating more than one shard per instance with the Collection API.

          Per Steffensen added a comment -

          Guess the following will do

          && (server.getAttribute(mbean, "coreName") != null)
          
          Per Steffensen added a comment -

          Maybe you ought to fix the real problem instead - making Solr behave equally whether it is checked out from GitHub or SVN.

          Had a look at your fix - it ought to be

          && ((value = server.getAttribute(mbean, "coreName")) != null)
          

          instead of

          && ((indexDir = server.getAttribute(mbean, "coreName")) != null)
          

          Regards, Per Steffensen

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1426330

          SOLR-4114: tests: make happy with git - source attrib counts on svn substitution

          Commit Tag Bot added a comment -

          [trunk commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1426329

          SOLR-4114: tests: make happy with git - source attrib counts on svn substitution

          Mark Miller added a comment -

          Ran into some trouble with the 'no two shards use the same index dir' check with a git checkout - finally traced it down to checking the source attribute from the SolrCore mbean - when not using svn, this can be $URL. I'll look at an alternate check to do instead.

          Mark Miller added a comment -

          Thanks for the contribution Per!

          Mark Miller added a comment -

          I don't often run precommit - takes too long - but I generally do when adding new dependencies, because I know there will be license stuff. Jenkins will catch it even if we don't before commit, though.

          You have to extend LuceneTestCase because that's what hooks you into our "test framework", I believe - e.g. it enforces things like setUp and tearDown calling super, and a ton of other nice stuff.

          If you don't do it, Jenkins will complain and fail.
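The super-call enforcement Mark mentions can be illustrated with a toy base class. This is only a sketch of the idea in plain Java - it is not how LuceneTestCase actually implements its checks, and the class and method names here are invented.

```java
// Toy illustration: a base test class that detects subclasses overriding
// setUp() without calling super.setUp(). Not the real LuceneTestCase.
public class SuperCallCheck {

    public static abstract class BaseTestCase {
        private boolean superSetUpCalled;

        // Subclasses overriding setUp() are expected to call super.setUp().
        public void setUp() { superSetUpCalled = true; }

        public final void runTest() {
            superSetUpCalled = false;
            setUp();
            if (!superSetUpCalled) {
                throw new IllegalStateException("setUp() did not call super.setUp()");
            }
            doTest();
        }

        protected abstract void doTest();
    }

    public static class GoodTest extends BaseTestCase {
        @Override public void setUp() { super.setUp(); }
        @Override protected void doTest() { System.out.println("good test ran"); }
    }

    public static class BadTest extends BaseTestCase {
        @Override public void setUp() { /* forgot super.setUp() */ }
        @Override protected void doTest() { }
    }

    public static void main(String[] args) {
        new GoodTest().runTest();
        try {
            new BadTest().runTest();
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```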

          I'll look at SOLR-4120.

          Per Steffensen added a comment -

          Feel free to give feedback in that issue - this was my first experience looking at easymock.

          I added a comment on SOLR-4136, but basically you did a fine job!

          Now that SOLR-4114 is closed would you mind continuing with SOLR-4120, Mark?

          Per Steffensen added a comment -

          Great patch! I just had to make a couple small tweaks to make the policeman happy - added license files for the test jars, changed the new test to extend LuceneTestCase. Good stuff found by the long but useful precommit ant target.

          Yes, I do not run the precommit test before sending a patch. Of course I will before committing myself, if/when I become a committer.

          Just out of curiosity, why do tests HAVE to extend LuceneTestCase?

          I think that covers all of your points Per - let me know

          Checking only on branch_4x, hoping that you did the "exact" same changes on trunk. Updated my local checkout of branch_4x and made a check to see if everything seemed to be there. It was. I am ready to close this ticket.

          Mark Miller added a comment -

          note: hossman has run into trouble with the easymock test in SOLR-4136 - I've attached a small patch there that extends the mocking to cover solrzkclient and a call to get the base url.

          I suppose it's not ideal in that it simply returns _ replaced with /, so the test won't work with a solr context using an underscore - but it's a start, works currently, and can be improved.

          Feel free to give feedback in that issue - this was my first experience looking at easymock.

          Mark Miller added a comment -

          I think that covers all of your points Per - let me know.

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1420329

          SOLR-4114: add back commented out test with 10 second wait, add Per's new test, add test to ensure no two cores use the same index directory

          Commit Tag Bot added a comment -

          [trunk commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1420327

          SOLR-4114: add back commented out test with 10 second wait, add Per's new test, add test to ensure no two cores use the same index directory

          Mark Miller added a comment -

          Add the checkNoTwoShardsUseTheSameIndexDir thing if you (also) find it useful (see above)

          Missed that comment - I'll add this in to the mix.

          Mark Miller added a comment -

          Great patch! I just had to make a couple small tweaks to make the policeman happy - added license files for the test jars, changed the new test to extend LuceneTestCase. Good stuff found by the long but useful precommit ant target.

          I also added in the 10 second wait for now.

          I'll commit very shortly.

          Per Steffensen added a comment -

          Thanks Mark

          Guess what is left is:

          • Decide if you want to enable the call to "checkCollectionIsNotCreated" and set the wait in it to 10 sec (or so), or if you are prepared to drop this part of the test, now that you have the OverseerCollectionProcessorTest
          • Maybe make a comment on SOLR-4043 that you ought to modify the BasicDistributedZkTest.testCollectionsAPI test as part of solving that issue.
          • Add the checkNoTwoShardsUseTheSameIndexDir thing if you (also) find it useful (see above)
          • Add SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch
            But you probably want to read through the (last of the) comments above to make sure we didn't miss something we promised (each other and/or Robert Muir)

          Regards, Per Steffensen

          Mark Miller added a comment -

          Hey Per - thanks! I'll try and get back to helping finish up this issue - at a minimum I'll get a commit or two in on it today.

          Per Steffensen added a comment -

          I promised Robert Muir to write a test of the feature introduced here in SOLR-4114 as a unit test directly on OverseerCollectionProcessor. I did this in the attached SOLR-4114_mocking_OverseerCollectionProcessorTest_branch_4x.patch. It fits on top of revision 1420194 of branch_4x, but shouldn't be hard to port to other branches, since it is basically just a new test, OverseerCollectionProcessorTest.

          Besides the new test, OverseerCollectionProcessor has been modified a little in order to easily be able to extend it in the test.

          OverseerCollectionProcessorTest tests OverseerCollectionProcessor alone, by mocking the components it interacts with directly:

          • DistributedQueue - the work-queue with messages from ZK
          • ZkStateReader
          • ClusterState
          • ShardHandler - the component handling/distributing the CoreAdmin requests coming out of OverseerCollectionProcessor.

          I wanted to use Mockito but found that you are already using EasyMock, so I decided to use that. I had to upgrade EasyMock from version 2.0 to version 3.0, because I wanted to mock classes (not only interfaces) - almost nothing in Solr is defined via interfaces. Guess no one would mind that.

          OverseerCollectionProcessorTest tests a few things, including the feature introduced here in SOLR-4114, and to some extent eliminates the additional test parts added to BasicDistributedZkTest in SOLR-4114 - among others, the controversial 10-60 sec wait test

          int liveNodes = getCommonCloudSolrServer().getZkStateReader().getClusterState().getLiveNodes().size();
          int numShards = (liveNodes/2) + 1;
          int numReplica = 1;
          int maxShardsPerNode = 1;
          collectionInfos = new HashMap<String,List<Integer>>();
          createCollection(collectionInfos, cnt, numShards, numReplica, maxShardsPerNode);
          checkCollectionIsNotCreated(collectionInfos.keySet().iterator().next());
          

          OverseerCollectionProcessorTest establishes a nice platform for testing OverseerCollectionProcessor at unit level using mocking, and can probably be extended to further eliminate tests in BasicDistributedZkTest.testCollectionsAPI. It can also be extended to do more than just "create" tests - also "reload" tests and "remove" tests.
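The patch itself uses EasyMock; as a rough illustration of the underlying idea (hand-rolled in plain Java rather than EasyMock, with interface names and a message format invented for this sketch - they are not Solr's real APIs), the pattern is: feed the component a canned message through a faked work queue, and record what it sends out the other end.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of unit-testing a message processor by mocking its
// collaborators. Interface names and the message format are invented;
// they are NOT Solr's real APIs.
public class MockingSketch {

    public interface WorkQueue { String poll(); }                       // stands in for the ZK-backed queue
    public interface ShardHandler { void submit(String coreRequest); }  // stands in for request distribution

    // A minimal "processor": turns one create message into per-shard core requests.
    public static class Processor {
        private final WorkQueue queue;
        private final ShardHandler handler;

        public Processor(WorkQueue queue, ShardHandler handler) {
            this.queue = queue;
            this.handler = handler;
        }

        public void processOne() {
            String msg = queue.poll();               // e.g. "create:mycoll:4"
            String[] parts = msg.split(":");
            int numShards = Integer.parseInt(parts[2]);
            for (int i = 1; i <= numShards; i++) {
                handler.submit("CREATE core " + parts[1] + "_shard" + i);
            }
        }
    }

    public static void main(String[] args) {
        // Hand-rolled mocks: a canned queue and a recording handler.
        List<String> sent = new ArrayList<>();
        Processor p = new Processor(() -> "create:mycoll:4", sent::add);
        p.processOne();
        System.out.println(sent);  // four per-shard core-create requests
    }
}
```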

          Per Steffensen added a comment -

          I verified that the removal of my "controlled" instance-dir and data-dir in OverseerCollectionProcessor.createCollection is ok. I needed to do some investigation into how instance-dir and data-dir work. Now I know, and I can see that the "controlled" instance-dir and data-dir was a bad idea. Thanks for being so thorough, Mark.

          During my investigations of instance-dir and data-dir I came up with an additional test for BasicDistributedZkTest.testCollectionsAPI, namely a test making sure that when you have created a lot of collections you will not end up with any two (or more) shards using the same index-dir - that was actually what I was afraid would happen when you (Mark) removed the "controlled" instance- and data-dir. This additional test part runs very fast (200 ms on my local machine), so it will not extend the run-time of the test noticeably. Instead of sending a patch I will just explain what to do to get this additional testing into BasicDistributedZkTest (this description works on 4.0, but I couldn't imagine that it wouldn't on 5.x or 4.x):

          • Add this method somewhere in BasicDistributedZkTest
              private void checkNoTwoShardsUseTheSameIndexDir() throws Exception {
                Map<String, Set<String>> indexDirToShardNamesMap = new HashMap<String, Set<String>>();
                
                List<MBeanServer> servers = new LinkedList<MBeanServer>();
                servers.add(ManagementFactory.getPlatformMBeanServer());
                servers.addAll(MBeanServerFactory.findMBeanServer(null));
                for (final MBeanServer server : servers) {
                  Set<ObjectName> mbeans = new HashSet<ObjectName>();
                  mbeans.addAll(server.queryNames(null, null));
                  for (final ObjectName mbean : mbeans) {
                    Object value;
                    Object indexDir;
                    Object name;
                    try {
                      if (((value = server.getAttribute(mbean, "category")) != null && value.toString().equals(Category.CORE.toString())) &&
                          ((value = server.getAttribute(mbean, "source")) != null && value.toString().contains(SolrCore.class.getSimpleName())) &&
                          ((indexDir = server.getAttribute(mbean, "indexDir")) != null) &&
                          ((name = server.getAttribute(mbean, "name")) != null)) {
                          if (!indexDirToShardNamesMap.containsKey(indexDir.toString())) {
                            indexDirToShardNamesMap.put(indexDir.toString(), new HashSet<String>());
                          }
                          indexDirToShardNamesMap.get(indexDir.toString()).add(name.toString());
                      }
                    } catch (Exception e) {
                      // ignore, just continue - probably a "category" or "source" attribute not found
                    }
                  }
                }
                
                assertTrue("Something is broken in the assert for no shards using the same indexDir - probably something was changed in the attributes published in the MBean of " + SolrCore.class.getSimpleName(), indexDirToShardNamesMap.size() > 0);
                for (Entry<String, Set<String>> entry : indexDirToShardNamesMap.entrySet()) {
                  if (entry.getValue().size() > 1) {
                    fail("We have shards using the same indexDir. E.g. shards " + entry.getValue().toString() + " all use indexDir " + entry.getKey());
                  }
                }
                
              }
            
          • Add a call to this method (checkNoTwoShardsUseTheSameIndexDir()) at the end of BasicDistributedZkTest.testCollectionsAPI
          • Add the line "lst.add("indexDir", getIndexDir());" to SolrCore.getStatistics() so that index-dir will also be part of the information exposed in the MBean of SolrCore

          Please consider including the additional test. It scans all SolrCores in the system to see if any of them share index-dir. I do the scanning by accessing MBean info from SolrCores - the simplest way I could come up with. It means that SolrCore will now also expose index-dir through its MBean, but I guess no one would have anything against that.
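The MBean scanning technique Per describes can be sketched with plain-JDK JMX alone. In the demo below the "demo" domain, the attribute names, and the directory paths are all invented for illustration; the real check inspects the attributes Solr's SolrCore publishes through its MBean.

```java
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Self-contained demo of detecting MBeans that share an index directory.
// The "demo" domain, attribute names, and paths are invented for this sketch.
public class IndexDirScan {

    public interface CoreInfoMBean {     // standard MBean management interface
        String getIndexDir();
        String getCoreName();
    }

    public static class CoreInfo implements CoreInfoMBean {
        private final String indexDir, coreName;
        public CoreInfo(String indexDir, String coreName) {
            this.indexDir = indexDir;
            this.coreName = coreName;
        }
        public String getIndexDir() { return indexDir; }
        public String getCoreName() { return coreName; }
    }

    // Group core names by index directory; entries with more than one
    // name are the bug the test is looking for.
    public static Map<String, Set<String>> sharedIndexDirs(MBeanServer server) throws Exception {
        Map<String, Set<String>> dirToNames = new HashMap<>();
        for (ObjectName mbean : server.queryNames(new ObjectName("demo:type=core,*"), null)) {
            String dir = (String) server.getAttribute(mbean, "IndexDir");
            String name = (String) server.getAttribute(mbean, "CoreName");
            dirToNames.computeIfAbsent(dir, k -> new HashSet<>()).add(name);
        }
        dirToNames.values().removeIf(names -> names.size() < 2);
        return dirToNames;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Two hypothetical cores wrongly pointing at the same index directory.
        server.registerMBean(new CoreInfo("/var/solr/index1", "shard1"),
                new ObjectName("demo:type=core,id=1"));
        server.registerMBean(new CoreInfo("/var/solr/index1", "shard2"),
                new ObjectName("demo:type=core,id=2"));
        System.out.println(sharedIndexDirs(server));
    }
}
```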

          Regards, Per Steffensen

          Per Steffensen added a comment -

          I'm personally okay with adding like a 10 second wait until that gets in.

          So please do it.
          Enable the call to "checkCollectionIsNotCreated" and set the wait in it to 10 sec.
          Maybe also make a comment on SOLR-4043 that you ought to modify the BasicDistributedZkTest.testCollectionsAPI test as a part of solving this issue.

          Mark Miller added a comment -

          SOLR-4043 (collection api responses) should make the missing test much easier it seems to me. I'm personally okay with adding like a 10 second wait until that gets in.

          Mark Miller added a comment -

          vetoing those commits before they can be added at all,

          Vetos are not made for that and IMO would not apply, but okay...

          Mark Miller added a comment -

          A veto requires a formal announcement accompanied with a valid technical argument. I know you are used to throwing them around rather casually, but generally, -1, +1 are used to indicate votes of direction at Apache.

          This test in particular is actually many tests - compiled together to save the time of setting up and tearing down lots of jetties.

          If you were to @Nightly it in the name of so called "progress", I would break it up into many tests, each under a minute, and increase the total test run time. I'd prefer it was all faster this way, but I'm okay with that way too.

          @Slow was introduced the last time these discussions came up, and if you don't want to run these tests, I suggest you take advantage of it.

          Robert Muir added a comment -

          That's ok if you want to veto progress, Mark. I really think we can be reasonable with solr tests and do:

          • reasonably good unit tests that are simpler and easy to debug
          • nightly-annotated slow tests that are more like the solr tests of today.

          This would probably remove a lot of the frustration lucene developers have with solr tests.

          In all cases, it's fine if people want to veto the @Nightly annotations of existing tests.

          But these will be the very last slow solr tests we have in our build, because I already mentioned I'm going to raise issues about new tests added with huge sleeps, vetoing those commits before they can be added at all, so the test situation does not continue to get even worse.

          I'm trying to take a stand against the always-failing/super-slow test situation of today. It's been like this for far too long.

          Mark Miller added a comment -

          I think if a test takes double-digit seconds then it should be @Nightly.

          -1 to running all these solrcloud tests only as nightly.

          Robert Muir added a comment -

          Unfortunately the @Slow is enabled by default.

          I think if a test takes double-digit seconds then it should be @Nightly.
          I looked and it's strange that no solr tests are ever marked this way; let's start here.

          We do it with lots of lucene tests, keeping the regular runs fast.
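The gating mechanism discussed above (run a test in every build unless it carries the nightly marker) can be sketched in plain Java. Note this is a hand-rolled illustration: the real Lucene/Solr test framework supplies its own @Nightly/@Slow annotations and runner, so every name below is hypothetical.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Illustration only: the real Lucene/Solr test framework ships its own
// @Nightly annotation and runner; this hand-rolled version just shows
// the mechanism of gating slow tests behind a marker annotation.
public class NightlyGateSketch {

    @Retention(RetentionPolicy.RUNTIME)
    public @interface Nightly {}

    public static class ExampleTests {
        public void testFastUnitCheck() {}        // runs in every build

        @Nightly
        public void testSlowClusterScenario() {}  // runs only at night
    }

    /** A regular run skips anything marked @Nightly; a nightly run takes all. */
    public static boolean shouldRun(Method test, boolean nightlyRun) {
        return nightlyRun || !test.isAnnotationPresent(Nightly.class);
    }

    public static void main(String[] args) {
        for (Method m : ExampleTests.class.getDeclaredMethods()) {
            System.out.println(m.getName()
                    + " regular=" + shouldRun(m, false)
                    + " nightly=" + shouldRun(m, true));
        }
    }
}
```

A test runner would apply the same check before invoking each method; the point is simply that the slow tests stay in the suite but out of the regular build.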

          Per Steffensen added a comment -

          Add the unit tests for "good" checks during normal test runs

          Ok. I like a good discussion, so that will be your prize for participating so eagerly. Will try to find time to do it soon.

          Also add the slower test but mark it @Nightly. So it runs at night in jenkins tests runs.

          So we agree to commit the test with a 10-20 sec wait?

          Yeah, I would rather not add the @Nightly, since I do not really know your test-target setup (nightly, pre-commit, etc.). I certainly agree that such a slow test needs to be run at night, but actually I guessed that the @Slow annotation was for that. The test is already marked as @Slow. I can certainly add the @Nightly annotation if people agree that it should be added to BasicDistributedZkTest?

          Robert Muir added a comment -

          But i feel like there is a compromise:

          Add the unit tests for "good" checks during normal test runs.
          Also add the slower test but mark it @Nightly. So it runs at night in jenkins tests runs.

          Per Steffensen added a comment - - edited

          Agree with everything you just said!

          But when I see arguments to add an e.g. 60 second test like this, I felt the need to speak up

          I understand, and it is not that I like this kind of test! I just want the feature tested and protected from someone else ruining it tomorrow. And I can't come up with another way of testing that nothing happens than waiting for a while and asserting that it did not. And IMHO unit-tests directly on OverseerCollectionProcessor are not enough to ensure that the functionality seen as a whole from the outside will not be broken - and it IS the main concern (if we had a real customer )

          Robert Muir added a comment -

          Per I don't want to drown out your issue either. But when I see arguments to add an e.g. 60 second test like this, I felt the need to speak up. Solr has a lot of these tests today (many fail on a daily basis), but I'm not sure how much use they really are.

          If tests are failing constantly and nobody is fixing them: then there is a problem

          Look: I'm totally fine with such tests being annotated @Nightly and running on jenkins at night (as long as they are reliable and debuggable). But slowness itself presents another barrier for someone else trying to debug the test.

          So I think its important to have quicker, simpler unit tests to encourage refactoring and good APIs. Solr is really missing this.

          And of course for the record I agree with you about the whole issue of refactoring in general. I feel like refactoring in solr is not really encouraged, because there is no faith in the test suite. So its safer to just stick with the status quo instead. This is how projects die.

          Per Steffensen added a comment -

          but it should be used as an adjunct, not a replacement for hard-core unit testing

          Guess I kinda agree, just don't overdo the hard-core unit testing, and certainly don't forget to do the behavioural/integration tests.

          Solr is screaming out for a "mocking" capability so that more "integration" testing can be done at the unit test level

          Uhhh yes. Mocking is a nice way of having your "integration" tests not include the whole world. E.g. a test covering the entire request handling, seen from the outside down to the OverseerCollectionProcessor, but where we mock the calls to the Core Admin API and just assert that the requests forwarded to the Core Admin API are as expected. Mocking capability in Solr will take tests to the next level, but why not just start using it? - e.g. mockito is ready to use.
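A minimal sketch of the mocking idea in the paragraph above, kept dependency-free: a hand-rolled stub records the requests a component forwards to its "Core Admin" collaborator, so the test can assert on them without any running cluster. CoreAdminClient, ShardCreator and RecordingAdmin are hypothetical names for illustration, not real Solr APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled mocking sketch: record the "Core Admin" requests a
// component forwards and assert on them, instead of spinning up cores.
// All type names here are hypothetical, not real Solr APIs.
public class MockingSketch {

    /** The collaborator we want to mock out. */
    public interface CoreAdminClient {
        void createCore(String collection, String shardId);
    }

    /** Component under test: decides which core-creation calls to make. */
    public static class ShardCreator {
        private final CoreAdminClient admin;
        public ShardCreator(CoreAdminClient admin) { this.admin = admin; }

        public void createShards(String collection, int numShards) {
            for (int i = 1; i <= numShards; i++) {
                admin.createCore(collection, "shard" + i);
            }
        }
    }

    /** Hand-rolled mock: records every call for later verification. */
    public static class RecordingAdmin implements CoreAdminClient {
        public final List<String> calls = new ArrayList<>();
        @Override public void createCore(String collection, String shardId) {
            calls.add(collection + "/" + shardId);
        }
    }

    public static void main(String[] args) {
        RecordingAdmin admin = new RecordingAdmin();
        new ShardCreator(admin).createShards("col1", 2);
        // Verify the forwarded requests without any running cluster.
        System.out.println(admin.calls);  // [col1/shard1, col1/shard2]
    }
}
```

With mockito, RecordingAdmin would collapse to mock(CoreAdminClient.class) plus verify(...) calls, but the shape of the test stays the same.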

          it's hard to tell if it's just an unrelated sporadic fail, because the test suite is unreliable in general

          Yes, my impression is also that the tests are unreliable - sometimes they fail and sometimes not, and it is really hard to know if you did something wrong. But I will still claim that it is not because of integration tests - it might be because they, like so many other things in Solr, are done in an undisciplined way.

          Well, everyone has their experience and beliefs, and I am always glad to discuss with you guys. Glad that you decided to really join, Robert, even though you thought that there was nothing to discuss. But I guess I have stated my opinion now, and I don't want to do any more debating here on this issue/ticket. If you want to continue on the mailing-list I will probably join, but this issue/ticket should be about this issue/ticket from now on. We still have unhandled matters (e.g. controlling instance-dir and/or data-dir), and I don't want the discussion about those matters to drown in "partly" unrelated discussions about test-strategies.

          Per Steffensen added a comment -

          It's not working out: and it's exactly due to this attitude (which must be changed)

          It's also my impression that it does not work out; we just do not share an opinion about the reason.

          you contradict yourself. you say only the "seen from the outside" matters, yet discourage the unit tests that make it easier to develop new APIs, and the unit tests that "test" actual usage of the API. This is critical to refactoring.

          No, I do not contradict myself - you misunderstand me. If an API can be used "from the outside" it certainly needs test-coverage - whether or not you see it as unit or behavioural tests. The collection-creation/reload/removal features of OverseerCollectionProcessor (the ones in question here) are not accessible from "the outside". The Collection API is, which is triggered by sending a request that gets handled by CollectionHandler etc., eventually submitting a "job" for the OverseerCollectionProcessor. This should (also) be tested by sending such a request "from the outside" to a complete Solr cluster, and asserting that the expected collection/shards end up being created/reloaded/removed, and in case of create that you can index data into the new collection and search the data up again, and in case of remove that the collections/shards disappear from the system (ZK, and that data-dirs are deleted if that is what you asked for) etc.

          If you want to supplement with tests directly on OverseerCollectionProcessor, that is fine. But such tests are mainly useful during the development process, not to ensure that no one in the future breaks the feature you introduced. The feature seen from "the outside" is typically unchanged during refactoring, and the feature seen from "the outside" is what matters. Say we some day decide that collection creation shouldn't really be handled asynchronously by the Overseer, but that we want to handle it synchronously before responding to the one that sent the collection creation request. In that case OverseerCollectionProcessor will probably be deleted (yes, most of the code will still remain, but will probably be moved/restructured to other classes/units somewhere else), and there will be tests in an OverseerCollectionProcessorTest that need to be moved and changed, and it is not certain that the one doing the refactor "gets" (the reason or point of) all the tests of OverseerCollectionProcessor, or that he is able to tweak them into "similar" tests of the new components handling the same aspects. The behavioural test does not need to change, as it ensures that the feature seen "from the outside" did not change, and that is very important during refactoring.

          As I said, I am certainly not against "unit"-tests, but they are mainly a "working"-tool for the developers. Behavioural tests are the ones that ensure that your system as a whole works as it is supposed to - and whether you want it or not, it IS what creating a system is all about.
          Guess you would realize a thing or two by working on a project for a real customer setting up real demands. Those demands are all requirements on the system as a whole seen from "the outside". He doesn't care about the internal workings of the system. Our testers work a lot with the customer (or the PO representing him) to work out behavioural tests to make sure we fulfil his needs and requirements. We do lots of unit-tests in our project also, but have NO problem whatsoever refactoring all the time. So I can tell you that behavioural tests and daring to do constant refactoring can go hand in hand. It is a little harder when you have only unit-tests.

          Start thinking about Solr as a product you need to deliver to some "artificial" customer, and try to think the way he would think - only system-level behaviour matters to him. "Unit"-tests are "working"-tools for the developers.

          I'm also against more tests with sleeps

          As I said, "me too". But I would rather test a feature the way it can be tested than not test it at all.

          you can expect to see me vote on commits that have huge sleeps

          Uhh I hope not, you just said you were against sleeps. But I guess you mean "you cannot..."

          Robert Muir added a comment -

          I agree with Jack about the mocking, its really needed.

          I feel like with solr cloud it could help a lot, e.g. simulate this guy going down and that guy and not deal with timing issues and so on.
          Just like lucene doesn't actually write continuously to your disk until it's really full in TestIndexWriterOnDiskFull; it mocks the disk full.
          Sure these mock techniques aren't perfect, but they are much easier to debug.

          for "real integration tests" maybe junit isn't even the best tool for the job anyway, so I would prefer if these were separate from the unit tests.

          These integration tests are especially frustrating for lucene developers, who seriously don't want to break solr when they change things underneath it. But the test suite doesn't really allow this, you know, and when something does break it's hard to tell if it's just an unrelated sporadic fail, because the test suite is unreliable in general.

          There is just no replacement for good unit tests.

          Jack Krupansky added a comment -

          Definition of integration testing: A process where you spend 75% to 95% of your time (and the time of people tracking down test failures) GUESSING what numbers to use for sleeps.

          I am a fan of integration testing, but it should be used as an adjunct, not a replacement for hard-core unit testing.

          Solr is screaming out for a "mocking" capability so that more "integration" testing can be done at the unit test level. Mocking can also improve testing by varying the characteristics of dependencies in a controlled manner rather than have integration tests that test only a narrow range of characteristics that vary between environments and over time in an unpredictable and non-repeatable manner. I mean, it would be nice if we could develop components that were less sensitive to changes in the performance of components that they depend on.

          Robert Muir added a comment -

          I'm also against more tests with sleeps. you can expect to see me vote on commits that have huge sleeps.

          Robert Muir added a comment -

          Per: you contradict yourself. you say only the "seen from the outside" matters, yet discourage the unit tests that make it easier to develop new APIs, and the unit tests that "test" actual usage of the API. This is critical to refactoring.

          It's a bonus that you have to change unit tests when you refactor APIs; it's like having little example apps that use your apis to "test" if the api change has some quirks and so on. You are forced to actually use it.

          Robert Muir added a comment -

          It's not working out: and it's exactly due to this attitude (which must be changed).

          (release manager who had to release solr 4.0, where its own test suite didnt pass because no one really gives a shit about this stuff)

          Per Steffensen added a comment - - edited

          I don't even think we need to argue about it really.

          Well, if that is your opinion you will have a problem working on big projects. You should find a private project to work on in your basement. I think it is worth arguing about. Maybe not on this issue/ticket, but in general in the community on the dev mailing list. I am surrounded by professional testers in my everyday work, and what I hear from them is more and more a pull towards behavioural testing.

          Currently solr has a lot of integration tests, how is that working out?!

          I don't know how it works out, but if you see a lot of problems, I wouldn't blame the integration-test-over-unit-test strategy (if such a strategy exists). I have a gut feeling that the main problem of Solr is that no one ever dares to do refactoring - the code is a mess. And that you really do not "trust your test-suite".

          If you want to be able to "trust your test-suite" enough to dare doing big refactorings, integration/behavioural tests are by far the best. Typically when you do major refactoring you do not change the system-behaviour seen from the outside. You basically re-organize the code internally in order to be able to keep adding features, fixing bugs etc. without getting too confused. Therefore "integration/behavioural" tests do not have to be changed during a big refactor, and they ensure that your refactoring did not ruin functionality "seen from the outside" (as it IS basically all you care about). Unit-tests usually have to be changed during a major refactoring, because a big refactor often includes getting rid of existing "units", splitting up existing "units", adding new "units" etc.

          Robert Muir added a comment -

          I don't even think we need to argue about it really. Currently solr has a lot of integration tests, how is that working out?!

          Look at solr's queryparser tests. They index documents and run actual queries, instead of simply assertEquals(expectedQuery, actual).

          This provides no benefit, and only makes it harder to debug. Instead of seeing what is different about the query structure, you typically get "xpath failed" and have to start digging.

          This is just the simplest possible Captain Obvious example. The problem runs much deeper.

          Per Steffensen added a comment - - edited

          Guess there are two "movements" in the industry at the moment

          • TDD: Mostly focused at testing "units" (single components), hoping that if all "units" work as they are supposed to, the entire system where all those "units" are used in combination will also work as it is supposed to
          • BDD: Mostly focused at testing the "behaviour" of a system seen from the outside, because that is basically all you care about. No one cares how the system works internally. Everyone cares about how the system works as a whole, when used to the extent you can use it "from the outside". This is especially true for the end users of the system, and at the end of the day a system is created to fulfil the users' needs.

          Integration-level tests aren't nearly as useful

          I completely disagree with that.

          I, personally, am not that much into "unit"-tests because they basically do not show that the system as a whole "behaves" as it is supposed to. Behavioural/integration-tests do, where you test against an entire system using it the way it can be used "from the outside", and assert that the "result" is as expected seen "from the outside".

          and much more difficult to debug

          You are right about that. I like "unit"-tests to supplement behavioural/integration-tests whenever it is found to be necessary. We can add an OverseerCollectionProcessor "unit"-test, testing some of this new functionality, but in my mind (as I said) "it will not ensure the correct system-level functionality to nearly the same degree", and basically that's all we are interested in.

          If the community would like we can supplement with "unit"-tests for this new feature, but you will have to fight me (FWIW) to get rid of the integration-ish test of the feature.

          Robert Muir added a comment -

          If we go do a more unit-test-ish test directly on OverseerCollectionProcessor we might be able to do something faster, but it will not ensure the correct system-level functionality to nearly the same degree.

          But these tests are really necessary.

          I think Solr has enough "integration" tests and not enough unit tests already. This makes it difficult to change Solr code, because there is no direct correlation between the functionality and the tests.

          Integration-level tests aren't nearly as useful, and much more difficult to debug.

          Per Steffensen added a comment - - edited

          I suppose a 10 second sleep is more agreeable, but these things add up and I'd rather come up with a better test.

          I would rather too, but I believe it is hard to make sure something does not happen without giving it a chance to happen and checking that it did not. So there is no better way of testing it, as I see it. At least if you want this kind of integration-ish test, where you start a full system and do something to it as if you were a real client acting against a real system. And I like this kind of test. If we go do a more unit-test-ish test directly on OverseerCollectionProcessor we might be able to do something faster, but it will not ensure the correct system-level functionality to nearly the same degree.

          I think you should commit with a 10-20 sec wait, and then if you (or someone else) can come up with a faster way for testing it properly, it is fine for me to make the change. I do not believe I will be able to come up with a proper test of this that is faster. But protect the feature "the slow way" until someone comes up with a faster way of protecting it.
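The trade-off being discussed — a fixed sleep versus polling — can be sketched with a small generic helper. When asserting that something *does* happen, you poll and return early on success; when asserting that something does *not* happen, you can still poll for the unwanted condition and fail fast the moment it appears, so only the passing run pays the full window. This is purely an illustrative sketch with hypothetical names (`WaitUtil`, `waitFor`, `stayedAbsent`), not Solr's actual test code:

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only; not part of Solr's test framework. All names are hypothetical.
public final class WaitUtil {
    private WaitUtil() {}

    /** Polls until the condition becomes true or the timeout elapses; returns the final result. */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean();
    }

    /** Asserts the unwanted condition never became true during the window; fails fast if it did. */
    public static boolean stayedAbsent(BooleanSupplier unwanted, long windowMs, long intervalMs) {
        return !waitFor(unwanted, windowMs, intervalMs);
    }
}
```

Note that `stayedAbsent` still pays the full window when the test passes — exactly the point made above — but a run where the bug is present fails immediately instead of after the sleep.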

          Mark Miller added a comment -

          I'm happy to commit a test for this, but lets come up with something that doesn't have a long sleep like this.
          I suppose a 10 second sleep is more agreeable, but these things add up and I'd rather come up with a better test.

          Per Steffensen added a comment -

          Ok, will you please consider enabling the 60 sec test (maybe reduce it to 10 or 30 sec) so that the feature is protected until someone comes up with a better/faster test. Please!!!

          Mark Miller added a comment -

          Still easier to just take it in

          For me it's easier to not take it in; I have to vet what I take in. I think you will find it easier to get stuff in if your refactoring is related to the issue. Otherwise make a refactoring issue.

          Trust your test-suite

          It's not my test-suite, it's a huge shard test-suite and I don't blindly trust it. It certainly doesn't test everything.

          Per Steffensen added a comment -

          We poll so that we wait up to 120 seconds for a slow comp, but a fast comp won't need to wait nearly that long. The 60 sec wait hits everyone no matter what. We generally try to avoid any hard waits like that. I understand you can't really poll in this case, so I'm not sure the best way to test that - I made it a TODO for now.

          Ok with a TODO, but it should be about making a more clever solution than the 60 sec wait. You commented out the assert-method "checkCollectionIsNotCreated", which means that we now have an unprotected feature in the code-base. Anyone can go ruin this feature tomorrow and no one will notice. Yes, I believe the main value of tests is their ability to "protect" other people from accidentally ruining existing functionality. All the comments below are very unimportant compared to this - I am really serious about this. Get "checkCollectionIsNotCreated" running now so that the code is protected, then think about a more clever solution later (if you think such a thing exists). TODOs have a tendency to be forgotten.

          Yeah, I minimized the patch down to just dealing with this issue. I'm away from home and just looking to get this issue completed with minimum fuss.

          Then the easiest thing would probably have been to take everything in, except for things you thought would actually ruin existing stuff, instead of using time to find every little detail that could be left out. Do not misunderstand me, I am glad you used your time to get it committed, but I also want to influence the way you committers work whenever I have the chance. Only thinking about our common interest - a good Solr. I have a bad gut feeling that the code-base is so messy because no one ever refactors. Refactoring is something you usually do while solving some specific (potentially) unrelated task. No one goes refactoring just to do refactoring, but it is extremely important that refactoring has an easy way into the code-base.

          'nodeUrlWithoutProtocolPart' is just so long and I didn't see it helping much in terms of code readability - if you have a better fitting name that is a little less verbose, I think I'd be more into it.

          Well, first of all a long descriptive name is much better than a short misleading name, and second of all that name really isn't very long

          Yeah, I don't think I agree with changing the user's params on him - I'd rather warn and let the user do what he wants to do rather than trying to outthink him ahead of time. If he decides he wants more than one replica on an instance for some reason, that's his deal - we can warn him though.

          Ok, cool

          Right - it should match collection1 - e.g. newcollection/data should be the data dir just like collection1/data, rather than something like newcollection_data. In my experience and testing, data ends up in the core's own instance dir - not some common dir.

          Didn't learn much from this explanation, but I will have to do a little studying on instance-dir and data-dir to understand how your solution will ever work. I will get back with an official complaint (just a comment here or a mail on the mailing list) if I still do not think it will work after I have done my studying.

          Yup - it seemed unrelated and I'm busy so I didn't want to think about it...

          Still easier to just take it in, unless you saw it harming more than it helped. I am worried about refactoring in this project! Trust your test-suite

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1417070

          SOLR-4114: Allow creating more than one shard per instance with the Collection API.

          Commit Tag Bot added a comment -

          [trunk commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1417045

          SOLR-4114: Allow creating more than one shard per instance with the Collection API.

          Mark Miller added a comment -

          In checkForCollection we are willing to wait up to 120 secs (used to be 60 secs) for the collection to be created (correctly). So it is kinda obvious to also wait at least 60 secs to verify that nothing related to a collection was actually created in cases where you want to be sure that nothing is created.

          We poll so that we wait up to 120 seconds for a slow comp, but a fast comp won't need to wait nearly that long. The 60 sec wait hits everyone no matter what. We generally try to avoid any hard waits like that. I understand you can't really poll in this case, so I'm not sure the best way to test that - I made it a TODO for now.

          Mark Miller added a comment -

          Yeah, I minimized the patch down to just dealing with this issue. I'm away from home and just looking to get this issue completed with minimum fuss. 'nodeUrlWithoutProtocolPart' is just so long and I didn't see it helping much in terms of code readability - if you have a better fitting name that is a little less verbose, I think I'd be more into it.

          I see that you removed the "auto-reduce replica-per-shard t

          Yeah, I don't think I agree with changing the user's params on him - I'd rather warn and let the user do what he wants to do rather than trying to outthink him ahead of time. If he decides he wants more than one replica on an instance for some reason, that's his deal - we can warn him though.

          You removed the following lines (because you just want default-values for instance-dir and data-dir)

          Right - it should match collection1 - e.g. newcollection/data should be the data dir just like collection1/data, rather than something like newcollection_data. In my experience and testing, data ends up in the core's own instance dir - not some common dir.

          making "shardToUrls"-map (renamed) concurrency protected

          Yup - it seemed unrelated and I'm busy so I didn't want to think about it. My goal is to get the essence of this thing committed - it's a lot easier to then fight over smaller pieces on top of that. Progress not perfection.
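The "concurrency-protected map" refactoring mentioned above can be illustrated with a minimal sketch: keep the mutable map private, synchronize every access, and hand out defensive copies so callers never touch the guarded state directly. The class and method names here (`ShardUrlRegistry`, `getFullUrls`) are hypothetical stand-ins, not the actual HttpShardHandler code from the patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a concurrency-protected shard-to-URL registry; not Solr's HttpShardHandler.
public final class ShardUrlRegistry {
    private final Map<String, List<String>> shardToUrls = new HashMap<>();

    public synchronized void addUrl(String shard, String url) {
        shardToUrls.computeIfAbsent(shard, k -> new ArrayList<>()).add(url);
    }

    /** Returns a defensive copy, so callers cannot corrupt the guarded map from outside. */
    public synchronized List<String> getFullUrls(String shard) {
        List<String> urls = shardToUrls.get(shard);
        return urls == null ? Collections.<String>emptyList() : new ArrayList<>(urls);
    }
}
```

The point of returning a copy is exactly the one made in the comment: the map no longer has to be "used so carefully outside", because external code can only mutate its own snapshot.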

          Per Steffensen added a comment -

          We have to do something else other than a one minute sleep in the test.

          In checkForCollection we are willing to wait up to 120 secs (used to be 60 secs) for the collection to be created (correctly). So it is kinda obvious to also wait at least 60 secs to verify that nothing related to a collection was actually created, in cases where you want to be sure that nothing is created. In the case of waiting for a collection being created correctly you can skip the waiting as soon as you see that it has been created correctly. In the case where you want to see that nothing gets created you have no chance to do that. I think the 60 sec wait fits well into the way the test already works.

          I'd like the default data dir to remain /data. If you are not doing anything special that works fine.

          That's also fine, but as soon as you create more than one shard on the same node, don't you need different data-dirs? You can't have two shards using the same folder for Lucene index files, right? Did I get the instance-dir and data-dir concepts wrong? Or is it the instance-dir you want to vary between different shards? I would say that running only one shard on a node IS "doing something special".

          I'd also like the ability to override the data dir on per shard basis, but I'm not sure what the best API for that is.

          Yes, me too, but that has to be another ticket/issue. For now, until the API allows control of the data-dir naming, it should be ok for the algorithm to decide on something reasonable by itself.

          So I'd like to support what you want, but I don't want to change the default behavior.

          I agree, but if I got the concepts correct you cannot use "/data" as data-dir for more than one shard on each node. So the default behaviour of using "/data" as data-dir will not work as soon as you run more than one shard on a node. I probably got something wrong - please try to explain.

          My latest patch - I'll commit this soon and we can iterate from there.

          Well, I would prefer you committed my patch, and we can iterate from there. It will also make it much easier to get SOLR-4120 in, which I hope you will also consider.

          Had a quick peek at your patch and have the following comments

          • I see that you removed the "auto-reduce replica-per-shard to never have more than one replica of the same shard on the same node" feature and just issue a warning instead in OverseerCollectionProcessor (the "if (numShardsPerSlice > nodeList.size())" thingy). It is ok for me, even though I believe it is pointless to replicate data to the same machine and under the same Solr instance. But then you probably need to change BasicDistributedZkTest also - in checkCollectionExceptations I believe you'll need to change from
            int expectedShardsPerSlice = (Math.min(clusterState.getLiveNodes().size(), numShardsNumReplicaList.get(1) + 1));
            

            to

            int expectedShardsPerSlice = numShardsNumReplicaList.get(1) + 1;
            
          • You removed the following lines (because you just want default-values for instance-dir and data-dir)
            params.set(CoreAdminParams.INSTANCE_DIR, ".");
            params.set(CoreAdminParams.DATA_DIR, shardName + "_data");
            

            I still do not understand how that will work, but hopefully you will explain

          • You didn't like my rename of variable "replica" to "nodeUrlWithoutProtocolPart" in OverseerCollectionProcessor.createCollection. As I said on the mailing-list I don't like the term "replica" as a replacement for what we used to call "shards", because I think it will cause misunderstandings, as "replica" is also (in my mind) a role played at runtime. But getting the terms right and reflecting them correctly in the API, variable names etc. across the code must be another issue/ticket. But here in this specific example "replica" is just a very bad name, because the variable does not even contain a "replica-url", which would require the shard/replica-name to be postfixed to the string. So this "replica" variable is closest to being a node-url (without the protocol part) - NOT a shard/replica-url. I would accept my name-change if I were you, but I have a motto of "carefully choosing my fights" and this is a fight I will not fight for very long
          • I see that you did not include my changes to HttpShardHandler, making the "shardToUrls" map (renamed) concurrency-protected through the getURLs method (renamed to getFullURLs), so that you do not have to use the map so carefully outside. I understand that it has very little to do with this issue SOLR-4114, but it is a nice modification. Please consider committing it - maybe related to another issue/ticket. It is a little bit of a problem that good refactoring does not easily get in as part of issues/tickets not requiring the refactor. If refactoring is hard to get in, it might be why the code base is a mess (and I think it is)
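The naming point running through the bullets above — every core on a node needs its own name and data directory — can be sketched as a tiny helper. This is a hypothetical illustration of the idea behind `shardName + "_data"`, not the actual OverseerCollectionProcessor logic (and the right default layout is exactly what is being debated here):

```java
// Hypothetical sketch of per-shard core/data-dir naming; not the actual Solr implementation.
public final class ShardNaming {
    private ShardNaming() {}

    /** Distinct core name per slice, so two slices of one collection can coexist on one node. */
    public static String coreName(String collection, String sliceName) {
        return collection + "_" + sliceName;
    }

    /** Distinct data dir per core: two cores must never share a Lucene index directory. */
    public static String dataDir(String coreName) {
        return coreName + "_data";
    }
}
```

With a scheme like this, hosting `shard1` and `shard2` of the same collection on one node yields two disjoint index directories instead of a shared `/data`.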
          Mark Miller added a comment -

          My latest patch - I'll commit this soon and we can iterate from there.

          Mark Miller added a comment - - edited

          Thanks for the corrected patch!

          Two things I'd like to address before committing:

          1.
          + // nocommit
          + Thread.sleep(60000);
          We have to do something else other than a one minute sleep in the test.

          2. Setting the shard as part of the data dir.

          I'd like the default data dir to remain /data. If you are not doing anything special that works fine.
          I'd also like the ability to override the data dir on per shard basis, but I'm not sure what the best API for that is.

          So I'd like to support what you want, but I don't want to change the default behavior.

          Mark Miller added a comment - - edited

          consider backporting to 4.x

          Currently, all of 'my' work is targeted towards 4.x - with the caveat that some trickier stuff might bake in 5x before being back ported.

          Per Steffensen added a comment -

          Hope you will commit, and consider backporting to 4.x, since we expect to upgrade to 4.1 when it is released, and we would really like this feature to be included.

          Per Steffensen added a comment - - edited

          Here is the patch for trunk (5.x). The main mistake was that you didn't use the calculated shardName as the shardName - instead you used collectionName. This caused different shards on the same node to share name and data-dir - not so cool
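The bug described here bites exactly when one node hosts several slices — which is the whole point of this issue (e.g. 8 shards over 4 Solr servers, 2 per server). A round-robin layout, sketched below with purely hypothetical names, shows why each slice needs its own calculated name rather than the collection name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: round-robin assignment of uniquely named slices to nodes. Hypothetical names.
public final class ShardLayout {
    private ShardLayout() {}

    public static Map<String, List<String>> assign(List<String> nodes, int numSlices) {
        Map<String, List<String>> layout = new LinkedHashMap<>();
        for (String node : nodes) {
            layout.put(node, new ArrayList<>());
        }
        for (int i = 0; i < numSlices; i++) {
            // Each slice gets a unique name, even when several land on the same node.
            layout.get(nodes.get(i % nodes.size())).add("shard" + (i + 1));
        }
        return layout;
    }
}
```

With 8 slices over 4 nodes, every node ends up hosting two distinctly named slices; naming them all after the collection would make them collide on name and data-dir, as the comment describes.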

          Mark Miller added a comment -

          Should be against 5x - I'm going to US west coast for a week - so not sure when I'll get back to this - I may try and get it going while I'm out there and I may not have time till I get back.

          Per Steffensen added a comment -

          Where does your patch fit, Mark?

          Mark Miller added a comment -

          Here is a patch of my quick attempted merge. The test fails in the collections api test while waiting for recoveries to finish after creating a collection(s).

          Per Steffensen added a comment -

          Thanks, Mark. Let me know if I can help in any way. It is not that big a patch, but I did take the opportunity to do some refactoring - you know the bell in my head preventing me from just doing ctrl-c/ctrl-v. This makes the patch a little bigger. Let me know if you want me to make a patch fitting on top of 4.x or 5.x.

          Mark Miller added a comment -

          I started working on patching this into recent stuff, and it's more of a pain than I thought. I must have missed some piece as I tried to merge it up and the test is failing. Giving up for tonight.

          Per Steffensen added a comment -

          Created another ticket for the spread-shards-according-to-provided-list thingy. SOLR-4120

          Mark Miller added a comment -

          Well, it makes things a little more painful in that I have to merge it to 4x/5x, but I can do that. It's probably not too difficult.

          Per Steffensen added a comment -

          so no, generally nothing is being back ported to the 4.0 branch

          Well, I guess the essence of my question is, if it is ok that I keep providing patches relative to lucene_solr_4_0? At least for this issue and the spread-shards-across-provided-list-of-solrs one?

          Mark Miller added a comment -

          But is it also going to be backported to lucene_solr_4_0

          Given past discussion, it's very unlikely that we will release a 4.0.1 (I was for it FWIW) and will just do a 4.1 - so no, generally nothing is being back ported to the 4.0 branch.

          If we did end up deciding to do a 4.0.1, then we would select which issues should go in and then do those back ports later.

          Per Steffensen added a comment -

          5x and then merged to 4x - just that small fix though - have not had a chance to review this patch fully yet.

          But is it also going to be backported to lucene_solr_4_0, which is actually the branch I am working on top of?

          Per Steffensen added a comment -

          Well, I'm off for today. I will probably (if my PO's head does not turn green) be making the spread-shards-according-to-provided-list feature tomorrow. If you commit the entire patch for SOLR-4114, it will be easier for me to provide a new patch for this new feature and attach it to a new issue.

          Mark Miller added a comment -

          On which branch are you committing, Mark?

          5x and then merged to 4x - just that small fix though - have not had a chance to review this patch fully yet.

          Per Steffensen added a comment - - edited

          I would too, but we would still disagree on what that meant since I would interpret the "number of times the data is replicated"...

          I actually agree with you. I just don't like "replication" to be part of the name for it then. If we rename "replication-factor" to "number-of-copies" or even "number-of-replicas" or something, I would be much happier changing the semantics of it. But really, this is another issue.

          Per Steffensen added a comment -

          I've committed the shared params issue under SOLR-4055 and added Per to the Changes entry.

          On which branch are you committing, Mark?

          Yonik Seeley added a comment -

          I would expect "replication-factor" to say something about how many times the data is REPLICATED.

          I would too, but we would still disagree on what that meant since I would interpret the "number of times the data is replicated" to mean the total number of copies that exist after a write operation to the cluster. That seems to be the much more common interpretation in this context since there is no "original"... everyone has stored/indexed a copy.

          $ echo hello > file1.txt
          $ cp file1.txt file2.txt

          How many copies of the file are there? If you look at the state (and not the mechanism by which you arrived there), most would say there are 2 copies.
          In one interpretation, there is only one "copy", but that's too literal and assigns some special category to the original.

          http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html
          "The number of copies of a file is called the replication factor of that file."

          http://www.datastax.com/docs/1.0/cluster_architecture/replication
          "The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node."

          Oracle NoSQL store:
          http://docs.oracle.com/cd/NOSQL/html/AdminGuide/introduction.html#replicationfactor
          http://docs.oracle.com/cd/NOSQL/html/AdminGuide/store-config.html
          "A Replication Factor of 3 gives you shards with one master plus two replicas."

          Riak:
          http://wiki.basho.com/What-is-Riak%3F.html
          "An n value of 3 (default) means that each object is replicated 3 times. When an object’s key is mapped onto a given partition, Riak won’t stop there – it automatically replicates the data onto the next two partitions as well."

          Splunk:
          http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Thereplicationfactor
          "The number of data/bucket copies is called the cluster's replication factor."
          "The cluster can tolerate a failure of (replication factor - 1) peer nodes. So, for example, to ensure that your system can tolerate a failure of two peers, you must configure a replication factor of 3, which means that the cluster stores three identical copies of each bucket on separate nodes. With a replication factor of 3, you can be certain that all your data will be available if no more than two peer nodes in the cluster fail. With two nodes down, you still have one complete copy of your data available on the remaining peer(s)."

          It's clear that "3 copies" means 3 total instances of the same data, not 4 (an "original" plus 3 more copies of it.)
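          The disagreement in this thread boils down to an off-by-one between two readings of replicationFactor, which can be sketched in a few lines of Python (the function names are illustrative, not Solr code):

```python
def total_copies_inclusive(replication_factor):
    # HDFS/Cassandra/Riak-style reading: the factor IS the total number
    # of copies of the data, leader/original included.
    return replication_factor

def total_copies_additional(replication_factor):
    # Solr's behaviour at the time of this discussion: the factor counts
    # replicas created IN ADDITION to the leader.
    return 1 + replication_factor

# With replicationFactor=3 the two readings differ by exactly one copy:
print(total_copies_inclusive(3))   # 3
print(total_copies_additional(3))  # 4
```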

          Per Steffensen added a comment -

          When grabbing the params fix I noticed you set the data dir to something like shardname+_data - that's not strictly necessary right? Since each core should have its own instance dir

          Well, I use the same instance-dir for all shards, but a different data-dir - this is just how we used to do it in my project, but it can be changed. As long as the code uses the same instance-dir, different data-dirs are necessary though.
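          A tiny sketch of the layout being discussed - shards of one collection sharing an instance dir while keeping per-shard data dirs (directory names here are illustrative, not what Solr actually generates):

```python
def core_layout(shard_name, instance_dir):
    # Shards sharing one instance dir must get distinct data dirs,
    # e.g. "<shardName>_data", or they would clash on disk.
    return {"instanceDir": instance_dir, "dataDir": shard_name + "_data"}

print(core_layout("mycoll_shard1", "solr/mycoll"))
# {'instanceDir': 'solr/mycoll', 'dataDir': 'mycoll_shard1_data'}
print(core_layout("mycoll_shard2", "solr/mycoll"))
# {'instanceDir': 'solr/mycoll', 'dataDir': 'mycoll_shard2_data'}
```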

          Mark Miller added a comment -

          I've committed the shared params issue under SOLR-4055 and added Per to the Changes entry.

          Per Steffensen added a comment -

          Personally, I'm not really sold on this auto re balancing idea. I'd prefer the user had to explicitly make these moves.

          Me neither - and I can say that ES sometimes f.... it up. At least when I was working with it, but that was mainly because of bad re-balancing algorithms. But I like moving shards "manually" from an admin-console!

          Per Steffensen added a comment -

          could not you do same thing as Elastic Search. Build index with number of shards (initial number is 5). If there is 1 machine in cluster, then all shards are on this machine. If you add more machines, they will move to other machines. It is way simple for administration.

          This moving of shards around as more Solr servers join the cluster is the easiest way to provide elasticity (as I mentioned above somewhere). That is one of the reasons that I want to be able to run multiple shards for a collection on the same Solr server. In that way you will already have shards to move to other Solrs that might join the cluster later.

          In Solr, right now, we don't have the ability to move shards from one server to another (ES has it), but in order to benefit from such a future feature, you will need to be able to have multiple shards on one Solr server. Alternatively you have to go split a shard, but that is much harder, and should only be used if you did not foresee, when you created your collection, that you would add more servers later, and therefore did not create your collection with multiple shards per server.

          Jack Krupansky added a comment -

          I certainly think of "replica" as a "copy of the ORIGINAL", which makes perfect sense in a master with n-slaves configuration, but in a fully distributed environment such as SolrCloud where the leader of a shard can vary over time and updates are distributed to all nodes all of the time, there is no longer the concept of an "original copy" of the data. If anything, the original data is the source data on the wire before it gets instantiated on each node. No node is truly the original.

          The terminology has this difficulty that it is only partially shared between the worlds of master/slave and the cloud. In master/slave, only the slaves are replicas and the master is the original, while in cloud ALL nodes are replicas since there are no originals. The "leader" is not a "master" copy of the data in the sense of master/slave.

          So, I guess I am semi-comfortable with replica referring to all instances of the data, but we do need to be careful to highlight the distinction between how the term replica is used in the world of master/slave vs. SolrCloud, especially since many Cloud users will be migrating from the world of master/slave.

          We also need to be careful not to refer to "leader and replicas" which implies that a leader is not a replica!

          Mark Miller added a comment -

          you add more machines, they will move to other machines.

          Personally, I'm not really sold on this auto re balancing idea. I'd prefer the user had to explicitly make these moves.

          Mark Miller added a comment -

          Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change?

          My preference would be a new issue. If it has to be done as one piece, I would wait for this to go in before supplying the patch for that issue. Or supply a patch for that issue and note that it requires applying this patch first. Combining multiple issues into one patch just makes it more difficult to get it in generally.

          Radim Kolar added a comment -

          could not you do same thing as Elastic Search. Build index with number of shards (initial number is 5). If there is 1 machine in cluster, then all shards are on this machine. If you add more machines, they will move to other machines. It is way simple for administration.

          Mark Miller added a comment -

          When grabbing the params fix I noticed you set the data dir to something like shardname+_data - that's not strictly necessary right? Since each core should have its own instance dir?

          I've been thinking about how to set custom datadirs with this api - it would be nice to be able to specify the data dir - and in some cases perhaps base it on something like the core name rather than just some static string. But have you found it 'necessary' with your work?

          Per Steffensen added a comment -

          New patch SOLR-4114.patch attached (not including the only-spread-shards-over-solrs-mentioned-in-provided-list thingy)

          New, compared to the first patch:

          • maxShardsPerNode implemented
          • Tests (BasicDistributedZkTest.testCollectionAPI) now test additional stuff
            • That the expected number of shards are actually created
            • That if there is not "room" for all the shards due to the provided maxShardsPerNode, nothing is created
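          The "no room, nothing is created" rule tested above can be sketched as a simple capacity check (a Python sketch with invented names, ignoring replicas for brevity - not the actual Overseer code):

```python
def can_create_collection(num_shards, max_shards_per_node, num_live_nodes):
    # All-or-nothing: if the requested shards do not fit under the
    # per-node limit across the live nodes, nothing is created at all.
    return num_shards <= max_shards_per_node * num_live_nodes

# 8 shards on 4 Solr servers, at most 2 shards per node: exactly fits.
print(can_create_collection(8, 2, 4))  # True
# A 9th shard would exceed capacity, so the whole request is refused.
print(can_create_collection(9, 2, 4))  # False
```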
          Per Steffensen added a comment - - edited

          Another more urgent problem (for me) is that I need to do another change to the Solr Collection API before we can use it as a replacement for what we already do in our project (where we create each shard one by one in OUR code). We split our set of Solr servers into two subsets - Data-Solrs and Search-Solrs. The Search-Solrs are not supposed to carry any data and therefore not to be occupied by indexing. Search-Solrs instead play the role of receiving queries from the outside, sub-querying the Data-Solrs and combining the final total response to the "outside". Data-Solrs are where we create the data-carrying collections. Data-Solrs need more CPU and IO-capabilities while Search-Solrs need more RAM - hence the split-up.

          Therefore I need to be able to provide a list of Solrs to the create operation of the Solr Collection API. The shards of the collection to be created are then only allowed to be spread over the Solrs in this list - the default list could be "all Solrs". In our Solr-based project, we will give our list of Data-Solrs as this list.

          Can I add such a feature to this SOLR-4114 and include it in a combined patch, or do you prefer another ticket for this change? I can create another issue but provide a combined patch. Are you interested in such a feature at all? That is, a feature where the create operation takes a list of Solrs to spread the created shards over.
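          For illustration, restricting shard placement to a provided node list could look roughly like this (the parameter and function names are hypothetical - this idea was later split off as SOLR-4120, and this is not actual Solr code):

```python
def nodes_for_collection(live_nodes, create_node_set=None):
    # With no list given, spread shards over all live nodes (the
    # default); otherwise only over listed nodes that are also live.
    if create_node_set is None:
        return list(live_nodes)
    wanted = set(create_node_set)
    return [n for n in live_nodes if n in wanted]

live = ["data1:8983_solr", "data2:8983_solr", "search1:8983_solr"]
# Only the Data-Solrs are eligible to carry shards of this collection:
print(nodes_for_collection(live, ["data1:8983_solr", "data2:8983_solr"]))
# ['data1:8983_solr', 'data2:8983_solr']
```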

          Per Steffensen added a comment -

          Per: this is unrelated to your patch of course - it just happened to come up here.

          No problem. I could make it as part of this patch if you want, but I'm not sure I agree with your way of interpreting the term "replication-factor". I would expect "replication-factor" to say something about how many times the data is REPLICATED. If I run with only one copy of the data for each slice, I would logically say that my data is not replicated, and that matches a replication-factor of 0.

          I have used HDFS and HBase a little a year or so ago, but I'm not sure what meaning they put into the term "replica". I've also worked a lot with ElasticSearch (which I believe is more of a counterpart to Solr), and in ElasticSearch I believe they use the term "replica" for the number of ADDITIONAL copies of the data - equal to your/our current implementation in Solr.

          Per Steffensen added a comment - - edited

          This fix belongs with the issue that fixed delete and reload - I'm going to fix it there.

          Yes of course, it is just hard for me to split up the patch, because it is all needed for the tests to be green - and I really want to give you a patch fitting on top of a certain revision where all tests are green when you apply the patch. But commit-wise it belongs to the other issue.

          Yonik Seeley added a comment -

          Ok, it's just that the replicationFactor you specify in your request is the other thing.

          Hmmm, you're right:
          "Note: replicationFactor defines the maximum number of replicas created in addition to the leader from amongst the nodes currently running"

          That's not consistent with the original definition (http://wiki.apache.org/solr/NewSolrCloudDesign), the way the state is represented in clusterstate, or the way others use the term such as in hbase/HDFS, cassandra, oracle, etc. The important part is how many times the data is stored (the replication factor), and things like leaders are more of an implementation detail.

          Luckily we don't yet store this in the cluster, so there's no back compat issue with existing clusters. There's only a change when creating a new cluster, but that seems relatively minor. Given that, I'd lean toward changing this parameter to be in line with common usage.

          Per: this is unrelated to your patch of course - it just happened to come up here.

          Mark Miller added a comment -

          fixed in collectionCmd (used for delete and reload) but not in createCollection

          This fix belongs with the issue that fixed delete and reload - I'm going to fix it there.

          Per Steffensen added a comment -

          Patch including the maxShardsPerNode feature coming up. And (much) better testing of the create operation of the Collections API.

          Per Steffensen added a comment - - edited

          As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all "replicas" for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader).

          I understand that you can view all shards under a slice as a "replica", but in my mind "replica" is also a "role" that a shard plays at runtime - all shards except one under a slice play the "replica role" at runtime, while the remaining shard plays the "leader role". To avoid creating too much confusion, I suggest you use the term "shards" for all the instances under a slice, and that you use the terms "replica" and "leader" only for a role that a shard plays at runtime.
          But that of course would require changes e.g. to the Slice class, where e.g. getReplicas, getReplicasCopy and getReplicasMap need to be renamed to getShardsXXX. It probably shouldn't be done now, but as part of a cross-code clean-up in term usage. Today there is a heavy mix-up of term usage in the code - replica and shard are sometimes used for a node, replica and shard are used for the same thing, etc.

          Suggested terms:

          • collection: A big logical bucket to fill data into
          • slice: A logical part of a collection. A part of the data going into a collection goes into a particular slice. Slices for a particular collection are non-overlapping
          • shard: A physical instance of a slice. Running without replica there is one shard per slice. Running with replication-factor X there are X+1 shards per slice.
          • replica and leader: Roles played by shards at runtime. As soon as the system is not running there are no replica/leader - there are just shards
          • node-base-url: The prefix/base (up to and including the webapp-context) of the URL for a specific Solr server
          • node-name: A logical name for the Solr server - the same as node-base-url except /'s are replaced by _'s and the protocol part (http(s)://) is removed
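          The node-base-url to node-name mapping described in the last two bullets can be sketched as follows (illustrative only, not the actual Solr implementation):

```java
// Derive a node-name from a node-base-url as described above:
// strip the protocol part and replace '/' with '_'.
public class NodeNameDemo {
    static String nodeName(String nodeBaseUrl) {
        return nodeBaseUrl.replaceFirst("^https?://", "").replace('/', '_');
    }

    public static void main(String[] args) {
        System.out.println(nodeName("http://host:8983/solr")); // host:8983_solr
    }
}
```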
          Per Steffensen added a comment -

          Solr 3.X to Solr 4.X back compat is not considered the same as Solr 4.0 to Solr 4.1 back compat.

          Of course, I agree! But anyway...

          Per Steffensen added a comment -

          As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all "replicas" for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader).

          Ok, it's just that the replicationFactor you specify in your request is the other thing. You get replicationFactor + 1 shards per slice, if we define replicationFactor as the one you give in your request.
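          The two competing readings of replicationFactor in this thread come down to a single off-by-one; names below are illustrative, not Solr API:

```java
// Two readings of replicationFactor for one slice.
public class ReplicationFactorDemo {
    // (a) request semantics discussed here: leader + rf additional replicas
    static int shardsPerSliceAsRequested(int rf) { return rf + 1; }
    // (b) common usage (HDFS/Cassandra/etc.): rf = total copies of the data
    static int shardsPerSliceCommonUsage(int rf) { return rf; }

    public static void main(String[] args) {
        System.out.println(shardsPerSliceAsRequested(2)); // 3 shards per slice
        System.out.println(shardsPerSliceCommonUsage(2)); // 2 shards per slice
    }
}
```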

          Yonik Seeley added a comment -

          > So that we don't lose functionality we currently have?

          So now you care about backwards compatibility?

          I was speaking specifically about functionality, not back compatibility.

          With your current solution there will be no "waiting until that machine comes back up". You will just end up with 8 slices, where 7 of them have 2 replicas, and the last only has 1 replica.

          Correct. When I said "it's entirely reasonable for a user to want to wait", I meant wait to create the additional replica for one shard, not wait to create the whole collection. Although I guess it might be useful to be able to fail collection creation if certain specified constraints aren't met (including a min replication factor).

          As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data. I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json, which lists all "replicas" for a shard and one of them just has a flag indicating it's the leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is not even a leader).

          Mark Miller added a comment -

          Solr 3.X to Solr 4.X back compat is not considered the same as Solr 4.0 to Solr 4.1 back compat.

          Per Steffensen added a comment -

          So that we don't lose functionality we currently have?

          So now you care about backwards compatibility? You didn't care about backwards compatibility from 3.6 to 4.0 when you introduced optimistic locking (including an error in case of updating an existing document without providing the correct version), which is forced upon you in 4.0 if you choose to run with version-field and update-log. There are perfectly valid reasons for wanting to use version-field and update-log without wanting full-blown optimistic locking. My solution to SOLR-3178 supports this kind of backwards compatibility by letting you explicitly choose among the update-semantics modes "classic", "consistency" and "classic-consistency-hybrid". So if you come from 3.6 and want backwards-compatible update-semantics, but also want version-field and update-log, you just choose update-semantics "classic". See http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics.
          I'm just teasing you a little

          But anyway, I like backwards compatibility, so you are right, we probably do not want to do something that changes default behaviour in 4.0.0. Will have a look at a solution tomorrow. It is kinda late in Europe now.

          Example: you have 24 servers and create a collection with 8 shards and a target replication factor of 3... but one of the servers goes down in the meantime so one shard has only 2 replicas. It's entirely reasonable for a user to want to wait until that machine comes back up rather than doubling up on a different node.

          I assume you mean a replication-factor of 2? With a replication-factor of 2 you will get 3 shards per slice.

          With your current solution there will be no "waiting until that machine comes back up". You will just end up with 8 slices, where 7 of them have 2 replicas, and the last only has 1 replica. With the patch I provided today you will end up with 8 slices, where all of them have 2 replicas - but one of the servers will be running two shards and the Solr that was down will not be running any (when it comes back up). I probably would prefer my current solution - at least you achieve the property that any two servers can crash (including disk crash) without you losing data - which is basically what you want to achieve when you request a replication-factor of 2.

          But waiting for the machine to come back up before creating the collection would certainly be the best solution. It is just extremely hard to know if a machine is down or not - or if you intended to run one server more than what is currently running. In general there is no information in Solr/ZK about that - and there shouldn't be. In this case a maxShardsPerNode could be a nice way to tell the system that you just want to wait. But then it would have to be implemented correctly, and that is really hard. In OverseerCollectionProcessor you can check if you can meet the maxShardsPerNode requirement with the current set of live Solrs, and if you can't, just don't initiate the creation process. But a server can go down between the time where the OverseerCollectionProcessor checks and the time where it is supposed to create a shard. Therefore it is impossible to guarantee that the OverseerCollectionProcessor does not create some shards of a new collection without being able to create them all while still living up to the maxShardsPerNode requirement. In such a case, if you really want to live up to the maxShardsPerNode requirement, the OverseerCollectionProcessor would have to try to delete the shards of the collection that were successfully created. But this deletion process can also fail. Ahhh, there is no guaranteed way.

          Therefore my idea about the whole thing is aiming more at just having all the shards created, and then moving them around later. I know this is not possible for now, but I do expect that we (at least my project) will make support for (manual and/or automatic) migration of shards from one server to another. This feature is needed to achieve nice elasticity (moving shards/load onto new servers as they join the cluster), but also to do re-balancing after e.g. a Solr was down (and a shard that should have been placed on this server was temporarily created to run on another server).

          Well, as I said, I will consider the best (small-patch) solution tomorrow. But if I can't come up with a better small-patch solution we can certainly do the maxShardsPerNode thing - no problemo. It just isn't going to be 100% guaranteed.
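          The (inherently racy) pre-check discussed above could look roughly like the following. This is a hypothetical sketch, not actual OverseerCollectionProcessor code; canCreateCollection and its parameter names are invented for illustration.

```java
// Hypothetical Overseer-side feasibility check before creating any shards:
// numSlices * (replicationFactor + 1) physical shards must fit within
// liveNodes * maxShardsPerNode. As noted in the discussion, a node can still
// go down between this check and shard creation, so it is not a guarantee.
public class CreatePreCheck {
    static boolean canCreateCollection(int numSlices, int replicationFactor,
                                       int liveNodes, int maxShardsPerNode) {
        int totalShards = numSlices * (replicationFactor + 1); // leader + replicas per slice
        return totalShards <= (long) liveNodes * maxShardsPerNode;
    }

    public static void main(String[] args) {
        // Yonik's example: 8 slices, replication-factor 2 (3 copies), 24 servers
        System.out.println(canCreateCollection(8, 2, 24, 1)); // true: 24 shards on 24 nodes
        System.out.println(canCreateCollection(8, 2, 23, 1)); // false: one node went down
    }
}
```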

          Yonik Seeley added a comment -

          Well I see no reason to introduce (in the first step at least) a maxShardsPerNode.

          So that we don't lose functionality we currently have?
          And I agree that it should be up to the user, hence the proposed parameter to control it.

          Only potential problem is if his create request is run when not all Solr servers are running, and in such case a maxShardsPerNode could help to stop the creation process.

          Exactly... there's the main use case.

          Example: you have 24 servers and create a collection with 8 shards and a target replication factor of 3... but one of the servers goes down in the meantime so one shard has only 2 replicas. It's entirely reasonable for a user to want to wait until that machine comes back up rather than doubling up on a different node.

          The other use case is the examples on http://wiki.apache.org/solr/SolrCloud

          Per Steffensen added a comment -

          Learned from Steve today that you usually develop for 5.x on trunk and then back-port to 4.x.y branches. Let me know if you would like a trunk-based patch instead

          Per Steffensen added a comment -

          About SOLR-4114.patch:

          • It fits on top of revision 1412602 of branch lucene_solr_4_0
          • The shard allocation algorithm explained
            • Shards are allocated to Solr servers one by one. The next shard is always assigned to the next server in a shuffled list of live servers. Whenever you reach the end of the list of live servers you start over again.
            • Replica for a certain shard are allocated to the #replication-factor next servers in the list
            • replication-factor is reduced if it is requested to be higher than "the number of live servers - 1". Kinda pointless to run two shards belonging to the same slice on the same server
              • Unfortunately only able to log the decision about such a replication-factor reduction - no easy way to get info back to the caller since the job is handled asynchronously by the Overseer
          • Besides that a bug-fix included
            • OverseerCollectionProcessor.createCollection and .collectionCmd reused params-objects too much. The same params-object was used for several submits to ShardHandler, but since the ShardHandler issues asynchronous jobs, the params-object might be changed by the OverseerCollectionProcessor before the asynchronous job is executed - resulting in a lot of fun. Comments added around the fixes
              • This bug does not appear to be fixed on lucene_solr_4_0
              • It appears to be partly fixed on branch_4x - fixed in collectionCmd (used for delete and reload) but not in createCollection (used for create)
          • Besides that a little cleaning up - I know you don't like it, but my eyes cannot handle such a mess
            • BasicDistributedZkTest: Introduced method getCommonCloudSolrServer to be used instead of just using solrj. The solrj variable was initialized in method queryServer but used in lots of other places. For this to work your test needs to call queryServer before any of the other methods using solrj. This is fragile when you change the test, and if you (as I did) comment out parts of the test.
            • HttpShardHandler: Made getURLs thread-safe so that you do not have to be so careful using it
            • General: Took a small step towards consistent usage of the terms collection, node-name, node-base-url, slice, shard and replica. All over the code the terms are mixed up; I took the opportunity to clean up the code nearby my changes. IMHO you should do a lot more cleaning up in this project. I will try to sneak in clean-ups whenever I can. My view on the correct meaning of terms:
              • collection: A big logical bucket to fill data into
              • slice: A logical part of a collection. A part of the data going into a collection goes into a particular slice. Slices for a particular collection are non-overlapping
              • shard: A physical instance of a slice. Running without replica there is one shard per slice. Running with replication-factor X there are X+1 shards per slice.
              • node-base-url: The prefix/base (up to and including the webapp-context) of the URL for a specific Solr server
              • node-name: A logical name for the Solr server - the same as node-base-url except /'s are replaced by _'s and the protocol part (http(s)://) is removed
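
          The round-robin allocation described in the bullets above can be sketched as follows. This is illustrative only, not the actual OverseerCollectionProcessor code; the fixed shuffle seed exists purely to make the demo reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of the allocation algorithm: assign each slice's leader and replicas
// to consecutive entries of a shuffled live-node list, wrapping around, and
// cap the replication-factor at liveNodes - 1 (two shards of the same slice
// on one node would be pointless).
public class ShardAllocationDemo {
    static Map<String, List<String>> allocate(List<String> liveNodes,
                                              int numSlices, int replicationFactor) {
        List<String> nodes = new ArrayList<>(liveNodes);
        Collections.shuffle(nodes, new Random(42)); // fixed seed, just for the demo
        int effectiveRf = Math.min(replicationFactor, nodes.size() - 1);
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        int cursor = 0;
        for (int s = 1; s <= numSlices; s++) {
            List<String> shardNodes = new ArrayList<>();
            for (int i = 0; i <= effectiveRf; i++) { // leader + effectiveRf replicas
                shardNodes.add(nodes.get(cursor++ % nodes.size()));
            }
            assignment.put("slice" + s, shardNodes);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 8 slices, replication-factor 1, on a 4-node cluster: 16 shards, 4 per node
        System.out.println(allocate(List.of("n1", "n2", "n3", "n4"), 8, 1));
    }
}
```

          Because the cursor simply wraps around the shuffled list, each node ends up hosting the same number of shards (plus or minus one), which is the property the patch is after.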

          If you don't want the cleaning up stuff the following parts of the patch can be left out

          • BasicDistributedZkTest: Everything except maybe the change from "new ZkCoreNodeProps(node).getCoreUrl()" to "ZkCoreNodeProps.getCoreUrl(node.getStr(ZkStateReader.BASE_URL_PROP), collection)" in method getUrlFromZk
          • ShardHandler: Everything
          • HttpShardHandler: Everything
          • OverseerCollectionProcessor: The renaming stuff

          The important stuff is in OverseerCollectionProcessor - the modified shard allocation algorithm that allows for multiple shards from the same collection on each Solr server, and the bug-fix dealing with too eager reuse of params-objects.

          Per Steffensen added a comment -

          Well, I see no reason to introduce (in the first step at least) a maxShardsPerNode. Your requested numShards and the number of live servers at that point in time will decide the number of shards on each node/server - basically no limit, but the clever user of Solr will probably not want to request 1000 shards for a collection if he only has 2 Solr servers; it should actually be up to the user of Solr. We have been using Solr-cloud for a long time now and we have a very high focus on performance, because we need to end up with a Solr cluster supporting "live-searches" among 50-100 billion records. During numerous performance tests we have, among other things, played with the number of shards per Solr server per collection. We have run one-month+ tests just pumping data into the cluster to see how loading-time, search-time etc. develop as collections are filled with data. We have run such tests with 1, 4, 8 and 12 shards per Solr server per collection, and each of them has both good and bad properties wrt performance, so until we know (and we should be very careful taking "good decisions on behalf of every Solr user") that there is an always-true best number for maxShardsPerNode we should be careful putting any limit on the user.

          I know you just want to give a maxShardsPerNode in the create request, but the user of Solr really should be able to calculate the number of shards going on each Solr when he controls the numShards and when he knows how many Solr servers he is running. Only potential problem is if his create request is run when not all Solr servers are running, and in such case a maxShardsPerNode could help to stop the creation process.

          But a Solr user probably wants to make sure all Solr servers that are supposed to run are actually running before he issues a collection creation request, so that he gets shards distributed across all the Solr servers he intends to run. We do that in our project BTW, but outside Solr code.

          Yonik Seeley added a comment -

          What's the proposed API? Perhaps a maxShardsPerNode parameter during the create?
          Seems like it should default to 1 (the current behavior)?

          Per Steffensen added a comment -

          Yes, it is just the Collections API. I could have spelled that out more clearly. Guess the only clue was the fact that I did put the label "collection-api" on it

          Mark Miller added a comment -

          This title is a little too general no? We do actually support multiple shards in one collection on the same server, it's just the collections API that doesn't support this?

          Per Steffensen added a comment -

          Patch coming up


            People

            • Assignee:
              Mark Miller
              Reporter:
              Per Steffensen
            • Votes:
              0
              Watchers:
              5
