SOLR-1306: Support pluggable persistence/loading of solr.xml details

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.1
    • Component/s: multicore
    • Labels: None

      Description

      Persisting and loading details from a single XML file is fine if the number of cores is small and fairly fixed. If there are tens of thousands of cores on a single box, adding a new core (with persistent=true) becomes very expensive, because every core creation has to rewrite this huge XML file.

      Moreover, there is a good chance that the file gets corrupted, making all the cores unusable. In that case I would prefer the details to be stored in a centralized DB which is backed up/replicated, so that all the information is available in one place.

      We may need to refactor CoreContainer to have a pluggable implementation which can load/persist the details. The default implementation should read from/write to solr.xml. The class should be pluggable in solr.xml as follows:

      <solr>
        <dataProvider class="com.foo.FooDataProvider" attr1="val1" attr2="val2"/>
      </solr>
      

      There will be a new interface (or abstract class) called SolrDataProvider which this class must implement.
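      The interface itself was never specified in this issue; below is a minimal sketch of what such a SolrDataProvider plugin point might look like. All names and method signatures here are assumptions, and an in-memory map stands in for the proposed DB-backed store.

```java
import java.util.*;

// Hypothetical sketch of the proposed SolrDataProvider plugin point.
// Method names and signatures are illustrative; the issue never fixed an API.
interface SolrDataProvider {
    Properties load(String coreName);                  // read one core's details
    void persist(String coreName, Properties details); // write one core's details
    Collection<String> coreNames();                    // enumerate known cores
}

// Stand-in implementation backed by an in-memory map; a real provider would
// read/write solr.xml (the default) or a centralized, replicated DB.
class MapDataProvider implements SolrDataProvider {
    private final Map<String, Properties> store = new HashMap<>();
    public Properties load(String coreName) { return store.get(coreName); }
    public void persist(String coreName, Properties details) { store.put(coreName, details); }
    public Collection<String> coreNames() { return store.keySet(); }
}

public class ProviderDemo {
    public static void main(String[] args) {
        SolrDataProvider provider = new MapDataProvider();
        Properties details = new Properties();
        details.setProperty("instanceDir", "collection1");
        provider.persist("collection1", details);
        System.out.println(provider.coreNames()); // → [collection1]
    }
}
```

      With this shape, adding a core costs one provider call instead of rewriting the whole solr.xml.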

      1. SOLR-1306.patch
        16 kB
        Erick Erickson
      2. SOLR-1306.patch
        40 kB
        Erick Erickson
      3. SOLR-1306.patch
        42 kB
        Erick Erickson
      4. SOLR-1306.patch
        42 kB
        Erick Erickson

        Issue Links

          Activity

          Steve Rowe made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Erick Erickson made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Erick Erickson added a comment -

          See the discussion at SOLR-4083. Rather than a pluggable core descriptor provider, we'll walk the cores directory with certain rules and discover all the cores present. Simpler that way, since the cores need to be physically present anyway in order to be referenced from a pluggable architecture.

          Erick Erickson added a comment -

          Yeah, I'm reluctantly agreeing with him too <G>....

          See: https://issues.apache.org/jira/browse/SOLR-4083. I did some experiments and on a spinning-disk machine (mac 2009) we can read about 1K configs/second with the cores in 150 directories, 100 cores/dir.

          So I'll be cogitating on this a bit, but this effort may take a sharp left turn soon.

          Still doesn't change the need for SOLR-1028 though.

          Noble Paul added a comment -

          I agree with Mark. If we move to the model where cores can be discovered from directories it should be just fine

          Mark Miller added a comment -

          I still don't understand the motivation. You can do the same thing with the on disk auto discover strategy without adding another plugin point to support and complicating this stuff.

          What does this plugin gain you over the model of just loading cores we find in folders? You can still physically move cores all you want - in fact, it's probably a lot easier.

          Erick Erickson added a comment -

          The use-case has come up on the list for quite a while: multi-tenant situations that are sparsely used, where the necessary balancing of tenants to machines is carried out by physically moving cores around different machines as usage patterns/core sizes change. FWIW, this is not theoretical.

          And I'm not sure what the complexity issue is. Most of the actual code changes in SOLR-1028 are all about making it possible to lazily load cores and they're more organizational (extracted a couple of methods that kinda makes the diff output look way more scary than it is).

          But I'll tell you what. I'll sweeten the pot with https://issues.apache.org/jira/browse/SOLR-4083 to back up my claim that this work makes nuking solr.xml easier.

          Mark Miller added a comment -

          You must have a solr home for each core anyway - I'm not sure of the advantage of having this open-ended pluggable point here. Keeping this simple and just doing solr home auto discovery (with the ability to add multiple root folders to look in) just seems much simpler. I'm not convinced there are use cases that are worth the cost of making all this pluggable.

          Noble Paul added a comment -

          Yes, you are right: the custom implementation should take care of walking the tree and identifying the cores. But at the same time we have to keep in mind that deletion of cores will not happen immediately as I fire the command. The actual cleanup of the file system will happen a bit later. So we should have some kind of a marker to say whether a core is actually a live one.

          IMHO, as much as possible we should avoid completely pluggable solutions, because we clearly know that there are only a couple of scenarios.

          Erick Erickson added a comment -

          But isn't this just another CoreDescriptorProvider? Perhaps the default one? The way the code works, you don't have to define any cores in solr.xml, all you have to do is provide a class for a pluggable descriptor provider. Unless I'm missing something, we could have a default one that enumerated the cores tree and the stock Solr could ship without a solr.xml at all. Maybe back up a step and have a custom CoreContainer instead/too that could be specified when needed as a sysprop when starting Solr? Or do we just need a few system properties rather than a new class? That would go a long way towards simply nuking the need to ever have a solr.xml at all.

          And people who had other needs could still do something other/more complex as needed.

          Besides Paul's comment about 10s of K cores at the top level (and, Paul, I'm thinking of walking a tree, so not all the top-levels of the cores would be in the same directory), there's still the startup cost in the case I'm working on of enumerating, say, 15K cores on startup. The pluggable core provider allows that cost to be amortized over some period of time and/or fetched from a faster source.

          This is outside the separate issue of lazy core loading, the lazy core cache, etc.

          So I'm not sure I see any incompatibilities here now I've thought about it a bit more. In fact, this patch seems like a step to accomplish that goal.

          Noble Paul added a comment -

          Please keep in mind that keeping all cores in one directory is not feasible if you have 10's of K cores. The file system ends up being very slow with that many directories. So, we had to put the cores in 'n' different buckets to overcome the performance issue.
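          The bucketing scheme described above - spreading cores over 'n' directories so no single directory grows huge - can be sketched as a simple hash of the core name. This is a hypothetical helper for illustration, not Solr code.

```java
public class CoreBucketer {
    // Map a core name to one of numBuckets directories,
    // e.g. cores/bucket_17/collection1.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int bucketFor(String coreName, int numBuckets) {
        return Math.floorMod(coreName.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        // e.g. 150 bucket directories, each holding a bounded number of cores
        System.out.println("collection1 -> bucket_" + bucketFor("collection1", 150));
    }
}
```

          Because the bucket is derived from the name alone, any node can locate a core's directory without a lookup table.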

          Lance Norskog added a comment -

          "I think we should drop the top level config (eg solr.xml). Instead, we should auto load folders"

          +1

          There are often groups of cores with the same schema - shards in the same Solr, for example. How would this dynamic discovery support groups of collections?

          Mark Miller added a comment -

          I'm with yonik on this one - I think we should drop the top level config (eg solr.xml). Instead, we should auto load folders - no config required, but if you want to override some things, the config lives with the core folder. If you want to be able to place core folders in other locations, we could have a sys prop that added locations. Anything required for settings (like zkHost) would be passed on startup as sys props instead.

          You can still load cores in parallel this way.

          Erick Erickson added a comment -

          Well, the use case here is explicitly that the core information is kept in a completely extra-solr repository (extra ZK too for that matter). Managing 100K cores by moving directories around is non-trivial, especially since there will probably be some system-of-record for where all the information lives anyway.

          As it stands, this patch doesn't really affect the way Solr works OOB. It only comes into play if the people implementing the provider require it (and want to implement the complexity).

          But let me think about this a bit. Are you suggesting that the whole notion of solr.xml be replaced by some kind of crawl/discovery process? Off the top of my head, I can imagine a degenerate solr.xml that just lists one or more directories. Then the load process consists of crawling those directories looking for cores and loading them, possibly with some kind of configuration file at the core level. For the 10s of K cores/machine case we don't want to put the data in solrconfig.xml or anything like that; I'm thinking of something very much simpler, on the order of a java.properties file. I've skipped thinking about how to "find a core" or how that plays with using common schemas to see if this is along the lines you're thinking of "getting meta-data closer to the index".
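          A per-core marker file of the kind sketched above ("on the order of a java.properties file") might look like the following. The file name and keys are purely illustrative, since no format was settled in this discussion.

```properties
# hypothetical core.properties, dropped in each core's instance directory
name=collection1
instanceDir=.
loadOnStartup=false
```

          A crawler would then treat any directory containing such a file as a core, with the metadata riding along next to the index.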

          It does make the whole coordination issue a lot easier, though. You no longer have the loose coupling between having core information in solr.xml and then having to be sure the files/dirs corresponding to what's in solr.xml "just happen" to map to what's actually on disk.... Moving something from one place to another would consist of
          1> shutting down the servers
          2> moving the core directory from one server to another
          3> starting up the servers again.

          I can imagine doing this a bit differently...
          1> copy the core from one server to another
          2> issue an unload for the core on the source server
          3> issue a create for the core on the dest server

          There'd probably have to be some kind of background loading, but we're already talking about parallelizing multicore loads...

          From an admin perspective, the poor soul trying to maintain this all could pretty easily enumerate where all the cores were just by asking each server for a list of where things are.

          Anyway, is this in the vicinity of "moving the metadata closer to the index"?

          Yonik Seeley added a comment -

          At first blush, this seems to go in the wrong direction.
          Rather than keep meta-data about a core/directory further away from the actual index for that directory, it seems like we should move it closer (i.e. the meta-data for collection1 should be kept under the collection1 directory or even the collection1/data directory).

          Wouldn't it be nice to be able to back up a collection/shard by simply copying a single directory?
          This applies to cloud too - it seems like info about the shard / collection the index belongs to should ride around next to the index.
          One should be able to bring down two solr servers, move a directory from one server to another, then start back up and have everything just work.

          Erick Erickson added a comment -

          I'm thinking of committing this this weekend (to trunk, not 4.x yet) unless people object. I want to write a stress test and bang away at this thing first, and reconcile the CoreDescriptorProvider I came up with with the one already in there for Zookeeper.

          Let me know
          Erick

          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12552863 ]
          Erick Erickson added a comment -

          Last fix broke some tests, this fixes them.

          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12552380 ]
          Erick Erickson added a comment -

          Fix for problem when no CoreDescriptorProvider was supplied but a bunch of cores were specified as loadOnStartup="false". CoreContainer.getCoreNames was not returning these cores.

          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12552308 ]
          Erick Erickson added a comment -

          took out some extraneous crud that made it into the last patch.

          When creating a custom core descriptor provider, the following changes need to be made to solr.xml:
          1> add a sharedLib directive to the <solr> tag pointing to a directory containing your custom jar
          2> add coreDescriptorProviderClass to the <cores> tag. Here's an example:

          <solr persistent="false" sharedLib="../../../../../your/path/here/">
            <cores [all the other opts] coreDescriptorProviderClass="your.company.TheCoreDescriptorProvider">
              ...
            </cores>
          </solr>

          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12551614 ]
          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12552105 ]
          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12552105 ]
          Erick Erickson added a comment -

          Patch adding multi-threaded core creation that ensures that multiple calls to create the same core all return the same object.
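          One common way to get the "concurrent creates return the same object" guarantee the patch describes is a map of futures keyed by core name. This is a sketch of the technique only, under the assumption of one registry per JVM; it is not the actual patch code, and a plain Object stands in for a SolrCore.

```java
import java.util.concurrent.*;

public class CoreRegistry {
    // One future per core name; computeIfAbsent guarantees a single creation,
    // and every concurrent caller joins (and receives) the same instance.
    private final ConcurrentMap<String, CompletableFuture<Object>> cores =
            new ConcurrentHashMap<>();

    public Object getOrCreate(String name) {
        return cores.computeIfAbsent(name,
                n -> CompletableFuture.supplyAsync(() -> expensiveCreate(n))).join();
    }

    private Object expensiveCreate(String name) {
        return new Object(); // stand-in for actually loading a SolrCore
    }

    public static void main(String[] args) throws Exception {
        CoreRegistry reg = new CoreRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Object> a = pool.submit(() -> reg.getOrCreate("collection1"));
        Future<Object> b = pool.submit(() -> reg.getOrCreate("collection1"));
        System.out.println(a.get() == b.get()); // prints true: same core object
        pool.shutdown();
    }
}
```

          The map insertion is cheap and atomic, so the expensive creation work happens exactly once per name while losers of the race simply wait on the winner's future.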

          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12551614 ]
          Erick Erickson added a comment -

          Current progress. All the failing tests pass, although I'm getting a failure that also fails without this patch that I'm looking into.

          This removes the silly lots_of_solr_cores.xml I was using, cleans up a number of details.

          Erick Erickson added a comment -

          Hmmm. Looking at this a little more, the isPersist method looks like one of those ideas that doesn't work. Originally I thought it would be a way to manipulate the persisted values in solr.xml, but there's a chicken-and-egg problem here.

          There's no provision for getting a CoreDescriptor unless the core is loaded. And persisting a core that has not been loaded seems like a bad idea. So I'm removing the method from the interface. My thinking now is that anything provided by a CoreDescriptorProvider should just NOT be persisted.

          We can revisit this later if there's a demonstrated need, but for the nonce I think it's unwise to conflate persistence with this.

          Erick Erickson added a comment -

          I should add that I know that about 10 tests fail, I'll look into them this afternoon.

          Erick Erickson made changes -
          Attachment SOLR-1306.patch [ 12551374 ]
          Erick Erickson added a comment -

          MUST be applied after SOLR-1028

          OK, here's a preliminary cut at this, no tests yet, but I was looking at logging and it seems to be doing what I want, putting up for inspection by the curious...

          A couple of notes:
          1> It turns out that to make this work I needed to incorporate SOLR-4013 and SOLR-3980. I'd appreciate anyone looking at the synchronization I did around the member variable "loadingCores". The intent here is to keep two threads from creating the same core at the same time.
          1a> I'm assuming that there is exactly one CoreContainer per JVM. Otherwise I don't understand how any of the synchronization works on the member vars....
          1b> Running a simple test things went all to hell in JMX stuff without the synchronization, apparently the multiple thread problem shows up early and often.
          1c> synchronization is always "interesting", so the more eyes the better.
          1d> In particular, any good suggestions about bailing out of the sleep loop? Since cores can take quite a while to warm, I'm having a hard time thinking of a good default. I suppose we could add another attribute where the provider is specified. There's no reason a custom provider has to be present, so requiring a timeout from the provider doesn't seem workable.

          2> I've implemented a trivial CoreDescriptorProvider for a PoC, it's at the bottom. It pre-supposes you have 4 collections, the accompanying Solr.xml is also below.

          3> I'm going to put this away for a couple of hours and come back to it with fresh eyes, this copy is purely for the curious/critical...

          ***** sample custom descriptor provider

          public class TestCoreContainerProvider implements CoreDescriptorProvider {
            @Override
            public CoreDescriptor getDescriptor(CoreContainer container, String name) {
              if (!"collection2".equals(name) && !"collection3".equals(name)
                  && !"collection4".equals(name)) return null;
              // True hack because I know the dirs are the same as the collection.
              return new CoreDescriptor(container, name, name);
            }

            @Override
            public boolean isPersist(String s) {
              return false;
            }

            @Override
            public Collection<String> getCoreNames() {
              return new ArrayList<String>(Arrays.asList("collection2", "collection3", "collection4"));
            }
          }

          ***** solr.xml. Note no ZK stuff.

          <solr persistent="false" sharedLib="../../../../../blahblah/out/artifacts/provider_jar">
            <cores adminPath="/admin/cores" defaultCoreName="collection1"
                   host="${host:}" hostPort="${jetty.port:}" hostContext="${hostContext:}"
                   zkClientTimeout="${zkClientTimeout:15000}"
                   coreDescriptorProviderClass="blah.TestCoreContainerProvider">
              <core name="collection1" instanceDir="collection1" />
            </cores>
          </solr>

          Erick Erickson made changes -
          Assignee Erick Erickson [ erickerickson ]
          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321141 ]
          Fix Version/s 4.0 [ 12314992 ]
          Hoss Man made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          Simon Willnauer made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Fix Version/s 3.5 [ 12317876 ]
          Robert Muir made changes -
          Fix Version/s 3.5 [ 12317876 ]
          Fix Version/s 3.4 [ 12316683 ]
          Robert Muir added a comment -

          3.4 -> 3.5

          Robert Muir made changes -
          Fix Version/s 3.4 [ 12316683 ]
          Fix Version/s 4.0 [ 12314992 ]
          Fix Version/s 3.3 [ 12316471 ]
          Robert Muir made changes -
          Fix Version/s 3.3 [ 12316471 ]
          Fix Version/s 3.2 [ 12316172 ]
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Hoss Man made changes -
          Fix Version/s 3.2 [ 12316172 ]
          Fix Version/s Next [ 12315093 ]
          Noble Paul added a comment -

          Imagine having 10's of 1000's of such small files in a directory

          Lance Norskog added a comment -

          Another way to solve this is to stop changing solr.xml. Instead, have a directory full of CORENAME.xml which is the solr.xml just for that core.

          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Fix Version/s 1.5 [ 12313566 ]
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Shalin Shekhar Mangar made changes -
          Component/s multicore [ 12313102 ]
          Noble Paul made changes -
          Link This issue relates to SOLR-1293 [ SOLR-1293 ]
          Noble Paul made changes -
          Description Persisting and loading details from one xml is fine if the no:of cores are small and the no:of cores are few/fixed . If there are 10's of thousands of cores in a single box adding a new core (with persistent=true) becomes very expensive because every core creation has to write this huge xml.

          Moreover , there is a good chance that the file gets corrupted and all the cores become unusable . In that case I would prefer it to be stored in a centralized DB which is backed up/replicated and all the information is available in a centralized location.

          We may need to refactor CoreContainer to have a pluggable implementation which can load/persist the details . The default implementation should write/read from/to solr.xml . And the class should be pluggable as follows in solr.xml
          {code:xml}
          <solr dataProvider="com.foo.FoodataProvider">
          </solr>
          {code}
          There will be a new interface (or abstract class ) called SolrDataProvider which this class must implement
          Persisting and loading details from one xml is fine if the no:of cores are small and the no:of cores are few/fixed . If there are 10's of thousands of cores in a single box adding a new core (with persistent=true) becomes very expensive because every core creation has to write this huge xml.

          Moreover , there is a good chance that the file gets corrupted and all the cores become unusable . In that case I would prefer it to be stored in a centralized DB which is backed up/replicated and all the information is available in a centralized location.

          We may need to refactor CoreContainer to have a pluggable implementation which can load/persist the details . The default implementation should write/read from/to solr.xml . And the class should be pluggable as follows in solr.xml
          {code:xml}
          <solr>
            <dataProvider class="com.foo.FooDataProvider" attr1="val1" attr2="val2"/>
          </solr>
          {code}
          There will be a new interface (or abstract class ) called SolrDataProvider which this class must implement
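The SolrDataProvider interface mentioned in the description was never specified (the issue was closed Won't Fix in favor of core discovery, per SOLR-4083), but based on the description it might have looked something like the following hypothetical sketch. Everything beyond the name SolrDataProvider — the method names, signatures, and the in-memory implementation — is an assumption for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed SolrDataProvider contract. The issue
// only names the interface; the methods below are guesses at what pluggable
// load/persist of per-core details could look like.
interface SolrDataProvider {
  // Receives the attributes (attr1="val1", ...) from the <dataProvider/>
  // element in solr.xml.
  void init(Map<String, String> attrs);

  // Load descriptors (core name -> properties) for all known cores.
  Map<String, Map<String, String>> loadCoreDetails();

  // Persist details for a single core, e.g. into a replicated central DB,
  // without rewriting one huge solr.xml per core creation.
  void persistCoreDetails(String coreName, Map<String, String> details);
}

// Trivial in-memory stand-in illustrating the contract; a real provider
// would talk to a database or other backed-up store.
class InMemoryDataProvider implements SolrDataProvider {
  private final Map<String, Map<String, String>> store = new HashMap<>();

  public void init(Map<String, String> attrs) {
    // No configuration needed for the in-memory sketch.
  }

  public Map<String, Map<String, String>> loadCoreDetails() {
    return store;
  }

  public void persistCoreDetails(String coreName, Map<String, String> details) {
    store.put(coreName, details);
  }
}
```

With this shape, CoreContainer would call `loadCoreDetails()` once at startup and `persistCoreDetails(...)` per core creation, so adding one core among tens of thousands touches only that core's record rather than the whole file.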
          Shalin Shekhar Mangar made changes -
          Field Original Value New Value
          Summary Support pluggable peristence/loading of solr.xml details Support pluggable persistence/loading of solr.xml details
          Fix Version/s 1.5 [ 12313566 ]
          Noble Paul created issue -

            People

            • Assignee:
              Erick Erickson
              Reporter:
              Noble Paul
            • Votes:
              2
              Watchers:
              9
