Solr
SOLR-1293

Support for a large number of cores and faster loading/unloading of cores

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 4.2
    • Component/s: multicore
    • Labels: None

      Description

      Solr, currently, is not very suitable for a large number of homogeneous cores where you require fast/frequent loading/unloading of cores. Usually a core needs to be loaded just to fire a search query or to index a single document.
      The requirements of such a system are:

      • Very efficient loading of cores. Solr cannot afford to read, parse, and create Schema and SolrConfig objects for each core every time the core has to be loaded (SOLR-919, SOLR-920).
      • START/STOP a core. Currently it is only possible to unload a core (SOLR-880).
      • Automatic loading of cores. If a core is present but not loaded and a request comes for it, load it automatically before serving the request.
      • As there are a large number of cores, all the cores cannot be kept loaded at all times. There has to be an upper limit beyond which we need to unload a few cores (probably the least recently used ones).
      • Automatic allotment of dataDir for cores. If the number of cores is too high, all the cores' dataDirs cannot live in the same directory. There is an upper limit on the number of directories you can create in a single Unix directory without affecting performance (one possible layout is sketched after this list).
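
      A minimal sketch of one way to do that automatic allotment - bucketing dataDirs by a hash of the core name so that no single directory holds tens of thousands of entries. The bucket count and layout here are assumptions for illustration, not part of this issue:

      import java.nio.file.Path;
      import java.nio.file.Paths;

      /** Spreads per-core dataDirs across hashed subdirectories. Illustrative sketch only. */
      public class DataDirAllocator {

          private static final int BUCKETS = 256; // assumed bucket count

          /** e.g. allocate("/var/solr/data", "user12345") -> /var/solr/data/<bucket>/user12345 */
          public static Path allocate(String baseDir, String coreName) {
              int bucket = Math.floorMod(coreName.hashCode(), BUCKETS); // stable, non-negative bucket
              return Paths.get(baseDir, String.format("%02x", bucket), coreName);
          }
      }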


          Activity

          Noble Paul added a comment -

          The patch is untested. The internal patch we used was against an older svn version. That patch has just been merged to trunk and I am submitting it. I plan to test this sometime soon, but for those who need this really badly, it can be a starting point.

          Refer to the wiki page for usage details: http://wiki.apache.org/solr/LotsOfCores

          Otis Gospodnetic added a comment -

          Do you have any thoughts on handling the situation where each core belongs to a different party and each party has access only to its own core via Solr Admin (i.e. doesn't see all the other cores hosted by the instance)? Only the privileged administrator user can see and access all cores.

          Have you done any work on this, or is this on your TODO?

          Noble Paul added a comment -

          Access control is not something we were trying to do in Solr. The Solr farm is set up behind our mail servers and it is never exposed outside.

          Otis Gospodnetic added a comment -

          OK, thanks.
          When you go to your Solr Admin page today, it lists all cores, even if there are 10000 of them?

          Noble Paul added a comment -

          Yes. If you use the core admin STATUS command, use verbose=false so that you only get the minimum info.
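
          For example, such a status request might look like this (the verbose parameter is the one added by this patch; host, port, and admin path shown are the usual defaults):

          http://localhost:8983/solr/admin/cores?action=STATUS&verbose=false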

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment -

          3.4 -> 3.5

          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          Email notification suppressed to prevent mass spam.
          Pseudo-unique token identifying these issues: hoss20120321nofix36

          Jose Faria added a comment -

          I do not know if this is the right place, but on the wiki http://wiki.apache.org/solr/LotsOfCores there is a note that this feature will be available in 4.0, but since May it has been moved to 4.1.

          Erick Erickson added a comment -

          Well, I think this JIRA will finally get some action...

          Jose:
          The actual availability of any particular feature is best tracked by the actual JIRA ticket. The "fix version" is usually the earliest possible fix. Not until the resolution is something like "fixed" is the code really in the code line.

          All:
          OK, I'm thinking along these lines. I've started implementation, but wanted to open up the discussion in case I'm going down the wrong path.

          Assumption:
          1> For installations with multiple thousands of cores, provision has to be made for some kind of administrative process, probably an RDBMS, that really maintains this information.

          So here's a brief outline of the approach I'm thinking about (a solr.xml sketch follows this list).
          1> Add an additional optional parameter to the <cores> entry in solr.xml, LRUCacheSize=#. (what default?)
          2> Implement SOLR-1306, allow a data provider to be specified in solr.xml that gives back core descriptions, something like: <coreDescriptorProvider class="com.foo.FooDataProvider" [attr="val"]/> (don't quite know what attrs we want, if any).
          3> Add two optional attributes to individual <core> entries:
          a> sticky="true|false". Default to true. Any cores marked with this would never be aged out; essentially treat them just as current.
          b> loadOnStartup="true|false", default to true.
          4> So the process of getting a core would be something like:
          a> Check the normal list, just like now. If a core was found, return it.
          b> Check the LRU list; if a core was found, return it.
          c> Ask the data provider (if defined) for the core descriptor, create the core, and put it in the LRU list.
          d> Remove any core entries over the LRU limit. Any hints on the right cache to use? There's the Lucene LRUCache, ConcurrentLRUCache, and the LRUHashMap in Lucene that I can't find in any of the compiled jars... I've got to close the core as it's removed. It looks like I can use ConcurrentLRUCache and add a listener to close the core when it's removed from the list.
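
          For illustration, a minimal solr.xml along these lines might look like the following; the attribute and element names are the ones proposed above and the exact syntax is still to be settled:

          <solr persistent="true">
            <cores adminPath="/admin/cores" LRUCacheSize="100">
              <!-- optional provider that supplies core descriptors from an external store -->
              <coreDescriptorProvider class="com.foo.FooDataProvider"/>
              <!-- a "hot" core: loaded at startup and never aged out -->
              <core name="core0" instanceDir="core0" loadOnStartup="true" sticky="true"/>
              <!-- a rarely used core: loaded on first request, eligible for LRU unload -->
              <core name="core1" instanceDir="core1" loadOnStartup="false" sticky="false"/>
            </cores>
          </solr>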

          Processing-wise, in the usual case this would cost an extra check each time a core was fetched. If <a> above failed, we would have to see if the data provider was defined before returning null. I don't think that's onerous; the rest of the costs would only be incurred when a data provider did exist.

          But one design decision here is along these lines. What to do with persistence and stickiness? Specifically, if the coreDescriptorProvider gives us a core from, say, an RDBMS, should we allow that core to be persisted into the solr.xml file if they've set persist="true" in solr.xml? I'm thinking that we can make this all work with maximum flexibility if we allow the coreDataProvider to tell us whether we should persist any core currently loaded....

          Anyway, I'll be fleshing this out over the next little while, anybody want to weigh in?

          Erick

          Jack Krupansky added a comment -

          an RDBMS

          Is a full RDBMS needed? How about a NoSQL approach... like... um... Solr (or raw Lucene) itself?

          Erick Erickson added a comment -

          I don't care what's used to store the info. The provider that the user provides cares, but that's the point of getting that info through a custom component: Solr doesn't need to know. Nor should it <G>...

          Noble Paul added a comment - edited

          An RDBMS is not required. We are managing that with the XML itself. Now that we have moved to ZooKeeper for cloud, we should piggyback on ZooKeeper for everything.

          Jack Krupansky added a comment -

          Solr doesn't need to know

          True, but what store would you propose using in unit tests? I suppose you could develop a "Mock RDBMS" which could be even simpler than Solr, so unit tests don't need a Solr instance running.

          Noble Paul added a comment -

          If you wish to test the ZK persistence feature, should we not just use an embedded ZK?

          Jack Krupansky added a comment -

          piggyback on zookeeper

          That's okay, but ZK is optimized for a "small" amount of configuration info - there is a 1 MB limit. Is "large number" times the data per core going to be under 1 MB?

          Is "large number" supposed to be hundreds, thousands, tens of thousands, hundreds of thousands, millions, ...? I mean, if a web site had millions of users, could they have one loadable core per user? The use case should be more specific about the goals.

          Andrzej Rusin added a comment -

          Whatever would be the storage of the cores info, it would be nice to have some API and/or command line tools for (batch) manipulating the cores; what do you think?

          Erick Erickson added a comment -

          Well, I don't think the use-case I'm working on needs an API or command-line tools, so I probably won't be working on it. I'd be glad to commit it if someone else wanted to do it.

          Noble Paul added a comment -

          Is "large number" supposed to be hundreds, thousands, tens of thousands, hundreds of thousands, millions, ...?

          I'll be surprised if it ever crosses a few tens of thousands. But let us say the upper limit is, say, 100,000; shouldn't it be simple to keep in ZK?
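
          As a rough back-of-the-envelope (the per-core descriptor size is an assumption, not a measurement):

            ~100 bytes/core ×  10,000 cores ≈  1 MB  - roughly at ZooKeeper's default per-znode limit (jute.maxbuffer)
            ~100 bytes/core × 100,000 cores ≈ 10 MB  - would need one znode per core (or a raised limit) rather than a single znode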

          Otis Gospodnetic added a comment -

          General comment:
          We may want the index/core re-opener to remain aware of previous locations (nodes) on which cores were opened for the purposes of reusing any possible OS-level caches that may still exist on those nodes for that core. For example, if the cluster has nodes 1-100 and core Foo was on nodes 1, 2, and 3 before it was closed, then maybe next time it needs to be opened it would ideally be opened on those 1, 2, and 3 nodes. Of course, nodes 1, 2, or 3 may no longer be around or may be currently overloaded, or.... in which case alternative nodes need to be picked.

          Erick Erickson added a comment -

          Otis:

          I'm not sure I understand this. As I'm looking at this particular implementation, all the potential cores (configuration, data files, etc.) are already on the particular node; it's just a matter of loading/unloading them. If you're thinking about SolrCloud/ZK, oh my aching head! I guess I'd propose that how this all works with ZK be split off to different tickets altogether, too much for me to deal with....

          I'm explicitly thinking of this as having no cluster-awareness, it's all local to a single Solr node. Any meta-level coordination on which node a particular query should be routed to is assumed to be out of scope, at least for this version.

          That said, I can certainly see the value in what you're talking about, that's just not the use-case I'm trying to address....

          Erick Erickson added a comment -

          I've implemented some parts of this (SOLR-880, SOLR-1028); I should be checking them in sometime relatively soon, then on to some other JIRAs related to this one. But I got to thinking that maybe what we really want is two new characteristics for cores, call them loadOnStartup (T|F, default T) and sticky (T|F, default T).

          What I've done so far conflates the two ideas; things loaded "lazily" are assumed to be NOT sticky and there's really no reason to conflate them. Use cases are

          LOS=T, STICKY=T - really, what we have now. Pay the penalty on startup for loading the core at startup in exchange for speed later.

          LOS=T, STICKY=F - load on startup, but allow the core to be automatically unloaded later. For preloading expected 'hot' cores. Cores are unloaded on an LRU basis. NOTE: a core can be unloaded and then loaded again later if it's referenced.

          LOS=F, STICKY=T - Defer loading the core, but once it's loaded, keep it loaded. Gets us started fast, amortizes loading the core. This one I actually expect to be the least useful, but it's a consequence of the others and doesn't cost anything extra to implement coding-wise.

          LOS=F, STICKY=F - what I was originally thinking of as "lazy loading". Cores get loaded when first referenced, and swapped out on an LRU algorithm.

          Looking at what I've done on the two JIRAs mentioned, this is actually not at all difficult, just a matter of putting the CoreConfig in the right list...

          So, if any STICKY=F is found, there's an LRU cache created (actually a LinkedHashMap with removeEldestEntry overridden), with an optional size specified in the <cores...> tag. I'd guess I'll default it to 100 or some such if (and only if) there's at least one STICKY=F defined but no cache size in <cores...>. Of course, if the user defined cacheSize in <cores...>, I'd allocate the cache up front.
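
          A minimal sketch of that LinkedHashMap-with-removeEldestEntry approach, using a generic AutoCloseable stand-in for SolrCore (illustration only, not the committed implementation):

          import java.util.LinkedHashMap;
          import java.util.Map;

          /** LRU cache of loaded cores; evicted cores are closed as they age out. */
          public class LruCoreCache<V extends AutoCloseable> {

              private final Map<String, V> cache;

              public LruCoreCache(final int maxSize) {
                  // accessOrder=true keeps the least recently accessed entry first
                  this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
                      @Override
                      protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                          if (size() > maxSize) {
                              try {
                                  eldest.getValue().close(); // close the core as it is aged out
                              } catch (Exception e) {
                                  // log and continue; eviction should not fail the lookup
                              }
                              return true;
                          }
                          return false;
                      }
                  };
              }

              public synchronized V get(String coreName) {
                  return cache.get(coreName);
              }

              public synchronized void put(String coreName, V core) {
                  cache.put(coreName, core);
              }
          }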

          Thoughts?

          Noble Paul added a comment -

          The combination of LOS, STICKY and their defaults looks fine to me.

          Erick Erickson added a comment -

          About persistence. And about coreDescriptorProvider in general. If one is supplied, I'm thinking that it will always have first crack at most things having to do with getting a CoreDescriptor. For instance:
          1> When persisting a core, ask the coreDescriptorProvider whether to persist to solr.xml.
          2> When listing cores, give preference to any descriptor the coreDescriptorProvider knows about, i.e. override the ones in any CoreContainer lists with ones from the provider.
          3> ???

          The mind-set here is that, if a CoreDescriptorProvider is present, it should be the arbiter of relevant decisions about that core. We can default to reasonable stuff (e.g. default behavior for CoreDescriptorProvider.shouldPersist(String coreName) is to return false)
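
          For illustration, the provider might have roughly this shape; apart from shouldPersist, the method names here are guesses, and the real interface (SOLR-1306) may differ:

          import java.util.Collection;

          /** Placeholder for Solr's CoreDescriptor, only so this sketch stands alone. */
          class CoreDescriptor { }

          /** Hypothetical shape of the coreDescriptorProvider discussed above. */
          interface CoreDescriptorProvider {

              /** Return a descriptor for a core this provider knows about, or null if unknown. */
              CoreDescriptor getDescriptor(String coreName);

              /** Core names this provider can supply; consulted when listing cores. */
              Collection<String> getCoreNames();

              /** Whether a core sourced from this provider should be persisted to solr.xml. */
              default boolean shouldPersist(String coreName) {
                  return false; // proposed default: provider-sourced cores are not persisted
              }
          }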

          I'm seeing one other thing related to persistence, NOT having to do with a CoreDescriptorProvider. Since CoreContainer now has two new params, loadOnStartup=true and swappable=false magically show up in the persisted file if they aren't specified. It would be a bit more aesthetic to only show what was specified by the user, but I'm not sure it's worth any effort, and it appears that this is true for some other properties as well.

          Erick Erickson added a comment -

          All the functionality here is part of other JIRAs, e.g. SOLR-4196, SOLR-4478 and the like.

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee: Erick Erickson
            • Reporter: Noble Paul
            • Votes: 16
            • Watchers: 21
