Solr
  1. Solr
  2. SOLR-1449

solrconfig.xml syntax to add classpath elements from outside of instanceDir

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels:
      None

      Description

      the idea has been discussed numerous times that it would be nice if there was a way to configure a core to load plugins from specific jars (or "classes" style directories) by path w/o needing to copy them to the "./lib" dir in the instanceDir.

      The current workaround is "symlinks" but that doesn't really help the situation of the Solr Release artifacts, where we wind up making numerous copies of jars to support multiple example directories (you can't have reliable symlinks in zip files)

      1. SOLR-1449.patch
        11 kB
        Hoss Man
      2. SOLR-1449.patch
        13 kB
        Hoss Man
      3. SOLR-1449.patch
        20 kB
        Hoss Man
      4. SOLR-1449.patch
        26 kB
        Hoss Man
      5. SOLR-1449.patch
        28 kB
        Hoss Man
      6. SOLR-1449.patch
        28 kB
        Hoss Man
      7. SOLR-1449.patch
        30 kB
        Hoss Man
      8. SOLR-1449.patch
        27 kB
        Mark Miller
      9. SOLR-1449.patch
        29 kB
        Hoss Man
      10. SOLR-1449.patch
        26 kB
        Noble Paul
      11. SOLR-1449.patch
        38 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          Example of how this might work. It provides support for "regex" based jar selectors in a specified directory (so we can point at jars w/o hardcoding version numbers all over the place) and leaves the door open to a simpler glob style syntax to be added later.

          syntax...

            <!-- all jars in a directory -->
            <lib dir="../../dist/solr-cell-lib/" />
            <!-- any jar in the directory matching the regex -->
            <lib dir="../../dist/" regex="apache-solr-cell-(\d|\.)+-.*\.jar" />
          

          Internally the parsing code is confined to SolrConfig.xml, but a new method has been added to SolrResourceLoader that repalces the ClassLoader with a new one where the parent is fixed on the previous classloader – it seems a little sketchy, and i would have much rather had SOlrConfig use some new ClassLoaderUtils to build up a comprehensive ClassLoader and then provide it when constructing SolrResourceLoader – expect SolrConfig doesn't get to construct SOlrResourceLoader.

          The patch modifies the build.xml file just enough to demonstrate using the solr-cell libs into the example w/o copying them, but i didn't systematicly fix all the other palces we could use this.

          I also didn't test the patch exhaustively.

          Note: I've marked this issue as Fix for 1.4 to try and give it some visibility before the release since there seems to be some interest in getting the release size smaller – but unless adequate eyeballs give it a thumbs up and help do some more rigerous testing i'm leary to try and inlcude it now. (The term "Classloader Hell" exists for a reason – the upside is i'm 95% sure that even if there winds up being a bug in the nested classloaders, anyone using the existing method (ie: copies their libs) should still work, since that code path should be the same.

          Show
          Hoss Man added a comment - Example of how this might work. It provides support for "regex" based jar selectors in a specified directory (so we can point at jars w/o hardcoding version numbers all over the place) and leaves the door open to a simpler glob style syntax to be added later. syntax... <!-- all jars in a directory --> <lib dir="../../dist/solr-cell-lib/" /> <!-- any jar in the directory matching the regex --> <lib dir="../../dist/" regex="apache-solr-cell-(\d|\.)+-.*\.jar" /> Internally the parsing code is confined to SolrConfig.xml, but a new method has been added to SolrResourceLoader that repalces the ClassLoader with a new one where the parent is fixed on the previous classloader – it seems a little sketchy, and i would have much rather had SOlrConfig use some new ClassLoaderUtils to build up a comprehensive ClassLoader and then provide it when constructing SolrResourceLoader – expect SolrConfig doesn't get to construct SOlrResourceLoader. The patch modifies the build.xml file just enough to demonstrate using the solr-cell libs into the example w/o copying them, but i didn't systematicly fix all the other palces we could use this. I also didn't test the patch exhaustively. Note: I've marked this issue as Fix for 1.4 to try and give it some visibility before the release since there seems to be some interest in getting the release size smaller – but unless adequate eyeballs give it a thumbs up and help do some more rigerous testing i'm leary to try and inlcude it now. (The term "Classloader Hell" exists for a reason – the upside is i'm 95% sure that even if there winds up being a bug in the nested classloaders, anyone using the existing method (ie: copies their libs) should still work, since that code path should be the same.
          Hide
          Shalin Shekhar Mangar added a comment -

          Note: I've marked this issue as Fix for 1.4 to try and give it some visibility before the release since there seems to be some interest in getting the release size smaller

          I haven't seen the patch yet. But I do not understand why we need to copy the jars in the released artifact. We can have some instructions or scripts (I assume we don't want ant) to copy the jars to the correct places for running solr-cell examples. Is there a problem with that?

          Show
          Shalin Shekhar Mangar added a comment - Note: I've marked this issue as Fix for 1.4 to try and give it some visibility before the release since there seems to be some interest in getting the release size smaller I haven't seen the patch yet. But I do not understand why we need to copy the jars in the released artifact. We can have some instructions or scripts (I assume we don't want ant) to copy the jars to the correct places for running solr-cell examples. Is there a problem with that?
          Hide
          Yonik Seeley added a comment -

          We can have some instructions or scripts (I assume we don't want ant) to copy the jars to the correct places for running solr-cell examples. Is there a problem with that?

          For the same reason we have an example at all I guess - a better out-of-the-box experience.

          IMO, Solr is an enterprise search server, not a bunch of jars you wire together yourself to create one. It would be nice if we could do database import from the same example server... but I think JDBC driver issues (and the necessity to have a database to connect to?) make this tougher.

          Show
          Yonik Seeley added a comment - We can have some instructions or scripts (I assume we don't want ant) to copy the jars to the correct places for running solr-cell examples. Is there a problem with that? For the same reason we have an example at all I guess - a better out-of-the-box experience. IMO, Solr is an enterprise search server, not a bunch of jars you wire together yourself to create one. It would be nice if we could do database import from the same example server... but I think JDBC driver issues (and the necessity to have a database to connect to?) make this tougher.
          Hide
          Shalin Shekhar Mangar added a comment -

          For the same reason we have an example at all I guess - a better out-of-the-box experience.

          I agree with that goal. It's just that having a regex based class loading from externally located jars, although cool, seems an overly fancy way to solve this particular problem.

          It would be nice if we could do database import from the same example server... but I think JDBC driver issues (and the necessity to have a database to connect to?) make this tougher.

          The database and the driver jar is checked in. The only issue I guess is that our example is a single-core one and the DIH examples have db, rss and mail each of which requires its own schema.

          The main problem is the mail DIH example. It uses Tika for indexing email attachments and therefore copies all its dependencies. I'm not sure if that is necessary. We can have an option to disable that and document the attachment support in the wiki only. I'm guessing that very few users will actually need to look at that example.

          The other thing that we can do is to offer separate source and binary+example downloads.

          Show
          Shalin Shekhar Mangar added a comment - For the same reason we have an example at all I guess - a better out-of-the-box experience. I agree with that goal. It's just that having a regex based class loading from externally located jars, although cool, seems an overly fancy way to solve this particular problem. It would be nice if we could do database import from the same example server... but I think JDBC driver issues (and the necessity to have a database to connect to?) make this tougher. The database and the driver jar is checked in. The only issue I guess is that our example is a single-core one and the DIH examples have db, rss and mail each of which requires its own schema. The main problem is the mail DIH example. It uses Tika for indexing email attachments and therefore copies all its dependencies. I'm not sure if that is necessary. We can have an option to disable that and document the attachment support in the wiki only. I'm guessing that very few users will actually need to look at that example. The other thing that we can do is to offer separate source and binary+example downloads.
          Hide
          Hoss Man added a comment -

          I feel like these comments aren't really on topic ... this type of feature has been discussed numerous times on the list as a desirable way for users to specify where a core should look to load plugins without needing to copy jars to a specific hardcoded directory – particularly when people want to use the same plugins in multiple cores (that have unique instanceDirs)

          Independent of any question of what our releases look like, and what kinds of examples we want to include, these is still a feature that users have been requesting.

          My point about our examples was merely that an additional use case that would be improved by this feature is that we would be able to have a single copy of any jar in the release, no matter how many examples wanted to refer to it (which would also make it easier to add examples that would work "as is" without debating the potential increase n the size of the release, or wether we should ask the user to copy the jars, etc...

          It's just that having a regex based class loading from externally located jars, although cool, seems an overly fancy way to solve this particular problem.

          As i already said: that wasn't a particular goal, just a nice side effect – but FWIW: the regex based loading was actually the easiest way i found to identify multiple jars – i was looking for an easy way to do shell style fileglobbing in java and nothing jumped out at me, java.util.regex on the other hand is a stock and easy to use.

          Show
          Hoss Man added a comment - I feel like these comments aren't really on topic ... this type of feature has been discussed numerous times on the list as a desirable way for users to specify where a core should look to load plugins without needing to copy jars to a specific hardcoded directory – particularly when people want to use the same plugins in multiple cores (that have unique instanceDirs) Independent of any question of what our releases look like, and what kinds of examples we want to include, these is still a feature that users have been requesting. My point about our examples was merely that an additional use case that would be improved by this feature is that we would be able to have a single copy of any jar in the release, no matter how many examples wanted to refer to it (which would also make it easier to add examples that would work "as is" without debating the potential increase n the size of the release, or wether we should ask the user to copy the jars, etc... It's just that having a regex based class loading from externally located jars, although cool, seems an overly fancy way to solve this particular problem. As i already said: that wasn't a particular goal, just a nice side effect – but FWIW: the regex based loading was actually the easiest way i found to identify multiple jars – i was looking for an easy way to do shell style fileglobbing in java and nothing jumped out at me, java.util.regex on the other hand is a stock and easy to use.
          Hide
          Shalin Shekhar Mangar added a comment -

          particularly when people want to use the same plugins in multiple cores (that have unique instanceDirs)

          That is taken care of by the sharedLib attribute in solr.xml

          Show
          Shalin Shekhar Mangar added a comment - particularly when people want to use the same plugins in multiple cores (that have unique instanceDirs) That is taken care of by the sharedLib attribute in solr.xml
          Hide
          Yonik Seeley added a comment -

          repalces the ClassLoader with a new one where the parent is fixed on the previous classloader

          I'm not a classloading expert... but couldn't this cause things to work differently by changing the order of the <lib> statements?

          Show
          Yonik Seeley added a comment - repalces the ClassLoader with a new one where the parent is fixed on the previous classloader I'm not a classloading expert... but couldn't this cause things to work differently by changing the order of the <lib> statements?
          Hide
          Hoss Man added a comment -

          That is taken care of by the sharedLib attribute in solr.xml

          sharedLib only helps if you want the same plugins in every core (not just to reuse the same plugins in multiple cores. There's also no way to reload a plugin from the sharedLib (whereas the normal lib for a core does get reloaded when the core is reloaded)

          I'm not a classloading expert... but couldn't this cause things to work differently by changing the order of the <lib> statements?

          Excellent point ... yes you have to declare things in a dependency order. This didn't seem like a big deal to me when i first wrote it, but i can see how that could prove very tedious, it could make it impossible to use some sets of jars without having them in a specific set of directories.

          The other option was to maintain a list of all the jars in parallel with the "current" classloader, and replace the classloader anytime new jars were added – this seemed like a bigger evil to me originally because it made the behavior really unpredictable if someone tried to use the class loader in the middle of adding more jars (because there would be classes that had been loaded by a classloader that was no longer in use) but frankly: i don't see how the behavior would really be any weirder then what would currently happen, and it would certainly be nice to make it less dependent on the order things were declared.

          I'll try to get another patch up today.

          Show
          Hoss Man added a comment - That is taken care of by the sharedLib attribute in solr.xml sharedLib only helps if you want the same plugins in every core (not just to reuse the same plugins in multiple cores. There's also no way to reload a plugin from the sharedLib (whereas the normal lib for a core does get reloaded when the core is reloaded) I'm not a classloading expert... but couldn't this cause things to work differently by changing the order of the <lib> statements? Excellent point ... yes you have to declare things in a dependency order. This didn't seem like a big deal to me when i first wrote it, but i can see how that could prove very tedious, it could make it impossible to use some sets of jars without having them in a specific set of directories. The other option was to maintain a list of all the jars in parallel with the "current" classloader, and replace the classloader anytime new jars were added – this seemed like a bigger evil to me originally because it made the behavior really unpredictable if someone tried to use the class loader in the middle of adding more jars (because there would be classes that had been loaded by a classloader that was no longer in use) but frankly: i don't see how the behavior would really be any weirder then what would currently happen, and it would certainly be nice to make it less dependent on the order things were declared. I'll try to get another patch up today.
          Hide
          Shalin Shekhar Mangar added a comment -

          sharedLib only helps if you want the same plugins in every core (not just to reuse the same plugins in multiple cores. There's also no way to reload a plugin from the sharedLib (whereas the normal lib for a core does get reloaded when the core is reloaded)

          OK, that's true.

          As i already said: that wasn't a particular goal, just a nice side effect - but FWIW: the regex based loading was actually the easiest way i found to identify multiple jar

          I understand you want to make a core's lib directory configurable. That is fine. It is the regex based loading that makes me nervous. Why not load all jars in a specified directory?

          Show
          Shalin Shekhar Mangar added a comment - sharedLib only helps if you want the same plugins in every core (not just to reuse the same plugins in multiple cores. There's also no way to reload a plugin from the sharedLib (whereas the normal lib for a core does get reloaded when the core is reloaded) OK, that's true. As i already said: that wasn't a particular goal, just a nice side effect - but FWIW: the regex based loading was actually the easiest way i found to identify multiple jar I understand you want to make a core's lib directory configurable. That is fine. It is the regex based loading that makes me nervous. Why not load all jars in a specified directory?
          Hide
          Hoss Man added a comment -

          I understand you want to make a core's lib directory configurable. That is fine. It is the regex based loading that makes me nervous. Why not load all jars in a specified directory?

          The way you worded that is part of what i'm trying to fix: "a core's lib directroy" ... i'm not trying to make the path of a single directory configurable, i'm trying to eliminate the idea that there is a single "lib" dir for a core – people should be able to load any jars they want, from anywhere they want.

          The regex based selection already supported loading all files in a directory – that's the default behavior if no regex is selected – but in general there are two main reasons i can think of why we should have regex (or glob) support...

          1) so people don't have to load every jar in a directory just to get some of those jars.

          slurping up every file in ./lib was fine when we were forcing people to manually create a directory inside the instance dir (if you don't want it loaded don't copy it), but if we're trying to be flexible and allow people to point at jars anywhere then we shouldn't hobble them by making them get everything in a directory. it should be possible to have an assload of jars in a big directory and say i only want this subset for this core.

          2) so people don't have to hardcode jar version info.

          people should have an easy way to say "load the tika jar" in their config, and it should still work even if they delete apache-tika-0.4.jar and repalce it with apache-tika-0.5.jar. I don't want to make them hardcode names.

          I'm not sure why regex matching on filenames makes you "nervous" ... my best guess is that you're worried regex mistakes silently not loading the files (since a regex was the only way to laod a single jar explicitly) so in this newest patch i've added support for a "path" options instead of dir+regex ... if an explicitly path is used and it can't be found it's logged as an error....

            <!-- all files in dir -->
            <lib dir="../../dist/solr-cell-lib/" />
            <!-- all files in dir matching the regex -->
            <lib dir="../../dist/" regex="apache-solr-cell-(\d|\.)+-.*\.jar" />
            <!-- nothing matches, so will be ignored -->
            <lib dir="/total/crap/dir/ignored" /> 
            <!-- an exact path (jar or classes dir), will log error if not found -->
            <lib path="../a-jar-that-does-not-exist.jar" /> 
          

          This patch also makes the changes i described in my previous comment (removing the dependency on the order that the 'lib' dirs are declared)

          Show
          Hoss Man added a comment - I understand you want to make a core's lib directory configurable. That is fine. It is the regex based loading that makes me nervous. Why not load all jars in a specified directory? The way you worded that is part of what i'm trying to fix: "a core's lib directroy" ... i'm not trying to make the path of a single directory configurable, i'm trying to eliminate the idea that there is a single "lib" dir for a core – people should be able to load any jars they want, from anywhere they want. The regex based selection already supported loading all files in a directory – that's the default behavior if no regex is selected – but in general there are two main reasons i can think of why we should have regex (or glob) support... 1) so people don't have to load every jar in a directory just to get some of those jars. slurping up every file in ./lib was fine when we were forcing people to manually create a directory inside the instance dir (if you don't want it loaded don't copy it), but if we're trying to be flexible and allow people to point at jars anywhere then we shouldn't hobble them by making them get everything in a directory. it should be possible to have an assload of jars in a big directory and say i only want this subset for this core. 2) so people don't have to hardcode jar version info. people should have an easy way to say "load the tika jar" in their config, and it should still work even if they delete apache-tika-0.4.jar and repalce it with apache-tika-0.5.jar. I don't want to make them hardcode names. — I'm not sure why regex matching on filenames makes you "nervous" ... my best guess is that you're worried regex mistakes silently not loading the files (since a regex was the only way to laod a single jar explicitly) so in this newest patch i've added support for a "path" options instead of dir+regex ... if an explicitly path is used and it can't be found it's logged as an error.... <!-- all files in dir --> <lib dir="../../dist/solr-cell-lib/" /> <!-- all files in dir matching the regex --> <lib dir="../../dist/" regex="apache-solr-cell-(\d|\.)+-.*\.jar" /> <!-- nothing matches, so will be ignored --> <lib dir="/total/crap/dir/ignored" /> <!-- an exact path (jar or classes dir), will log error if not found --> <lib path="../a-jar-that-does-not-exist.jar" /> This patch also makes the changes i described in my previous comment (removing the dependency on the order that the 'lib' dirs are declared)
          Hide
          Noble Paul added a comment - - edited

          I have a few questions in mind.

          • Is this an issue which users have reported? in my experience with Solr mailing list, I am yet to see a request where users wish to add arbitrary directories to classpath
          • How important is this feature to be in 1.4?
          • Users in general have a lot of problems with classloading. Even with the current support with one lib directory I have seen so many users having trouble with classloading . This can only add to that confusion

          I am not aware of any project which allows this level of configurability for classpath. Most of the users never have to write custom components for Solr. In our organization, I have encountered very few cases where they needed to add custom jars to classpath. Even in cases where they did , they were some trivial jars and it can be put into solr_home/lib anyway.

          I am -1 on adding this to 1.4.

          Show
          Noble Paul added a comment - - edited I have a few questions in mind. Is this an issue which users have reported? in my experience with Solr mailing list, I am yet to see a request where users wish to add arbitrary directories to classpath How important is this feature to be in 1.4? Users in general have a lot of problems with classloading. Even with the current support with one lib directory I have seen so many users having trouble with classloading . This can only add to that confusion I am not aware of any project which allows this level of configurability for classpath. Most of the users never have to write custom components for Solr. In our organization, I have encountered very few cases where they needed to add custom jars to classpath. Even in cases where they did , they were some trivial jars and it can be put into solr_home/lib anyway. I am -1 on adding this to 1.4.
          Hide
          Erik Hatcher added a comment -

          I haven't tried out Hoss' patch, but based on the write-up, I'm +1.

          Having a configurable list of directories to load from will clean up our examples, reduce the size (or installation steps) of our example app. It will allow various plugins to live in one place and be referred to without having to copy files all over the place.

          I think this is an important feature to get into 1.4.

          Show
          Erik Hatcher added a comment - I haven't tried out Hoss' patch, but based on the write-up, I'm +1. Having a configurable list of directories to load from will clean up our examples, reduce the size (or installation steps) of our example app. It will allow various plugins to live in one place and be referred to without having to copy files all over the place. I think this is an important feature to get into 1.4.
          Hide
          Noble Paul added a comment -

          Let us get the facts. We may be barking up the wrong tree.

          we have a 116MB distribution out of which the solr.war + single+multicore example is around 4MB . Most of the users need this 4MB only (99% of the users do not need clustering + solr cell ). If we implement this issue we may cut down the size of the distro by around 20 MB (by eliminating duplication of tika jars). What we should have is a lighter version (solr.war + example solr home . <4MB and a full version.

          I am sure most of the users will be happy with the minimal solr. The rest of them will happily download the whole thing however big it is.

          This is not to say that we don't need to reduce the size of the distro. But adding complexity just for this is not really required. Just adding a .sh and .bat file to the tika example we can add jars from external path.

          Show
          Noble Paul added a comment - Let us get the facts. We may be barking up the wrong tree. we have a 116MB distribution out of which the solr.war + single+multicore example is around 4MB . Most of the users need this 4MB only (99% of the users do not need clustering + solr cell ). If we implement this issue we may cut down the size of the distro by around 20 MB (by eliminating duplication of tika jars). What we should have is a lighter version (solr.war + example solr home . <4MB and a full version. I am sure most of the users will be happy with the minimal solr. The rest of them will happily download the whole thing however big it is. This is not to say that we don't need to reduce the size of the distro. But adding complexity just for this is not really required. Just adding a .sh and .bat file to the tika example we can add jars from external path.
          Hide
          Hoss Man added a comment -

          Added tests for all the permutations of the solrconfig.xml syntax.

          Also fixed a missing dependency in the build.xml that i didn't notice before because i hadn't tried "ant clean test" (previously test depending on a special target for copying the solr cell libs so the SolrExample*Tests would find them, now it just depends on dist-contrib to ensure all the contrib jars/dependencies are in the expected place)

          Show
          Hoss Man added a comment - Added tests for all the permutations of the solrconfig.xml syntax. Also fixed a missing dependency in the build.xml that i didn't notice before because i hadn't tried "ant clean test" (previously test depending on a special target for copying the solr cell libs so the SolrExample*Tests would find them, now it just depends on dist-contrib to ensure all the contrib jars/dependencies are in the expected place)
          Hide
          Hoss Man added a comment -

          Is this an issue which users have reported? in my experience with Solr mailing list, I am yet to see a request where users wish to add arbitrary directories to classpath

          I don't really feel like searching through the archives at the moment, but it has come up – i don't know if anyone has explicitly requested the ability to add arbitrary directories, but there have certainly been discussions about the annoyance of needing to copy and/or symlink jars.

          If nothing else: I'm a user, and i'm requesting it.

          How important is this feature to be in 1.4?

          As i said in my first comment i don't know.

          It would be nice to have, but i certainly don't think it's a blocker ... even with the testing i've done, and even with the new tests i added to the patch, and even though the behavior for existing ./lib users hasn't been changed, i still wouldn't consider committing unless other people try it out and gave it a thumbs up.

          Users in general have a lot of problems with classloading. Even with the current support with one lib directory I have seen so many users having trouble with classloading . This can only add to that confusion

          I don't really see how this will make confusion about classloading any worse. most problems i can think of where people have classloader difficulty in solr stem from not understanding where they are suppose to copy their jars – they tend to get confused about which "lib" directory, particularly with example/lib containing the jetty jars. Allowing people to put the jar any where they want and point at it by name in the config file should reduce confusion.

          Besides which: they're still free to create a ./lib dir and copy jars – that still works, no configuration needed.

          I agree that the original patch (with the order in the config mattering) would have been confusing for people, but with the more recent patches where all jars are in the same classloader i can't imagine any situation where this will cause more problems/confusion then forcing people to make a lib dir.

          I am sure most of the users will be happy with the minimal solr. The rest of them will happily download the whole thing however big it is.

          I REALLY don't want to argue the merrits of this issue as if it's purpose was to decrease the size of the distribution – it was not the purpose, it's just a possible additional benefit – but i HAVE to disagree with you on this ... most users may only need a minimal solr, but we should not passively discourage people from finding features that can make them happier by making those features more complex to get (via an alternate larger download).

          Adding this feature, and using it to reduce the size of the current examples may not reduce the size of the current distribution enough to be worth worrying about, that's fine. But i'm trying to thing longer term: there have been multiple threads discussing the goal of adding many more example directories demonstrating cool use cases of solr via interesting permutations of features (DIH, clustering, solr cell, velocity, etc...). This patch (or something like it) is going to be necessary before we can do anything like that.

          Show
          Hoss Man added a comment - Is this an issue which users have reported? in my experience with Solr mailing list, I am yet to see a request where users wish to add arbitrary directories to classpath I don't really feel like searching through the archives at the moment, but it has come up – i don't know if anyone has explicitly requested the ability to add arbitrary directories, but there have certainly been discussions about the annoyance of needing to copy and/or symlink jars. If nothing else: I'm a user, and i'm requesting it. How important is this feature to be in 1.4? As i said in my first comment i don't know. It would be nice to have, but i certainly don't think it's a blocker ... even with the testing i've done, and even with the new tests i added to the patch, and even though the behavior for existing ./lib users hasn't been changed, i still wouldn't consider committing unless other people try it out and gave it a thumbs up. Users in general have a lot of problems with classloading. Even with the current support with one lib directory I have seen so many users having trouble with classloading . This can only add to that confusion I don't really see how this will make confusion about classloading any worse. most problems i can think of where people have classloader difficulty in solr stem from not understanding where they are suppose to copy their jars – they tend to get confused about which "lib" directory, particularly with example/lib containing the jetty jars. Allowing people to put the jar any where they want and point at it by name in the config file should reduce confusion. Besides which: they're still free to create a ./lib dir and copy jars – that still works, no configuration needed. I agree that the original patch (with the order in the config mattering) would have been confusing for people, but with the more recent patches where all jars are in the same classloader i can't imagine any situation where this will cause more problems/confusion then forcing people to make a lib dir. I am sure most of the users will be happy with the minimal solr. The rest of them will happily download the whole thing however big it is. I REALLY don't want to argue the merrits of this issue as if it's purpose was to decrease the size of the distribution – it was not the purpose, it's just a possible additional benefit – but i HAVE to disagree with you on this ... most users may only need a minimal solr, but we should not passively discourage people from finding features that can make them happier by making those features more complex to get (via an alternate larger download). Adding this feature, and using it to reduce the size of the current examples may not reduce the size of the current distribution enough to be worth worrying about, that's fine. But i'm trying to thing longer term: there have been multiple threads discussing the goal of adding many more example directories demonstrating cool use cases of solr via interesting permutations of features (DIH, clustering, solr cell, velocity, etc...). This patch (or something like it) is going to be necessary before we can do anything like that.
          Hide
          Mark Miller added a comment -

          +1. I don't find the pluses overwhelming, but I find pluses. On the other hand, after reading all the comments, I don't see a single good reason against.

          Its a no brainer to me. I think I might have some time to play with it tomorrow. Will report back if I do.

          Show
          Mark Miller added a comment - +1. I don't find the pluses overwhelming, but I find pluses. On the other hand, after reading all the comments, I don't see a single good reason against. Its a no brainer to me. I think I might have some time to play with it tomorrow. Will report back if I do.
          Hide
          Hoss Man added a comment -

          checkpoint patch, new additions...

          1. updated TestConfig to also assert that the existing automatic lib dir support works (i don't see any evidence that we already had a test for this)
          2. replaced the TODO's in the example solrconfig.xml with documentation explaining hte new <lib> directives.
          3. updated the README in the example solr home so that the note about ./lib also mentions the <lib> config option
          4. updated the build.xml and example/solrconfig.xml for the clusturing contrib so that they don't copy just for the example
          5. fixed a small bug in the clustering example schema (that example reuses the main example docs, but wasn't ignoring unexpected fields which have since been added to those docs on the trunk since the clustering example was added)

          Show
          Hoss Man added a comment - checkpoint patch, new additions... 1. updated TestConfig to also assert that the existing automatic lib dir support works (i don't see any evidence that we already had a test for this) 2. replaced the TODO's in the example solrconfig.xml with documentation explaining hte new <lib> directives. 3. updated the README in the example solr home so that the note about ./lib also mentions the <lib> config option 4. updated the build.xml and example/solrconfig.xml for the clusturing contrib so that they don't copy just for the example 5. fixed a small bug in the clustering example schema (that example reuses the main example docs, but wasn't ignoring unexpected fields which have since been added to those docs on the trunk since the clustering example was added)
          Hide
          Yonik Seeley added a comment -

          <lib dir="../../dist/"

          As you say, layout is a separate issue from the ability to add multiple lib entries - but should we reference the libs in contrib (where they are actually checked in) rather than dist? "ant example" doesn't currently copy anything to dist... and avoiding another copy of those libraries would be nice too.

          Show
          Yonik Seeley added a comment - <lib dir="../../dist/" As you say, layout is a separate issue from the ability to add multiple lib entries - but should we reference the libs in contrib (where they are actually checked in) rather than dist? "ant example" doesn't currently copy anything to dist... and avoiding another copy of those libraries would be nice too.
          Hide
          Hoss Man added a comment -

          minor update to the previous patch...

          1) the patch command didn't like the empty files, so now they have a single newline in them.
          2) because of a typo, javadoc-all on the trunk is getting lucking and finding the download clustering jars only if they've already been copied to the example/lib ... since the patch stops the copying to example/lib i fixed the topy so it finds them in lib/downloads

          Show
          Hoss Man added a comment - minor update to the previous patch... 1) the patch command didn't like the empty files, so now they have a single newline in them. 2) because of a typo, javadoc-all on the trunk is getting lucking and finding the download clustering jars only if they've already been copied to the example/lib ... since the patch stops the copying to example/lib i fixed the topy so it finds them in lib/downloads
          Hide
          Hoss Man added a comment -

          should we reference the libs in contrib (where they are actually checked in) rather than dist? "ant example" doesn't currently copy anything to dist... and avoiding another copy of those libraries would be nice too.

          FYI: We already copy them to ./dist/ as part of dist-contrib, and "ant example" already depends on dist-contrib so nothing changed there.

          Ultimately it's a question of how we want to solve SOLR-1433 ... this patch assumes we keep including the full ./dist structure that we have now for all the jars in our artifacts 9and SOLR-1433 could be solved w/o changing that by exclusing the source locations of those libs), but if the solution to SOLR-1433 is to not copy libs to ./dist at all then it's trivial to change the <lib> refs in the solrconfig.xml files to point at the original locations.

          Show
          Hoss Man added a comment - should we reference the libs in contrib (where they are actually checked in) rather than dist? "ant example" doesn't currently copy anything to dist... and avoiding another copy of those libraries would be nice too. FYI: We already copy them to ./dist/ as part of dist-contrib, and "ant example" already depends on dist-contrib so nothing changed there. Ultimately it's a question of how we want to solve SOLR-1433 ... this patch assumes we keep including the full ./dist structure that we have now for all the jars in our artifacts 9and SOLR-1433 could be solved w/o changing that by exclusing the source locations of those libs), but if the solution to SOLR-1433 is to not copy libs to ./dist at all then it's trivial to change the <lib> refs in the solrconfig.xml files to point at the original locations.
          Hide
          Hoss Man added a comment -

          Quick updated to match trunk (based on the decisions made in SOLR-1433)

          I just noticed that there are some jars copied into example-DIH - i remember looking at this before and seeing that the hsqldb jar was actaully committed into that directory because it's only needed to run the exampl, not to compile, but someone i missed noticing that there are two other jars that are copied at build time ... so we could probably replace those with lib references too. (i'll try to post another update later today)

          Show
          Hoss Man added a comment - Quick updated to match trunk (based on the decisions made in SOLR-1433 ) I just noticed that there are some jars copied into example-DIH - i remember looking at this before and seeing that the hsqldb jar was actaully committed into that directory because it's only needed to run the exampl, not to compile, but someone i missed noticing that there are two other jars that are copied at build time ... so we could probably replace those with lib references too. (i'll try to post another update later today)
          Hide
          Hoss Man added a comment -

          updated to deal with example-DIH as well

          (i verified that SolrResourceLoader logs that it found all of the jars, but i didn't actually test the IMAP importing – i don't have an IMAP account to try it on)

          Also made a minor addition to the comments in example/solr/conf/solrconfig.xml to make it clear that the entire filename must match the regex (per the semantics of Macther.matches())

          Show
          Hoss Man added a comment - updated to deal with example-DIH as well (i verified that SolrResourceLoader logs that it found all of the jars, but i didn't actually test the IMAP importing – i don't have an IMAP account to try it on) Also made a minor addition to the comments in example/solr/conf/solrconfig.xml to make it clear that the entire filename must match the regex (per the semantics of Macther.matches())
          Hide
          Noble Paul added a comment -

          I'm back from vacation and I realize that we have gone a long way in this issue.

          There is an issue SOLR-919 which enables re-use of SolrConfig object. To make it work we will have to make SolrConfig agnostic of SolrCore. So, the best solution is to make SolrCore add the jars to SolrResourceLoader. So SolrConfig can just return a data structure and SolrCore can decide how to add the jars to the path

          Show
          Noble Paul added a comment - I'm back from vacation and I realize that we have gone a long way in this issue. There is an issue SOLR-919 which enables re-use of SolrConfig object. To make it work we will have to make SolrConfig agnostic of SolrCore. So, the best solution is to make SolrCore add the jars to SolrResourceLoader. So SolrConfig can just return a data structure and SolrCore can decide how to add the jars to the path
          Hide
          Hoss Man added a comment -

          Noble: if you have any specific suggested improvements for the patch that you think will make eventual work on SOLR-919 easier, then i'm all ears – but i'd rather not try to code something in anticipation of a future feature that doesn't even have a stub patch or fleshed out design yet.

          Even if we don't make any of the changes suggested in this issue, some very complicated questions are going to have to be answered in SOLR-919 – like: should two cores sharing a SolrConfig object and an instanceDir (but with different data dir propertes) share the same resourceloader/classloader, or should they have unique classloaders containing different copies of the same jars from ./lib. Those types of questions aren't things we should be attempting to answer in this issue – but they will ultimately need to be addressed in SOLR-919, and they will need to be addressed regardless of what we decide to do here.

          This patch doesn't change any of the internal public APIs, it just adds support for some new syntax in the config file, so if SolrConfig.initLibs needs to be refactored later to support SOLR-919 that will be totally fine.

          Show
          Hoss Man added a comment - Noble: if you have any specific suggested improvements for the patch that you think will make eventual work on SOLR-919 easier, then i'm all ears – but i'd rather not try to code something in anticipation of a future feature that doesn't even have a stub patch or fleshed out design yet. Even if we don't make any of the changes suggested in this issue, some very complicated questions are going to have to be answered in SOLR-919 – like: should two cores sharing a SolrConfig object and an instanceDir (but with different data dir propertes) share the same resourceloader/classloader, or should they have unique classloaders containing different copies of the same jars from ./lib. Those types of questions aren't things we should be attempting to answer in this issue – but they will ultimately need to be addressed in SOLR-919 , and they will need to be addressed regardless of what we decide to do here. This patch doesn't change any of the internal public APIs, it just adds support for some new syntax in the config file, so if SolrConfig.initLibs needs to be refactored later to support SOLR-919 that will be totally fine.
          Hide
          Noble Paul added a comment -

          the idea is to have separate resourceloaders for each core (they are lightweight) but comomn schema/config objects. SOLR-1293 includes the changes required for this.

          Show
          Noble Paul added a comment - the idea is to have separate resourceloaders for each core (they are lightweight) but comomn schema/config objects. SOLR-1293 includes the changes required for this.
          Hide
          Hoss Man added a comment -

          As i said: these aren't questions that we should be attempting to answer here - they are about the impacts of potential future features.

          This patch works against the trunk today ... SolrConfig currently maintains a reference to a SolrResourceLoader which it may have instantiated itself in it's constructor, or it may have received it as a constructor argument – either way this patch ensures that that SolrResourceLoader gets a consistent classloader based on the Configs that are parsed without needing to make any "initLibs" or "getLibs" style functions public or changing the contract of initializing SolrConfig/SolrResourceLoader. Even if someone is doing funky old school embedded Solr code where they construct their own SolrConfig objects the <lib> config options will still work.

          If SOLR-919 or SOLR-1293 require refactoring things so that a SolrConfig instance can be reused with different classloaders that's going to require eliminating some public constructors and Refactoring the contract of when/how a SolrResourceLoader is initialized – when we do that we can worry about refactoring the code in this patch.

          Show
          Hoss Man added a comment - As i said: these aren't questions that we should be attempting to answer here - they are about the impacts of potential future features. This patch works against the trunk today ... SolrConfig currently maintains a reference to a SolrResourceLoader which it may have instantiated itself in it's constructor, or it may have received it as a constructor argument – either way this patch ensures that that SolrResourceLoader gets a consistent classloader based on the Configs that are parsed without needing to make any "initLibs" or "getLibs" style functions public or changing the contract of initializing SolrConfig/SolrResourceLoader. Even if someone is doing funky old school embedded Solr code where they construct their own SolrConfig objects the <lib> config options will still work. If SOLR-919 or SOLR-1293 require refactoring things so that a SolrConfig instance can be reused with different classloaders that's going to require eliminating some public constructors and Refactoring the contract of when/how a SolrResourceLoader is initialized – when we do that we can worry about refactoring the code in this patch.
          Hide
          Shalin Shekhar Mangar added a comment -

          Hoss, Noble has been refactoring SolrConfig in SOLR-1198 with the end goal being pluggable loading of SolrConfig. This makes SOLR-919 and SOLR-1293 easier. But it is also about how we want SolrPlugins to be loaded. Ideally they'd keep configuration in SolrConfig but the actual loading will happen in SolrCore or somewhere else outside.

          A large part of work for SOLR-1198 has been done in 1.4. This patch also marked for 1.4 goes against that example by modifying the resource loader. So we need to ask now what is the function of SolrConfig? Do we want it to load/modify SolrResourceLoader objects or just be a representation of configuration?

          Show
          Shalin Shekhar Mangar added a comment - Hoss, Noble has been refactoring SolrConfig in SOLR-1198 with the end goal being pluggable loading of SolrConfig. This makes SOLR-919 and SOLR-1293 easier. But it is also about how we want SolrPlugins to be loaded. Ideally they'd keep configuration in SolrConfig but the actual loading will happen in SolrCore or somewhere else outside. A large part of work for SOLR-1198 has been done in 1.4. This patch also marked for 1.4 goes against that example by modifying the resource loader. So we need to ask now what is the function of SolrConfig? Do we want it to load/modify SolrResourceLoader objects or just be a representation of configuration?
          Hide
          Hoss Man added a comment -

          Noble has been refactoring SolrConfig in SOLR-1198 with the end goal being pluggable loading of SolrConfig

          I understand Noble's goal(s) .. i'm actually really stoked to be moving in that direction and look forward to people being able to provide/reuse their own SolrConfig implementations which may not use XML at all...

          So we need to ask now what is the function of SolrConfig? Do we want it to load/modify SolrResourceLoader objects or just be a representation of configuration?

          I disagree that we need to ask that question "now", in this issue ... that is an important question for other issues (which you and Noble have already referenced). Those issues are going to face tough decisions about how to deal with the existing contracts of Solr config like all the methods it inherits from Config (including getResourceLoader(), and openResource()) or the fact that all of it's public constructors are expected to initialize a new SolrResourceLoader.

          When the time comes to make those tough decisions, we will have to make non-back-compat changes to the contracts of SolrConfig and SolrResourceLoader, and document a 'new' contract for how to properly initialize them. THAT is the appropriate time (in my opinion) to worry about how we should refactoring all the SolrResourceLoader initialization code – but today, on the trunk, SolrConfig is still responsible for initializing SolrResourceLoader in many cases. The patch as i wrote it ensures that no matter how a SolrResourceLoader is instantiated, or who instantiates it, if a SolrConfig instance is told to use it, then it gets a class loader based on the options in that SolrConfig.

          This is the simplest possible patch I could think of to make the change desired. I deliberately avoided making any public API changes to SolrCOnfig or SolrResourceLoader because i didn't want to commit to who/how the ClassLoader would get those changes – so in the future (when we have to make/advertise big API changes to SolrConfig to eliminate direct references to SOlrResourceLoader anyway) it will be easy to refactor it (the logic in initLibs) to someplace else and document exactly how the SolrResourceLoader should be initialized cleanly.

          At this point, i personally have no interest in trying to "guess" what the right decisions will be down the road, nor am i interested in writing a more complicated patch that would commit us to that guess with modifications SolrConfig's public API. As i said in a previous comment...

          if you have any specific suggested improvements for the patch that you think will make eventual work on SOLR-919 easier, then i'm all ears

          ...but to be more explicit: attach an alternate patch, and i'll review it.

          Show
          Hoss Man added a comment - Noble has been refactoring SolrConfig in SOLR-1198 with the end goal being pluggable loading of SolrConfig I understand Noble's goal(s) .. i'm actually really stoked to be moving in that direction and look forward to people being able to provide/reuse their own SolrConfig implementations which may not use XML at all... So we need to ask now what is the function of SolrConfig? Do we want it to load/modify SolrResourceLoader objects or just be a representation of configuration? I disagree that we need to ask that question "now", in this issue ... that is an important question for other issues (which you and Noble have already referenced). Those issues are going to face tough decisions about how to deal with the existing contracts of Solr config like all the methods it inherits from Config (including getResourceLoader(), and openResource()) or the fact that all of it's public constructors are expected to initialize a new SolrResourceLoader. When the time comes to make those tough decisions, we will have to make non-back-compat changes to the contracts of SolrConfig and SolrResourceLoader, and document a 'new' contract for how to properly initialize them. THAT is the appropriate time (in my opinion) to worry about how we should refactoring all the SolrResourceLoader initialization code – but today, on the trunk, SolrConfig is still responsible for initializing SolrResourceLoader in many cases. The patch as i wrote it ensures that no matter how a SolrResourceLoader is instantiated, or who instantiates it, if a SolrConfig instance is told to use it, then it gets a class loader based on the options in that SolrConfig. This is the simplest possible patch I could think of to make the change desired. I deliberately avoided making any public API changes to SolrCOnfig or SolrResourceLoader because i didn't want to commit to who/how the ClassLoader would get those changes – so in the future (when we have to make/advertise big API changes to SolrConfig to eliminate direct references to SOlrResourceLoader anyway) it will be easy to refactor it (the logic in initLibs) to someplace else and document exactly how the SolrResourceLoader should be initialized cleanly. At this point, i personally have no interest in trying to "guess" what the right decisions will be down the road, nor am i interested in writing a more complicated patch that would commit us to that guess with modifications SolrConfig's public API. As i said in a previous comment... if you have any specific suggested improvements for the patch that you think will make eventual work on SOLR-919 easier, then i'm all ears ...but to be more explicit: attach an alternate patch, and i'll review it.
          Hide
          Mark Miller added a comment -

          I haven't had a chance to do any testing yet, but just took some time reviewing the patch.

          1. Should the change in clustering/example/conf/schema.xml be part of this patch?

          Here is a no credit update to the patch that fixes a few small things:

          2. example README formatting
          3. Fix spelling in SolrResourceLoader.addToClassLaoder method
          4. Fix liib spelling in build.xml

          A couple thoughts:

          5. Did you consider just extending URLClassLoader and opening up addUrl rather than replacing the loader? Almost seems nicer - no biggie though.
          6. What about solr.xml's sharedlib setting? Should it support the same syntax options as the solrconfig lib setting ?

          Hopefully I can do some testing too.

          Show
          Mark Miller added a comment - I haven't had a chance to do any testing yet, but just took some time reviewing the patch. 1. Should the change in clustering/example/conf/schema.xml be part of this patch? Here is a no credit update to the patch that fixes a few small things: 2. example README formatting 3. Fix spelling in SolrResourceLoader.addToClassLaoder method 4. Fix liib spelling in build.xml A couple thoughts: 5. Did you consider just extending URLClassLoader and opening up addUrl rather than replacing the loader? Almost seems nicer - no biggie though. 6. What about solr.xml's sharedlib setting? Should it support the same syntax options as the solrconfig lib setting ? Hopefully I can do some testing too.
          Hide
          Mark Miller added a comment -

          RE: 5

          hmm .. I guess you'd prob have to add some synchronization that wouldn't be worth it - forget that.

          Show
          Mark Miller added a comment - RE: 5 hmm .. I guess you'd prob have to add some synchronization that wouldn't be worth it - forget that.
          Hide
          Hoss Man added a comment - - edited

          1. Should the change in clustering/example/conf/schema.xml be part of this patch?

          ...probably not, i totally forgot i had to make that fix to get the example working. i'll try to commit that seperately tonight.

          5. Did you consider just extending URLClassLoader and opening up addUrl rather than replacing the loader? Almost seems nicer - no biggie though.

          I went down the road, but i ran into a problem – i don't remember the specifics, i was probably doing something stupid, but in trying to figure it out i noticed the caveat about using addURL...

             "The classes that are loaded are by default granted permission only 
              to access the URLs specified when the URLClassLoader was created.
          

          ...and figured it was safer just to go this way and document that you should only modify the classloader prior to using the ResourceLoader.

          6. What about solr.xml's sharedlib setting? Should it support the same syntax options as the

          solrconfig lib setting ?

          we can always add the syntax to solr.xml as well – but that gets more challanging because of the persistence options, so i punted on it for now.

          Show
          Hoss Man added a comment - - edited 1. Should the change in clustering/example/conf/schema.xml be part of this patch? ...probably not, i totally forgot i had to make that fix to get the example working. i'll try to commit that seperately tonight. 5. Did you consider just extending URLClassLoader and opening up addUrl rather than replacing the loader? Almost seems nicer - no biggie though. I went down the road, but i ran into a problem – i don't remember the specifics, i was probably doing something stupid, but in trying to figure it out i noticed the caveat about using addURL... "The classes that are loaded are by default granted permission only to access the URLs specified when the URLClassLoader was created. ...and figured it was safer just to go this way and document that you should only modify the classloader prior to using the ResourceLoader. 6. What about solr.xml's sharedlib setting? Should it support the same syntax options as the solrconfig lib setting ? we can always add the syntax to solr.xml as well – but that gets more challanging because of the persistence options, so i punted on it for now.
          Hide
          Noble Paul added a comment -

          Mark the latest patch does not seem to compile.
          Anyway I propose a simple suggestion

          add a new method to SolrConfig

          public List<String> getPaths()
          

          in the SolrResourceLoader we may just need an extra method addPath(String path)

          i guess this should be good enough

          Show
          Noble Paul added a comment - Mark the latest patch does not seem to compile. Anyway I propose a simple suggestion add a new method to SolrConfig public List< String > getPaths() in the SolrResourceLoader we may just need an extra method addPath(String path) i guess this should be good enough
          Hide
          Mark Miller added a comment -

          Mark the latest patch does not seem to compile.

          Right - my fault - since my changes are just cosmetic (and the delete liib doesn't really break anything), I didn't test.

          SolrConfig.java still has this method misspelling: getResourceLoader().addToClassLaoder(path);

          Works after that - I won't attach a new patch though - its just cosmetic fixes. Use Hoss' - he can easily integrate those 3 changes (2 misspellings and some space formatting).

          Show
          Mark Miller added a comment - Mark the latest patch does not seem to compile. Right - my fault - since my changes are just cosmetic (and the delete liib doesn't really break anything), I didn't test. SolrConfig.java still has this method misspelling: getResourceLoader().addToClassLaoder(path); Works after that - I won't attach a new patch though - its just cosmetic fixes. Use Hoss' - he can easily integrate those 3 changes (2 misspellings and some space formatting).
          Hide
          Hoss Man added a comment -

          updated to trunk (schema.xml change already committed) plus the fixes miller pointed out

          Show
          Hoss Man added a comment - updated to trunk (schema.xml change already committed) plus the fixes miller pointed out
          Hide
          Hoss Man added a comment -

          add a new method to SolrConfig ... public List<String> getPaths() .. in the SolrResourceLoader we may just need an extra method addPath(String path)

          I feel like we keep going around in circles on this, i don't know how to express myself any better...

          1) there should never be a public addPath(String) method on SOlrResourceLoader ... the ClassLoader should be initialized completely before the SolrResourceLoader is used, or really wacky inconsistent classloader behavior can result.

          2) ideally SolrResourceLoader should require that all URLs for the ClassLoader be specified in it's constructor, so we can keep the ClassLoader final – but given the existing COnstructors for SOlrCOnfig that's impossible (sometimes the SolrConfig constructs the SolrResourceLoader, other times the caller constructs the SolrResourceLoader, and then passes it as an argument to the SolrConfig constructor) so in this patch i focused on keeping the changes as minimal as possible, and not altering the public API.

          3) Because of these multiple ways the SolrResourceLoader can be initialized, the SolrCOnfig constructor is in the best position to ensure that the ClassLoader URLs are added as early as possible.

          4) We could add a "getPaths() method to SolrConfig – but for back-compatibility we would still need to set those paths on the SolrResourceLoader in the SolrConfig constructor for embedded SOlr users people who might be initializing things in a different order then we do in CoreContainer.

          5) someday, in order to achieve all of the various SolrConfig simplification/reuse goals that have been discussed, we are going to need to make non-back-compat changes to eliminate all references to SolrResourceLoader from SolrConfig – including eliminating all of the current public constructors from SolrConfig (because they construct a SolrResourceLoader) and then, since everyone will have to change how the instantiate SolrCOnfig objects anyway, it will be easy to change the semantics of initializing a SOlrConfig+SolrCore+SolrResourceLoader so that they make more sense...

             SolrCofig config = new SolrConfig(...)
             SolrResourceLoader loader = new SolrResourceLoader(cofig.getLibPaths(), ...)
             ...
             SolrCore = new SolrCore(config, loader, ...)
          

          ...and when we reach that point it will be trivial to refactor the code added by this patch so that it works that way, because at this point it is all nicely self contained in package protected methods.

          ...

          All of this reasoning is why i keep saying: it's a waste of time to worry about how this code will impact reusable SolrCOnfig options in the future since it's all non-public and can be refactored at any time.

          If someone wants to take my patch and refactor it now then be my guest ... but unless you plan on massively increasing the scope to completely sever the direct connection between SolrCOnfig and SOlrResourceLoader i can't imagine how adding SolrCOnfig.getPaths() and making someone besides SOlrConfig responsible for passing those paths to SolrResourceLoader wlll result in a patch that would be any cleaner or easier to maintain moving forward. ... but i'll be happy to get proven wrong.

          Show
          Hoss Man added a comment - add a new method to SolrConfig ... public List<String> getPaths() .. in the SolrResourceLoader we may just need an extra method addPath(String path) I feel like we keep going around in circles on this, i don't know how to express myself any better... 1) there should never be a public addPath(String) method on SOlrResourceLoader ... the ClassLoader should be initialized completely before the SolrResourceLoader is used, or really wacky inconsistent classloader behavior can result. 2) ideally SolrResourceLoader should require that all URLs for the ClassLoader be specified in it's constructor, so we can keep the ClassLoader final – but given the existing COnstructors for SOlrCOnfig that's impossible (sometimes the SolrConfig constructs the SolrResourceLoader, other times the caller constructs the SolrResourceLoader, and then passes it as an argument to the SolrConfig constructor) so in this patch i focused on keeping the changes as minimal as possible, and not altering the public API. 3) Because of these multiple ways the SolrResourceLoader can be initialized, the SolrCOnfig constructor is in the best position to ensure that the ClassLoader URLs are added as early as possible. 4) We could add a "getPaths() method to SolrConfig – but for back-compatibility we would still need to set those paths on the SolrResourceLoader in the SolrConfig constructor for embedded SOlr users people who might be initializing things in a different order then we do in CoreContainer. 5) someday, in order to achieve all of the various SolrConfig simplification/reuse goals that have been discussed, we are going to need to make non-back-compat changes to eliminate all references to SolrResourceLoader from SolrConfig – including eliminating all of the current public constructors from SolrConfig (because they construct a SolrResourceLoader) and then, since everyone will have to change how the instantiate SolrCOnfig objects anyway, it will be easy to change the semantics of initializing a SOlrConfig+SolrCore+SolrResourceLoader so that they make more sense... SolrCofig config = new SolrConfig(...) SolrResourceLoader loader = new SolrResourceLoader(cofig.getLibPaths(), ...) ... SolrCore = new SolrCore(config, loader, ...) ...and when we reach that point it will be trivial to refactor the code added by this patch so that it works that way, because at this point it is all nicely self contained in package protected methods. ... All of this reasoning is why i keep saying: it's a waste of time to worry about how this code will impact reusable SolrCOnfig options in the future since it's all non-public and can be refactored at any time. If someone wants to take my patch and refactor it now then be my guest ... but unless you plan on massively increasing the scope to completely sever the direct connection between SolrCOnfig and SOlrResourceLoader i can't imagine how adding SolrCOnfig.getPaths() and making someone besides SOlrConfig responsible for passing those paths to SolrResourceLoader wlll result in a patch that would be any cleaner or easier to maintain moving forward. ... but i'll be happy to get proven wrong.
          Hide
          Noble Paul added a comment -

          Actually I was planning to commit SOLR-919 in 1.4 itself. My current plan is to make the changes as soon as t 1.4 is released. So the refactoring is going to happen 'now' .

          Let me take a look at the patch and see how we can make this 'future proof'

          Show
          Noble Paul added a comment - Actually I was planning to commit SOLR-919 in 1.4 itself. My current plan is to make the changes as soon as t 1.4 is released. So the refactoring is going to happen 'now' . Let me take a look at the patch and see how we can make this 'future proof'
          Hide
          Noble Paul added a comment -

          a simpler patch. SolrConfig is agnostic of SolrResourceLoader

          Show
          Noble Paul added a comment - a simpler patch. SolrConfig is agnostic of SolrResourceLoader
          Hide
          Hoss Man added a comment -

          My current plan is to make the changes as soon as t 1.4 is released. So the refactoring is going to happen 'now' .

          "now" is pre 1.4 ... whatever changes we anticipate making after 1.4 are "later" that was a big part of my point: we can easily refactor all of this after 1.4 is released, we can even do it an hour after 1.4 is released, but i was trying to minimize the number of changes needed to make this work prior to 1.4.

          a simpler patch. SolrConfig is agnostic of SolrResourceLoader

          How can you even remotely attempt to make that claim? Nothing in your patch does anything to make SolrConfig agnostic of SolrResourceLoader – SolrConfig still constructs, maintains a refrence to, and provides the a public getter for the SolrResourceLoader.

          To summarize the main differnces i can see between your patch and mine:

          • you move the FileFiltering out of SolrResourceLoader - i have no opinion about that, as i said it would be easy to refactor, i don't really care where that logic is
          • you've add a protected getPaths method to SolrConfig which exposes a NamedList of FileFilter objects – this feels very hackish to me, not just because the method name is very vague, and not just because it uses NamedList in an odd way, but because it changes made to the contents of the directory at runtime would be returned by anyone attempting to use these FileFilters later, but that won't accurately express the state of the paths when the SolrResourceLoader (that SolrConfig still has a refrence to) was initialized. Since it's not certain if this is really what should hapen if/when people re-use a SolrConfig, this may have to change anyway, and illustrates why i felt like it was better not to try and get ahead of ourselves now.
          • You've moved the responsibility of modifying the classpath (and completeling the initialization of SolrResourceLoader) to SolrCore...

          ...this last change demonstrates the exact bug i explained i was trying to avoid: An embedded Solr user who constructs their own SolrConfig object will then have an incomplete SolrResourceLoader (with a Classpath that hasn't been fully intialized yet) which they might use prior to passing that SolrConfig object to hte constructor of SolrCore.

          This may be an esoteric example, and not likely to cause problems for anyone in practice – but why risk introducing a bug at all when it's EASIER to make it work for everyone by letting SolrConfig be responsible for initializing it?

          Nothing in your patch is going to save work down the road when it comes time to actually decouple SolrConfig from SolrResourceLoader ... all you've done here is make it harder to use SolrConifg today. At somepoint, to reuse SolrConfigs, we're going to need to break the existing public SolrConfig constructors, and force legacy embedded Solr users to change the way they construct their SolrCore + SolrConfig + SolrResourceLoader, so why not wait until then to worry about this bullshit so we can fix SolrResourceLoader initialization the right way by making people construct a SolrConfig before constructing a SolrResourceLoader and passing the jar paths in to the SolrResourceLoader constructor?!


          Honestly though, it's a waste of time to keep arguing about any of this now, because it's really just a moot fucking issue at this point.

          I posted this patch almost 3 weeks ago, hoping I could get at least one other person to test it within a few days, so I could commit it and we could smoke test it on the trunk for at least a week or two before starting the release process for 1.4. But at this point only Noble and Miller have acknowledged reading the patch – miller hasn't acknowledged running/testing it, and noble and I can't agree on how it should be implemented. Grant's goal (last I heard) was to start the release early next week, and even if we all magically agreed on what the best patch looked like, and one or two people stepped up and said they tested it, I still wouldn't feel comfortable commiting a change to existing functionality like this so close to the release date.

          I think we need to postpone this to 1.5.

          Show
          Hoss Man added a comment - My current plan is to make the changes as soon as t 1.4 is released. So the refactoring is going to happen 'now' . "now" is pre 1.4 ... whatever changes we anticipate making after 1.4 are "later" that was a big part of my point: we can easily refactor all of this after 1.4 is released, we can even do it an hour after 1.4 is released, but i was trying to minimize the number of changes needed to make this work prior to 1.4. a simpler patch. SolrConfig is agnostic of SolrResourceLoader How can you even remotely attempt to make that claim? Nothing in your patch does anything to make SolrConfig agnostic of SolrResourceLoader – SolrConfig still constructs, maintains a refrence to, and provides the a public getter for the SolrResourceLoader. To summarize the main differnces i can see between your patch and mine: you move the FileFiltering out of SolrResourceLoader - i have no opinion about that, as i said it would be easy to refactor, i don't really care where that logic is you've add a protected getPaths method to SolrConfig which exposes a NamedList of FileFilter objects – this feels very hackish to me, not just because the method name is very vague, and not just because it uses NamedList in an odd way, but because it changes made to the contents of the directory at runtime would be returned by anyone attempting to use these FileFilters later, but that won't accurately express the state of the paths when the SolrResourceLoader (that SolrConfig still has a refrence to) was initialized. Since it's not certain if this is really what should hapen if/when people re-use a SolrConfig, this may have to change anyway, and illustrates why i felt like it was better not to try and get ahead of ourselves now. You've moved the responsibility of modifying the classpath (and completeling the initialization of SolrResourceLoader) to SolrCore... ...this last change demonstrates the exact bug i explained i was trying to avoid: An embedded Solr user who constructs their own SolrConfig object will then have an incomplete SolrResourceLoader (with a Classpath that hasn't been fully intialized yet) which they might use prior to passing that SolrConfig object to hte constructor of SolrCore. This may be an esoteric example, and not likely to cause problems for anyone in practice – but why risk introducing a bug at all when it's EASIER to make it work for everyone by letting SolrConfig be responsible for initializing it? Nothing in your patch is going to save work down the road when it comes time to actually decouple SolrConfig from SolrResourceLoader ... all you've done here is make it harder to use SolrConifg today. At somepoint, to reuse SolrConfigs, we're going to need to break the existing public SolrConfig constructors, and force legacy embedded Solr users to change the way they construct their SolrCore + SolrConfig + SolrResourceLoader, so why not wait until then to worry about this bullshit so we can fix SolrResourceLoader initialization the right way by making people construct a SolrConfig before constructing a SolrResourceLoader and passing the jar paths in to the SolrResourceLoader constructor?! Honestly though, it's a waste of time to keep arguing about any of this now, because it's really just a moot fucking issue at this point. I posted this patch almost 3 weeks ago, hoping I could get at least one other person to test it within a few days, so I could commit it and we could smoke test it on the trunk for at least a week or two before starting the release process for 1.4. But at this point only Noble and Miller have acknowledged reading the patch – miller hasn't acknowledged running/testing it, and noble and I can't agree on how it should be implemented. Grant's goal (last I heard) was to start the release early next week, and even if we all magically agreed on what the best patch looked like, and one or two people stepped up and said they tested it, I still wouldn't feel comfortable commiting a change to existing functionality like this so close to the release date. I think we need to postpone this to 1.5.
          Hide
          Hoss Man added a comment -

          Moving to 1.5 due to lack of both testing and consensus on approach.

          Show
          Hoss Man added a comment - Moving to 1.5 due to lack of both testing and consensus on approach.
          Hide
          Yonik Seeley added a comment -

          I'm not quite giving up on this yet... Grant said he'd cut an RC on monday (that's an informal RC I believe... not the one we would VOTE on)... we'd still have another week before the VOTE to find problems. Add 3 days and you get Oct 22nd.

          I'd still like to get this in 1.4 if possible since it can reduce our download from 90M to 60M. It would also help avoid possible accusations of "bloat".

          I didn't previously thoroughly review this because Hoss & Mark were both on it... and now Noble too.
          But now I'm going to start re-reading this issue to see if we are truly at an impasse.

          Show
          Yonik Seeley added a comment - I'm not quite giving up on this yet... Grant said he'd cut an RC on monday (that's an informal RC I believe... not the one we would VOTE on)... we'd still have another week before the VOTE to find problems. Add 3 days and you get Oct 22nd. I'd still like to get this in 1.4 if possible since it can reduce our download from 90M to 60M. It would also help avoid possible accusations of "bloat". I didn't previously thoroughly review this because Hoss & Mark were both on it... and now Noble too. But now I'm going to start re-reading this issue to see if we are truly at an impasse.
          Hide
          Yonik Seeley added a comment -

          OK, at this point, I've only been investigating how SOLR-919 relates to this issue:

          • There are enough issues that it doesn't look like SOLR-919 would be a quick commit after 1.4 is released
            1. SolrConfig is currently a mix of mutable / immutable - for example CacheConfig contains cumulative stats (in addition to the resource loader of course)
            2. It's not clear to me that you want different resource loaders for each shared SolrConfig - people would definitely want to avoid loading more than
            one copy of dictionary based stemmers such as kstemmer or smart_cn. They also may want true singleton classes, which means a single class loader for their class. Also, if one shares IndexSchema objects, etc, but doesn't share the resource loader, isn't it weird that for a core, some of the objects will not be from it's ResourceLoader? Starts smelling like "classloader hell" to me...
            3. There is a lot of middle ground between using the exact same SolrConfig objects and reparsing XML. For example, clone type functionallity could be used to either create a new config from an existing one, or wrap a common core. Immutable stuff could be shallowly copied, while mutable stuff could have their own per-core instance.
          • It doesn't seem like adding <lib> entries to solrconfig.xml will cause future problems for SOLR-919 above and beyond what already exist
          • It seems logical that configuration should be able to affect where/how libraries are loaded.

          Noble, if you agree with the last two points, do you have further objections to the last patch that Hoss put up?
          Does anyone else have objections to committing this if it's properly reviewed by more people?

          If no objections, I'll move on to actually reviewing and testing this patch with the goal of committing it Monday morning. In fact I'll start soon regardless because of the limited time and the async nature of our communication.

          Show
          Yonik Seeley added a comment - OK, at this point, I've only been investigating how SOLR-919 relates to this issue: There are enough issues that it doesn't look like SOLR-919 would be a quick commit after 1.4 is released 1. SolrConfig is currently a mix of mutable / immutable - for example CacheConfig contains cumulative stats (in addition to the resource loader of course) 2. It's not clear to me that you want different resource loaders for each shared SolrConfig - people would definitely want to avoid loading more than one copy of dictionary based stemmers such as kstemmer or smart_cn. They also may want true singleton classes, which means a single class loader for their class. Also, if one shares IndexSchema objects, etc, but doesn't share the resource loader, isn't it weird that for a core, some of the objects will not be from it's ResourceLoader? Starts smelling like "classloader hell" to me... 3. There is a lot of middle ground between using the exact same SolrConfig objects and reparsing XML. For example, clone type functionallity could be used to either create a new config from an existing one, or wrap a common core. Immutable stuff could be shallowly copied, while mutable stuff could have their own per-core instance. It doesn't seem like adding <lib> entries to solrconfig.xml will cause future problems for SOLR-919 above and beyond what already exist It seems logical that configuration should be able to affect where/how libraries are loaded. Noble, if you agree with the last two points, do you have further objections to the last patch that Hoss put up? Does anyone else have objections to committing this if it's properly reviewed by more people? If no objections, I'll move on to actually reviewing and testing this patch with the goal of committing it Monday morning. In fact I'll start soon regardless because of the limited time and the async nature of our communication.
          Hide
          Yonik Seeley added a comment -

          Updated patch to trunk.

          Show
          Yonik Seeley added a comment - Updated patch to trunk.
          Hide
          Yonik Seeley added a comment -

          I've been testing this, including multi-core setups that continuously reolad the cores to check for memory/classloader leaks. Everything looks good.
          Unless there are objections, I'll commit this monday morning, in time for the first release candidate.

          Show
          Yonik Seeley added a comment - I've been testing this, including multi-core setups that continuously reolad the cores to check for memory/classloader leaks. Everything looks good. Unless there are objections, I'll commit this monday morning, in time for the first release candidate.
          Hide
          Noble Paul added a comment -

          It doesn't seem like adding <lib> entries to solrconfig.xml will cause future problems for SOLR-919 above and beyond what already exist

          I agree with the configuration . Though the regex seems to be an overkill .I am yet to see why it would be useful. Moreover , it leaves ambiguity as to what are the jars which got loaded .

          It seems logical that configuration should be able to affect where/how libraries are loaded.

          yes configuration should have a say of the libraries. But the configuration should be a data structure. I see it as a source of information to the system. This will enable users plugin there own SolrConfigs implementations (loaded from external data sources zookeeper etc ).

          It's not clear to me that you want different resource loaders for each shared SolrConfig - people would definitely want to avoid loading more than one copy of dictionary based stemmers such as kstemmer or smart_cn.

          SolrResourceLoader is a lightweight component. There is no need to couple it with the fact whether SolrConfig is shared or not.

          My patch is slightly awkward, But that was introduced because of the regex option. Otherwise it would have been simpler. But ,overall , my patch introduced fewer changes .

          SolrConfig is currently a mix of mutable / immutable - for example CacheConfig contains cumulative stats (in addition to the resource loader of course)

          SolrConfig holds a reference to the SolrResourceLoader, but it never uses it anywhere internally. The CacheConfig is an anomaly. But that can be rectified .

          Show
          Noble Paul added a comment - It doesn't seem like adding <lib> entries to solrconfig.xml will cause future problems for SOLR-919 above and beyond what already exist I agree with the configuration . Though the regex seems to be an overkill .I am yet to see why it would be useful. Moreover , it leaves ambiguity as to what are the jars which got loaded . It seems logical that configuration should be able to affect where/how libraries are loaded. yes configuration should have a say of the libraries. But the configuration should be a data structure. I see it as a source of information to the system. This will enable users plugin there own SolrConfigs implementations (loaded from external data sources zookeeper etc ). It's not clear to me that you want different resource loaders for each shared SolrConfig - people would definitely want to avoid loading more than one copy of dictionary based stemmers such as kstemmer or smart_cn. SolrResourceLoader is a lightweight component. There is no need to couple it with the fact whether SolrConfig is shared or not. My patch is slightly awkward, But that was introduced because of the regex option. Otherwise it would have been simpler. But ,overall , my patch introduced fewer changes . SolrConfig is currently a mix of mutable / immutable - for example CacheConfig contains cumulative stats (in addition to the resource loader of course) SolrConfig holds a reference to the SolrResourceLoader, but it never uses it anywhere internally. The CacheConfig is an anomaly. But that can be rectified .
          Hide
          Noble Paul added a comment -

          Anyway , I do not wish to delay the 1.4 release by wringing my hand on the specifics of the implementation

          Show
          Noble Paul added a comment - Anyway , I do not wish to delay the 1.4 release by wringing my hand on the specifics of the implementation
          Hide
          Yonik Seeley added a comment -

          Though the regex seems to be an overkill

          Sure... but regex is built into Java and globbing is not. Comments in the code say we should implement globbing in the future.
          And since the regex is specified via the "regex" attribute (as opposed to the value of an XML element) it's perfectly extensible in the future.

          the configuration should be a data structure. I see it as a source of information to the system.

          Configuration can also have behavior. The important part is that is the separation from XML and allowing others to create configuration objects.

          SolrResourceLoader is a lightweight component. There is no need to couple it with the fact whether SolrConfig is shared or not.

          Just bringing up that it's not lightweight if has a separate classloader that loads heavyweight things, and separate classloaders don't sound like they will work well with cached Schemas... seems like they at least need a common classloader parent for anything loaded by the schema.

          But ,overall , my patch introduced fewer changes .

          That's not true though... Hoss pointed out the risk for embedded users. And it seems like you were bit by this yourself - see CoreContainer.create()... the Schema is created before the SolrCore is, and thus would not have been using the right classloader.

          SolrConfig holds a reference to the SolrResourceLoader, but it never uses it anywhere internally.

          See above - the schema uses the loader from SolrConfig.

          Show
          Yonik Seeley added a comment - Though the regex seems to be an overkill Sure... but regex is built into Java and globbing is not. Comments in the code say we should implement globbing in the future. And since the regex is specified via the "regex" attribute (as opposed to the value of an XML element) it's perfectly extensible in the future. the configuration should be a data structure. I see it as a source of information to the system. Configuration can also have behavior. The important part is that is the separation from XML and allowing others to create configuration objects. SolrResourceLoader is a lightweight component. There is no need to couple it with the fact whether SolrConfig is shared or not. Just bringing up that it's not lightweight if has a separate classloader that loads heavyweight things, and separate classloaders don't sound like they will work well with cached Schemas... seems like they at least need a common classloader parent for anything loaded by the schema. But ,overall , my patch introduced fewer changes . That's not true though... Hoss pointed out the risk for embedded users. And it seems like you were bit by this yourself - see CoreContainer.create()... the Schema is created before the SolrCore is, and thus would not have been using the right classloader. SolrConfig holds a reference to the SolrResourceLoader, but it never uses it anywhere internally. See above - the schema uses the loader from SolrConfig.
          Hide
          Yonik Seeley added a comment -

          I've committed this since it does not change any public APIs, but only adds the <lib> elements to solrconfig.xml, and does not preclude changing to the approach that Noble took in the future.

          Show
          Yonik Seeley added a comment - I've committed this since it does not change any public APIs, but only adds the <lib> elements to solrconfig.xml, and does not preclude changing to the approach that Noble took in the future.
          Hide
          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Show
          Grant Ingersoll added a comment - Bulk close for Solr 1.4

            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development