Details

    • Type: Wish
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.2, 6.0
    • Component/s: None
    • Labels: None

      Description

      Solr lets users select the postings format on a per-field basis, but only Lucene40PostingsFormat is available by default (unless users add lucene-codecs to the Solr lib directory). Maybe we should add lucene-codecs to the Solr libs (I mean in the WAR file) so that people can try our non-default postings formats with minimum effort?

      1. SOLR-3843.patch
        5 kB
        Steve Rowe
      2. SOLR-3843.patch
        5 kB
        Steve Rowe
      3. SOLR-3843.patch
        4 kB
        Robert Muir

          Activity

          Uwe Schindler added a comment -

          -1, they should simply put them into $solr_home/lib where all other plugins are. We don't want to bloat the WAR file. Solr has support for Lucene's SPI loaded from SolrResourceLoader.

          Uwe Schindler added a comment -

          Just to add: if somebody wants to try out codecs, they will surely be able to add the JAR file to their solr_home. Maybe we should just add this to a wiki page.

          Robert Muir added a comment -

          Also I had to turn off per-field codec support by default anyway because Solr keeps the IndexWriter open across core reloads (SOLR-3610).

          Someone must turn it on explicitly by setting their codec factory to SchemaCodecFactory in solrconfig.xml (realizing there are tradeoffs).
          Same thing goes with Similarity.

          Analyzer was fixed by changing Solr to always pass the newest Analyzer as a param to add/updateDocument (so it's not really set in the IWConfig),
          but the general problem still exists.
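
          For illustration, a minimal sketch of the explicit opt-in described above. The field type name is hypothetical, and the postings format name is an assumption: whichever format is named must be loadable (for non-default formats that means the lucene-codecs jar is on the classpath), and the available names vary by Lucene version.

          ```xml
          <!-- solrconfig.xml: opt in to per-field codec settings read from the schema -->
          <codecFactory class="solr.SchemaCodecFactory"/>

          <!-- schema.xml: hypothetical field type that selects a non-default postings format.
               "SimpleText" is assumed here as an example of a format shipped in lucene-codecs. -->
          <fieldType name="string_simpletext" class="solr.StrField" postingsFormat="SimpleText"/>
          ```

          Without the codecFactory line, the postingsFormat attribute on the field type would not be honored, which is the tradeoff mentioned above.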

          Adrien Grand added a comment -

          Thanks Uwe and Robert for these clarifications. I added some documentation on the Solr wiki:

          http://wiki.apache.org/solr/SchemaXml#Data_Types
          http://wiki.apache.org/solr/SolrConfigXml#codecFactory

          Mark Miller added a comment -

          Also I had to turn off per-field codec support by default anyway because Solr keeps the IndexWriter open across core reloads (SOLR-3610).

          We should probably consider that again. Some of my initial work around this area when this first came up was not really up to dealing with it well. Opening a new IndexWriter was kind of a hacky operation for replication. Things have changed though, and opening a new IndexWriter should be first class now. I think it's probably fine to reopen it on core reloads.

          Robert Muir added a comment -

          +1. My approach so far was to disable this (currently expert) stuff because of the problems you get if you add new fields to the schema and reload. But it seems bad to not allow anything passed to IndexWriter to interact with IndexSchema: if we can do a better job we can make things easier.

          Yonik Seeley added a comment -

          Reopening.

          Core codecs and Solr should just work w/o requiring users to copy any jar files around.

          Erik Hatcher added a comment -

          Core codecs and Solr should just work w/o requiring users to copy any jar files around.

          But if we can just put a <lib> in solrconfig that points to it in the example configuration moving forward, does that address this?
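
          A sketch of what such a directive could look like in the example solrconfig.xml. The directory is a hypothetical path relative to the core's instance directory; it assumes the codecs jar is shipped somewhere in the distribution outside the WAR.

          ```xml
          <!-- solrconfig.xml: load the codecs jar from a directory outside the WAR.
               The dir value is a hypothetical location; adjust it to wherever the jar lives. -->
          <lib dir="../../dist/" regex="lucene-codecs-.*\.jar"/>
          ```

          This keeps the WAR unchanged but still requires the jar to be present in the distribution, so it only addresses the "copying jars around" concern for users who start from the example configuration.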

          Mark Miller added a comment -

          I had forgotten about this issue - I think we can fix the problem with iw and core reload easily now - I think we should start including these codecs and add that fix.

          Yonik Seeley added a comment -

          As far as size, the lucene codecs jar is only 278K. It seems pretty "core" really, and should be included by default.

          Robert Muir added a comment -

          I had forgotten about this issue - I think we can fix the problem with iw and core reload easily now - I think we should start including these codecs and add that fix.

          Maybe we can open a separate issue for this? Ideally it would also fix the same trap for similarity too: a really good thing if we can solve it.

          As far as codecs.jar goes, I also want to point out that it's very strange that solr-test-framework.jar is shipped in binary releases (and it depends on this codecs jar), but the codecs jar isn't anywhere in the binary package. So that means solr-test-framework.jar is really unusable in the current packaging.

          Mark Miller added a comment -

          Maybe we can open a separate issue for this?

          Yeah, I'll open one.

          Mark Miller added a comment -

          SOLR-4417

          Mark Miller added a comment -

          I've committed an initial attempt at SOLR-4417

          Robert Muir added a comment -

          Here's the start of a patch (I haven't tested the build with it or looked at Maven and so on).

          This adds the codecs jar and enables SchemaCodecFactory by default, so the format for postings lists and docvalues can be customized easily in the fieldtype.

          I didn't want to turn this factory on by default because of SOLR-4417, but Mark fixed that.
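
          As a rough sketch of the docvalues side of that customization: the field type name below is hypothetical, and the format name "Memory" is an assumption that depends on which docvalues formats the bundled Lucene version actually provides.

          ```xml
          <!-- schema.xml: hypothetical field type choosing a docvalues format per field type.
               Requires SchemaCodecFactory to be the active codec factory; the "Memory"
               format name is an assumption and may differ between versions. -->
          <fieldType name="string_mem_dv" class="solr.StrField" docValues="true" docValuesFormat="Memory"/>
          ```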

          Robert Muir added a comment -

          Smoketesting passes with this patch. But I am not sure if anything should/needs to be changed in Maven.

          Steve Rowe added a comment -

          Smoketesting passes with this patch. But I am not sure if anything should/needs to be changed in Maven.

          The attached patch is Robert's with the addition of a dependency from the Solr webapp module on the lucene-codecs jar. With this change, when the war is built by Maven, the lucene-codecs jar is put in the same place as when the war is built by the Ant build: under WEB-INF/lib/.

          Robert Muir added a comment -

          Thanks Steve: I was actually (and still am, I think) uncertain about who should have the dependency.

          If you think about it, it's no different from the analysis module cases, but I don't see the webapp depending on them here.

          At the moment, I understand the reasoning behind the hard dependency on analysis-common.jar (because, bogusly, the factory stuff is there; IMO it should not be).

          But somewhere in Maven, something in Solr depends on the other analysis modules it bundles (e.g. analyzers-phonetic), yet you could remove this jar and Solr would work fine (as long as you didn't use these particular phonetic analyzers).

          So I feel like these analysis components (except common, see above), along with codecs.jar, should be depended on in the same place. I guess theoretically they are optional dependencies, but I don't think we should declare them that way: unless we test every combination with/without optional X, Y, Z, it's a bad idea. But they are the same in this sense.

          Steve Rowe added a comment -

          In the Maven build, it's the solr core module that depends on these analysis modules. Here's the output from mvn dependency:tree in maven-build/solr/webapp/:

          [INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ solr ---
          [INFO] org.apache.solr:solr:war:5.0-SNAPSHOT
          [INFO] +- org.apache.solr:solr-core:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-core:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-analyzers-common:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-analyzers-kuromoji:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-analyzers-morfologik:jar:5.0-SNAPSHOT:compile
          [INFO] |  |  \- org.carrot2:morfologik-polish:jar:1.5.5:compile
          [INFO] |  |     \- org.carrot2:morfologik-stemming:jar:1.5.5:compile
          [INFO] |  |        \- org.carrot2:morfologik-fsa:jar:1.5.5:compile
          [INFO] |  +- org.apache.lucene:lucene-analyzers-phonetic:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-highlighter:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-memory:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-misc:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-queryparser:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-spatial:jar:5.0-SNAPSHOT:compile
          [INFO] |  |  \- com.spatial4j:spatial4j:jar:0.3:compile
          [INFO] |  +- org.apache.lucene:lucene-suggest:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-grouping:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- org.apache.lucene:lucene-queries:jar:5.0-SNAPSHOT:compile
          [INFO] |  +- commons-codec:commons-codec:jar:1.7:compile
          [INFO] |  +- commons-cli:commons-cli:jar:1.2:compile
          [INFO] |  +- commons-fileupload:commons-fileupload:jar:1.2.1:compile
          [INFO] |  +- commons-io:commons-io:jar:2.1:compile
          [INFO] |  +- commons-lang:commons-lang:jar:2.6:compile
          [INFO] |  +- com.google.guava:guava:jar:13.0.1:compile
          [INFO] |  +- org.codehaus.woodstox:wstx-asl:jar:3.2.7:runtime
          [INFO] |  +- org.apache.httpcomponents:httpclient:jar:4.2.3:compile
          [INFO] |  |  \- org.apache.httpcomponents:httpcore:jar:4.2.2:compile
          [INFO] |  \- org.apache.httpcomponents:httpmime:jar:4.2.3:compile
          [INFO] +- org.apache.solr:solr-solrj:jar:5.0-SNAPSHOT:compile
          [INFO] |  \- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
          [INFO] +- org.apache.lucene:lucene-codecs:jar:5.0-SNAPSHOT:compile
          [INFO] +- org.eclipse.jetty.orbit:javax.servlet:jar:3.0.0.v201112011016:provided
          [INFO] +- org.slf4j:slf4j-jdk14:jar:1.6.4:runtime (scope not updated to compile)
          [INFO] +- org.slf4j:jcl-over-slf4j:jar:1.6.4:compile
          [INFO] +- org.slf4j:slf4j-api:jar:1.6.4:compile
          [INFO] \- junit:junit:jar:4.10:test
          

          This parallels the Ant build: these analyzer jars are included in the "solr.lucene.libs" path, which is included in "solr.base.classpath".

          I put the lucene-codecs dependency on the solr webapp module rather than the solr core module because all non-test compilation succeeds without lucene-codecs. (The lucene-test-framework pulls lucene-codecs into all Solr test classpaths.) And this issue is about packaging of the war: adding the dependency to the webapp module fixes exactly the problem.

          Robert Muir added a comment -

          I put the lucene-codecs dependency on the solr webapp module rather than the solr core module because all non-test compilation succeeds without lucene-codecs. (The lucene-test-framework pulls lucene-codecs into all Solr test classpaths.) And this issue is about packaging of the war: adding the dependency to the webapp module fixes exactly the problem.

          But it would also succeed without analyzers-phonetic. How are they any different?

          Steve Rowe added a comment -

          But it would also succeed without analyzers-phonetic. How are they any different?

          They're not.

          I think the Ant build should change here: the solr compilation classpath shouldn't have things on it that aren't required for compilation. (This goes for the analysis module dependencies in the Maven build too, of course.)

          Is there a place where (optional) runtime dependencies are added to the stuff that goes into the war? I haven't looked at this in a while.

          Robert Muir added a comment -

          I don't think the Ant build makes any distinction here.

          But yeah, there is probably a bigger issue / better way to go about it, something like:

          • solr core etc should only have the minimal dependencies
          • tests using the solr example should somehow be in webapp/test or something.
          • webapp depends on these modules like phonetic and codecs.
          • the fact that lucene-test-framework brings in codecs anyway is an impl detail

          I guess for now I was just looking at us doing things consistently. Even if we are consistently wrong

          Steve Rowe added a comment - edited

          I guess for now I was just looking at us doing things consistently. Even if we are consistently wrong

          Right, makes sense - in this case the consistent thing to do is to make the solr-core module, rather than the webapp module, depend on the lucene-codecs jar in the Maven build. The attached patch does this.
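
          A sketch of the kind of declaration this implies in the solr-core module's POM. The group and artifact IDs match the dependency tree shown earlier; the version expression is an assumption, since the real build may manage versions elsewhere.

          ```xml
          <!-- solr-core pom: sketch of the added dependency on lucene-codecs.
               The version expression is an assumption; the actual build may inherit it
               from a parent dependencyManagement section instead. -->
          <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-codecs</artifactId>
            <version>${project.version}</version>
          </dependency>
          ```

          Because the webapp module already depends on solr-core, the jar would still reach WEB-INF/lib/ transitively when the WAR is built.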

          Robert Muir added a comment -

          +1

          I do think we should make a separate issue/discussion to refactor the tests/dependencies (in both ant and maven), but I think we should move forward with this for 4.2

          Michael McCandless added a comment -

          +1 to add lucene-codecs to Solr: Lucene has a number of useful codec components, growing over time ... I think we should make it as easy as possible for users to access these from Solr.

          Uwe Schindler added a comment -

          -1 to adding it to the WAR. It's so easy to add analyzer JAR files to the solr/lib folder; the same applies to codecs.

          If this DV codec is so important for faceting and sorting and nuking FieldCache, move it to lucene-core.jar.

          Robert Muir added a comment -

          So Solr shouldn't bundle any analyzers either?

          I'm not trying to say that codecs need to be in core. Man, this is experimental stuff and I definitely don't want to increase our backwards compatibility requirements.

          I just want to make it easier for users to experiment.

          Uwe Schindler added a comment -

          If users want to experiment they just need to copy-paste a file, where is the problem?

          In addition: the analysis-extras module is also not needed (except the special ICUField, which may move into a solr-icu module), as all analysis factories are already inside the analyzers jar. In my opinion, the Solr WAR file should only bundle analyzers-common.jar and nothing more. The analysis-extras build.xml file is the worst I have seen: it just copies some JAR files from Lucene to Solr.

          To make it easier for people, we can add a command that uses get/ivy to download the JAR file from Maven and install it in Solr's lib folder. Optional stuff should not be in the WAR file.

          Uwe Schindler added a comment -

          I could agree to add this for now, but once you have committed this, open a new issue to clean up solr.war and remove all optional stuff (like analyzers-phonetic.jar). Instead, add an internet downloader using ivy/maven to set up the solr/lib folder.

          Mark Miller added a comment -

          If users want to experiment they just need to copy-paste a file, where is the problem?

          That it's much easier to not have to copy-paste a file?

          At the sizes of the files involved, you're just being a masochist

          Uwe Schindler added a comment -

          OK, then we can also remove the modules in Lucene completely! Let's just create an 8 MB lucene.jar file.

          We have modules to make this possible and to let users start with a small installation without useless stuff they will never need... This is just my opinion, but to me it looks like we could get rid of all modules, have one big build.xml, one big classpath and finally have only one big JAR file for Lucene and Solr - but that's real masochism, especially for projects like ES!

          Robert Muir added a comment -

          OK, then we can also remove the modules in Lucene completely! Let's just create an 8 MB lucene.jar file.

          This would be a 20MB jar. If you included their dependencies so it actually functioned correctly, 43MB.

          Mark Miller added a comment -

          Modules in Lucene have little to do with Solr. We shouldn't make users work to save 300k in the webapp. This is super silly stuff...

          Uwe Schindler added a comment -

          Have you looked at ElasticSearch? It's very tiny (20 MB altogether), with no useless analyzers for every language on earth. If you need kuromoji, enter:

          bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji
          

          This downloads the plugin and installs it into the ES lib folder. This is how it should work, instead of one horrible huge war file.

          It does bundle lucene-codecs.jar, but that is for another reason (I think it uses bloom, as far as I remember).

          Mark Miller added a comment -

          You're talking about something completely different - I'm talking about adding 300k to a webapp - sounds like you want to file a different JIRA issue that has little to do with that.

          In the modern day, I have no problem with the Solr dist - I'd much rather get everything simply as we do than have to stitch crap together. I have disk space and bandwidth, as does the majority of the modern world now. If you are offering to write a package manager for Solr for unix/windows/mac, please go ahead. But until then, it makes no sense to not include the codecs the same way we do with analyzers and spellchecker and highlighter, and whatever else we need.

          If I had to run 10 commands to get Solr, get spellchecking, get analyzers, get highlighting, get QueryParsers, get MoreLikeThis, etc., I would shoot myself.

          Uwe Schindler added a comment -

          Hi Mark,
          I cannot do anything against this, but I can still say that I don't agree with you. That's all. Please respect my opinion.
          Uwe

          Markus Jelsma added a comment -

          As users, we already repack the war with the jars we need, including the codecs jar. But because the codecs jar can provide better performance on SolrCloud (bloom filter), I think 300k justifies adding it to a vanilla build.

          Mark Miller added a comment -

          Uwe,
          I do respect your opinion in the large sense of the phrase, but I don't agree with you about not adding the 300k jar. That is all.

          Robert Muir added a comment -

          Have you looked at ElasticSearch? It's very tiny (20 MB altogether), with no useless analyzers for every language on earth. If you need kuromoji, enter:

          bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji

          This downloads the plugin and installs it into the ES lib folder. This is how it should work, instead of one horrible huge war file.

          But I'm not sure this is a good thing. I did some quick Google searches and found:

          http://www.sentric.ch/blog/why-we-chose-solr-4-0-instead-of-elasticsearch
          "Better language support out of the box"

          http://blog.sematext.com/2012/09/04/solr-vs-elasticsearch-part-2-data-handling/
          "Apache Solr 4.0 beta has the advantage over ElasticSearch because it can handle more languages out of the box"

          I think both search servers are good for the Lucene ecosystem, and it's not my intent to stir up some battle about which is better. I'm guessing that you can access all of the Lucene analyzers from either one, but the impression from packaging is that Solr is better.

          Let's not make this same mistake with codecs!

          Most users probably couldn't care less about SPI etc. (this is all implementation detail). They do care about being able to search different languages and index their content with the appropriate data structures.

          I'm happy to open an issue to refactor our build and tests to internally reflect the fact that, when using solr-core as a library for example, you don't technically need certain jars.

          But can we separate this from packaging, at least for now? It would be depressing to me to see articles like these that say Solr has bad support for flexible indexing.

          Robert Muir added a comment -

          I opened SOLR-4520 to clean up the dependencies.

          Commit Tag Bot added a comment -

          [trunk commit] Robert Muir
          http://svn.apache.org/viewvc?view=revision&revision=1451542

          SOLR-3843: add lucene-codecs.jar

          Commit Tag Bot added a comment -

          [branch_4x commit] Robert Muir
          http://svn.apache.org/viewvc?view=revision&revision=1451543

          SOLR-3843: add lucene-codecs.jar

          Uwe Schindler added a comment -

          Closed after release.


            People

            • Assignee:
              Robert Muir
              Reporter:
              Adrien Grand
            • Votes:
              0
              Watchers:
              8

              Dates

              • Created:
                Updated:
                Resolved:

                Development