Solr / SOLR-3623

inconsistent treatment of lucene jars & third-party deps in analysis-extras & uima (in war and in lucene-libs)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-BETA, 6.0
    • Component/s: Build
    • Labels:
      None

      Description

      Various dependencies for contrib/analysis-extras are packaged in contrib/analysis-extras/lucene-libs (along with instructions in contrib/analysis-extras/README.txt telling users to include them explicitly), even though these jars are already hardcoded into the Solr war file.

      1. SOLR-3623.patch
        7 kB
        Hoss Man
      2. SOLR-3623.patch
        6 kB
        Hoss Man


          Activity

          Hoss Man added a comment -

          I can't reproduce any problem with the packaging of analysis-extras.

          What I did...

          1) downloaded & uncompressed the binary package of Solr 4.0.0-ALPHA

          2) noted the following info in contrib/analysis-extras/README.txt ...

          Relies upon the following lucene components (in lucene-libs/):
          
           * lucene-analyzers-icu-X.Y.jar
           * lucene-analyzers-smartcn-X.Y.jar
           * lucene-analyzers-stempel-X.Y.jar
           
          And the ICU library (in lib/):
          
           * icu4j-X.Y.jar
          

          3) noted the existence of the following jars...

          hossman@frisbee:~/tmp/solr-4.0.0-ALPHA-binary/apache-solr-4.0.0-ALPHA$ ls contrib/analysis-extras/lib/*.jar
          contrib/analysis-extras/lib/icu4j-4.8.1.1.jar
          hossman@frisbee:~/tmp/solr-4.0.0-ALPHA-binary/apache-solr-4.0.0-ALPHA$ ls contrib/analysis-extras/lucene-libs/*.jar
          contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.0.0-ALPHA.jar
          contrib/analysis-extras/lucene-libs/lucene-analyzers-morfologik-4.0.0-ALPHA.jar
          contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-4.0.0-ALPHA.jar
          contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0.0-ALPHA.jar
          contrib/analysis-extras/lucene-libs/morfologik-fsa-1.5.2.jar
          contrib/analysis-extras/lucene-libs/morfologik-polish-1.5.2.jar
          contrib/analysis-extras/lucene-libs/morfologik-stemming-1.5.2.jar
          

          4) made the following additions to the example solrconfig.xml...

            <lib dir="../../contrib/analysis-extras/lib" regex=".*\.jar" />
            <lib dir="../../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
            <lib dir="../../dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />
          

          5) made the following additions to the example schema.xml...

             <field name="icufield" type="icu" indexed="true" stored="true"/>
          ...
              <fieldType name="icu" class="solr.TextField">
                <analyzer>
                  <tokenizer class="solr.ICUTokenizerFactory"/>
                </analyzer>
              </fieldType>
          

          6) ran the example with "java -jar start.jar"

          7) observed no problems.
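          For readers unfamiliar with the <lib> directives in step 4: each names a directory plus a regex, and the matching jars are added to the core's classpath. A minimal Java sketch of just that selection step, assuming (not confirmed in this thread) that the regex must match the whole file name rather than the path; LibDirectiveSketch is a hypothetical name, not a Solr class.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Solr's implementation) of how a solrconfig.xml
 * directive like <lib dir="..." regex="..."/> picks jars from a directory.
 * Assumption: the regex must match the whole file name, not the path.
 */
public final class LibDirectiveSketch {

    private LibDirectiveSketch() {}

    /** Returns, sorted, the file names that the regex fully matches. */
    public static List<String> select(List<String> fileNames, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (pattern.matcher(name).matches()) { // matches(), not find()
                selected.add(name);
            }
        }
        selected.sort(null); // natural (alphabetical) order
        return selected;
    }

    /** Applies the same selection to the regular files in a real directory. */
    public static List<String> select(File dir, String regex) {
        List<String> files = new ArrayList<>();
        String[] names = dir.list();
        if (names != null) { // null when the directory does not exist
            for (String name : names) {
                if (new File(dir, name).isFile()) {
                    files.add(name);
                }
            }
        }
        return select(files, regex);
    }
}
```

          With the lucene-libs listing from step 3, the regex ".*\.jar" selects all seven jars, while a narrower regex such as "lucene-analyzers-icu.*\.jar" selects only lucene-analyzers-icu-4.0.0-ALPHA.jar.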

          Lance Norskog added a comment -

          4.0.0-ALPHA is old news. The build has been rearranged. Please retest against the trunk or 4.x branch. There is no contrib/xx/lucene-libs in the current trunk or 4.x.

          These two lines are the only <lib> directives in solrconfig.xml. According to this description of the classloader sequence, the /browse VelocityWriter handler should throw exceptions. It works.

            <lib dir="../../../dist/" regex="apache-solr-velocity-\d.*\.jar" />
            <lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
          

          (Notice 3 dot-dot-slashes instead of two because the example changed to use collection1.)
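          The path arithmetic behind the extra dot-dot-slash can be checked with java.nio. This sketch assumes that a relative <lib dir> is resolved against the core's instance directory (example/solr in the ALPHA layout, example/solr/collection1 after the change); "apache-solr" stands in for the unpacked package root, and the class name is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: resolves a relative <lib dir="..."/> value against a
 * core's instance directory (an assumption about what it is relative to).
 */
public final class LibDirResolution {

    private LibDirResolution() {}

    /** Joins instanceDir and libDir, then removes the "." and ".." segments. */
    public static Path resolve(String instanceDir, String libDir) {
        return Paths.get(instanceDir).resolve(libDir).normalize();
    }

    public static void main(String[] args) {
        // ALPHA example: instance dir is example/solr, two levels up is the package root
        System.out.println(resolve("apache-solr/example/solr", "../../dist"));
        // After the collection1 change: one directory deeper, so three levels up
        System.out.println(resolve("apache-solr/example/solr/collection1", "../../../dist"));
        // Keeping the old prefix with the new layout lands one level too low
        System.out.println(resolve("apache-solr/example/solr/collection1", "../../dist"));
    }
}
```

          On a Unix-like system the three lines print apache-solr/dist, apache-solr/dist and apache-solr/example/dist, which is why copying an older two-level <lib> line into collection1's solrconfig.xml silently points at a directory that does not exist.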

          Lance Norskog added a comment -

          The contrib/analysis-extras libraries come in three kinds of jars: Solr factories, Lucene analyzer classes, and the external libraries that some of those analyzers depend on. In trunk & 4.x the lucene-analyzers-*.jar files are copied into the war file. The apache-solr-analysis-extras jar and the dependent jars are not.

          The exception is morfologik's dependent jars. lucene/analysis/module-build.xml defines a fileset property, analyzers-morfologik.fileset, and solr/contrib/analysis-extras/build.xml uses this property to copy the morfologik libraries into the war. In fact, a morfologik field type works if you include ../../../dist/apache-solr-analysis-extras-4.0-SNAPSHOT.jar without the analysis-extras lib/ directory.

          The other analyzers packaged into analysis-extras are icu, phonetic, smartcn and stempel. Of these, icu and phonetic have dependent libraries: phonetic requires commons-codec, which is already in solr/lib, while icu requires icu4j, which Solr does not already include. I have tried these two lines in both orders, and neither worked.

            <lib dir="../../../contrib/analysis-extras/lib" regex=".*\.jar" />
            <lib dir="../../../dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />
          

          Classloading is broken in the trunk and 4.x.
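          One way to pin down which jar is actually missing is to try loading the factory class against only the directories named in the <lib> lines. A hypothetical diagnostic sketch in Java, not how Solr builds its core class loader: it puts every *.jar from the given directories into a URLClassLoader and calls Class.forName on a class name you supply (for example, whatever class solr.ICUTokenizerFactory expands to).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: can a class be loaded from the jars in some lib dirs? */
public final class LibClasspathCheck {

    private LibClasspathCheck() {}

    /** URLs of every *.jar directly inside the given directories (missing dirs are skipped). */
    static URL[] jarUrls(List<File> dirs) {
        List<URL> urls = new ArrayList<>();
        for (File dir : dirs) {
            File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
            if (jars == null) {
                continue; // directory does not exist or is not readable
            }
            for (File jar : jars) {
                try {
                    urls.add(jar.toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return urls.toArray(new URL[0]);
    }

    /**
     * True if className loads from those jars or from the parent loader.
     * Only the named class itself is checked: classes it uses inside its
     * methods may still be missing and fail later, when the factory is used.
     */
    public static boolean canLoad(List<File> dirs, String className) {
        URLClassLoader loader =
                new URLClassLoader(jarUrls(dirs), LibClasspathCheck.class.getClassLoader());
        try {
            Class.forName(className, false, loader); // false: skip static initializers
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        } finally {
            try {
                loader.close();
            } catch (IOException ignored) {
                // nothing useful to do for a throwaway diagnostic loader
            }
        }
    }
}
```

          Because the parent here is simply the tool's own class loader, jars that Solr would take from the war's WEB-INF/lib must be on the tool's classpath for the answer to mean anything.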

          Hoss Man added a comment (edited) -

          > There is no contrib/xx/lucene-libs in the current trunk or 4.x.

          They are created as part of the packaging process; they never exist in a source checkout.

          Run "ant create-package" on the 4.x branch, then look inside the resulting package/apache-solr-4.0-SNAPSHOT.tgz...

          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lib/icu4j-4.8.1.1.jar
          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.0-SNAPSHOT.jar
          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lucene-libs/lucene-analyzers-morfologik-4.0-SNAPSHOT.jar
          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lucene-libs/lucene-analyzers-smartcn-4.0-SNAPSHOT.jar
          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar
          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lucene-libs/morfologik-fsa-1.5.3.jar
          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lucene-libs/morfologik-polish-1.5.3.jar
          apache-solr-4.0-SNAPSHOT/contrib/analysis-extras/lucene-libs/morfologik-stemming-1.5.3.jar
          apache-solr-4.0-SNAPSHOT/dist/apache-solr-analysis-extras-4.0-SNAPSHOT.jar
          

          > In trunk & 4.x the lucene-analyzers-*.jar files are copied into the war file. The apache-solr-analysis-extras jar and the dependent jars are not.

          That I can reproduce...

          WEB-INF/lib/lucene-analyzers-common-4.0-SNAPSHOT.jar
          WEB-INF/lib/lucene-analyzers-kuromoji-4.0-SNAPSHOT.jar
          WEB-INF/lib/lucene-analyzers-morfologik-4.0-SNAPSHOT.jar
          WEB-INF/lib/lucene-analyzers-phonetic-4.0-SNAPSHOT.jar
          WEB-INF/lib/morfologik-fsa-1.5.3.jar
          WEB-INF/lib/morfologik-polish-1.5.3.jar
          WEB-INF/lib/morfologik-stemming-1.5.3.jar
          

          ...and there is definitely an inconsistency here: if some of the lucene-analyzers-*.jar files are being packaged in the war, then we should not also be putting them in contrib/*/lucene-libs and telling people in that contrib's README that they are a dependency.

          It appears that icu4j-*.jar and lucene-analyzers-icu-*.jar are the only two external libs needed to make contrib/analysis-extras work, and we just need to update contrib/analysis-extras/README and contrib/analysis-extras/build.xml to reflect this.
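          This kind of inconsistency can be caught mechanically by intersecting the war's WEB-INF/lib with a contrib's lucene-libs directory. A minimal Java sketch (class and method names are hypothetical) that reads the war with java.util.zip and intersects the two sets of file names:

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical packaging check: jars shipped both in the war and in a contrib dir. */
public final class DuplicateJarCheck {

    private static final String WAR_LIB = "WEB-INF/lib/";

    private DuplicateJarCheck() {}

    /** File names of the jars directly under WEB-INF/lib/ inside a war. */
    public static Set<String> warLibJars(File war) throws IOException {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(war)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(WAR_LIB) && name.endsWith(".jar")) {
                    names.add(name.substring(WAR_LIB.length()));
                }
            }
        }
        return names;
    }

    /** Names present in both sets, i.e. packaged twice. */
    public static Set<String> duplicates(Set<String> inWar, Set<String> inContrib) {
        Set<String> both = new TreeSet<>(inWar);
        both.retainAll(inContrib);
        return both;
    }
}
```

          Fed the two 4.0-SNAPSHOT listings in this comment, duplicates(...) returns exactly the four morfologik-related jars: lucene-analyzers-morfologik-4.0-SNAPSHOT.jar plus morfologik-fsa, morfologik-polish and morfologik-stemming 1.5.3.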

          I added the following to the (packaged) example/solr/collection1/conf/solrconfig.xml...

            <lib dir="../../../contrib/analysis-extras/lib" regex=".*\.jar" />
            <lib dir="../../../contrib/analysis-extras/lucene-libs" regex="lucene-analyzers-icu.*\.jar" />
            <lib dir="../../../dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />
          

          ...and the "icu" field type seemed to work fine.

          Hoss Man added a comment -

          updating description

          Robert Muir added a comment -

          I agree with Hossman.

          I saw this before and forgot to do anything about it, sorry. At some point the 4 morfologik jars moved into the war.
          But this is redundant: if we want them in the war, we should move the factories out of any contrib.

          https://issues.apache.org/jira/browse/LUCENE-3977?focusedCommentId=13258480&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13258480

          The other jars (common, kuromoji, phonetic) are correct, their factories are in solr core.

          Hoss Man added a comment -

          Hmm... OK, there's something wonky here that I'm missing.

          I started by trying to do the following....

          svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/MorfologikFilterFactory.java solr/core/src/java/org/apache/solr/analysis/
          svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/SmartChineseSentenceTokenizerFactory.java solr/core/src/java/org/apache/solr/analysis/
          svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/SmartChineseWordTokenFilterFactory.java solr/core/src/java/org/apache/solr/analysis/
          svn mv solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/StempelPolishStemFilterFactory.java solr/core/src/java/org/apache/solr/analysis/
          svn mv solr/contrib/analysis-extras/src/test/org/apache/solr/analysis/TestMorfologikFilterFactory.java solr/core/src/test/org/apache/solr/analysis/
          svn mv solr/contrib/analysis-extras/src/test/org/apache/solr/analysis/TestSmartChineseFactories.java solr/core/src/test/org/apache/solr/analysis/
          cd solr/core
          ant test -Dtests.class=\*.analysis.\*
          

          ...my understanding being that the morfologik jars and their Lucene counterparts should already be in Solr core, so these Solr classes and tests should be able to move over without any other changes. Right?

          But this is causing all sorts of compilation failures related to not finding packages/classes like morfologik.stemming.PolishStemmer, org.apache.lucene.analysis.cn.smart.*, org.apache.lucene.analysis.stempel.*, etc...

          So clearly I'm missing something about how these dependent jars and classpaths are set up (I haven't looked at the build system closely since the Ivy change), so I'll have to dig into this more later today.

          (posting this now in slim hope that sarowe or rmuir see it and say "oh, yeah - the thing you are overlooking is...")

          Robert Muir added a comment -

          A few things:

          • Only the morfologik jars are misplaced, so there is no lucene-analyzers-smartcn etc. in Solr core.
          • The morfologik jars may be in the wrong place, but they are not configured in the classpaths for Solr core (e.g. solr.base.classpath).
          • A quick glance at that classpath reveals other things that shouldn't be in there, like analyzers-uima.jar (which should instead be configured only in the uima contrib's classpath).
          Hoss Man added a comment -

          rmuir: sorry, but I'm totally lost. Too many unspoken assumptions in my comment, and some vagueness in yours, have me turned around...

          > Only the morfologik jars are misplaced, so there is no lucene-analyzers-smartcn etc. in Solr core.

          • Can you be specific about what you mean by "the morfologik jars"? Evidently I don't have a clue what that means, and I misunderstood what you were including in that list originally (I thought you meant all of the analysis-extras libs that weren't ICU).
          • Can you clarify what you think the "misplaced" location(s) are?
          • Are you saying it is good or bad that lucene-analyzers-smartcn is not currently in Solr core? (i.e.: in your opinion, should SmartChinese*Factory move into solr/core?)

          > The morfologik jars may be in the wrong place, but they are not configured in the classpaths for Solr core (e.g. solr.base.classpath).

          • Again: what exactly do you mean by "wrong place"?
          • Am I correct in understanding that you feel they should be in the war file, but that it is a mistake they are not included in solr.base.classpath at compile time?

          > A quick glance at that classpath reveals other things that shouldn't be in there, like analyzers-uima.jar (which should instead be configured only in the uima contrib's classpath).

          Skimming the build files ... if I understand correctly how things work in the Ivy world ... the crux of the problem appears to be that "lucene-jars-to-solr" uses a fileset which is not equivalent to the "solr.base.classpath" used for compilation & tests. Is that it?

          (A <path/> can contain <fileset/>s (and fileset references), and <path/>s can be used inside <copy/>, so it should be easy to eliminate some redundancy here and have most of this in a single <path/> referenced by ID in both "solr.base.classpath" and the "lucene-jars-to-solr" <copy/>.)
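          Until the classpath and the <copy/> share one <path/>, drift between them can at least be reported. A hypothetical Java sketch (class and method names invented here) that takes the jar names from a compile classpath string and compares them with the names of the jars copied into the war; how you obtain the expanded solr.base.classpath value and the copied file list is left open.

```java
import java.io.File;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

/**
 * Hypothetical drift report between a compile classpath and the set of jars
 * copied into the war. Compares file names only, ignoring directories.
 */
public final class ClasspathDriftCheck {

    private ClasspathDriftCheck() {}

    /** File names of the *.jar entries in a classpath string; other entries are ignored. */
    public static Set<String> jarNames(String classpath, String separator) {
        Set<String> names = new TreeSet<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (entry.endsWith(".jar")) {
                names.add(new File(entry).getName());
            }
        }
        return names;
    }

    /** Names in 'a' that do not appear in 'b' (one-directional set difference). */
    public static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }
}
```

          Call it both ways: onlyIn(copiedToWar, classpathJars) lists jars the war ships that core code is never compiled or tested against (the morfologik situation described above), and onlyIn(classpathJars, copiedToWar) lists the reverse. On a real build, File.pathSeparator is the natural separator argument.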

          Robert Muir added a comment -

          Sorry... lemme try to explain in more detail:

          > Can you be specific about what you mean by "the morfologik jars"? Evidently I don't have a clue what that means, and I misunderstood what you were including in that list originally (I thought you meant all of the analysis-extras libs that weren't ICU).

          WEB-INF/lib/lucene-analyzers-morfologik-4.0-SNAPSHOT.jar <-- this is the lucene integration code (analyzer, tokenfilter)
          WEB-INF/lib/morfologik-fsa-1.5.3.jar <-- these 3 jars are dependencies of the above
          WEB-INF/lib/morfologik-polish-1.5.3.jar
          WEB-INF/lib/morfologik-stemming-1.5.3.jar
          

          But this does no good for Solr users, because the factory (MorfologikFilterFactory.java) is in apache-solr-analysis-extras.jar. Furthermore, I think this situation (where these files are in the war, but the factory is a plugin) causes classloader hell.

          So with the factory in this contrib module, the jar files should really be going in contrib/analysis-extras/lucene-libs as part of the packaging process, just like the other dependencies this contrib module has; otherwise we should move the factory to core (see below).

          > Are you saying it is good or bad that lucene-analyzers-smartcn is not currently in Solr core? (i.e.: in your opinion, should SmartChinese*Factory move into solr/core?)

          In my opinion, it would be nice because we could have a text_zh configured in the example that indexes chinese as words. Currently to do this, you have to deal with this huge hassle that is this crazy analysis-extras contrib which is a big barrier for indexing Chinese text.

          But that's just my opinion; I hate the contrib in general because I think it's a pain to use. The reason it exists is that I initially wanted to integrate smartchinese with Solr, but there were concerns about it increasing the size of the .war file, since the smart chinese jar is 3MB. So I created this contrib and added factories for any analyzers that didn't have factories, just as a way of at least providing some help to make them usable. Just FYI: the solr.war is near 20MB now.

          Still, as it is, at least it's some way to provide factories for these analyzers, versus having none before.

          > Again: what exactly do you mean by "wrong place"?
          > Am I correct in understanding that you feel they should be in the war file, but that it is a mistake they are not included in solr.base.classpath at compile time?

          Under the current setup, the factory is in contrib/analysis-extras. The contrib/analysis-extras build logic puts these dependencies into contrib/analysis-extras' classpath so the tests will pass.

          If we want to move the factories to core, then we have to adjust the solr core classpath to then include the jar files instead.

          > A quick glance at that classpath reveals other things that shouldn't be in there, like analyzers-uima.jar (which should instead be configured only in the uima contrib's classpath).

          Here is contrib/uima/build.xml:

            <path id="classpath">
              <pathelement path="${analyzers-uima.jar}"/>
              <path refid="solr.base.classpath"/>
            </path>
          

          So it's useless to have analyzers-uima in the Solr core classpath, because in the current packaging Solr core code should not depend on this jar, and contrib/uima already adds it itself.

          Hoss Man added a comment -

          > So with the factory in this contrib module, the jar files should really be going in contrib/analysis-extras/lucene-libs as part of the packaging process, just like the other dependencies this contrib module has; otherwise we should move the factory to core (see below).

          OK, cool. Part of my misunderstanding was thinking that *morfo*.jar was being included in the war because it was a dependency of something else that was already in Solr core.

          > In my opinion, it would be nice because we could have a text_zh configured in the example that indexes chinese as words. Currently to do this, you have to deal with this huge hassle that is this crazy analysis-extras contrib which is a big barrier for indexing Chinese text.

          I've generally been a big proponent of the "small war" philosophy, but I certainly appreciate the value/importance of having a clean out-of-the-box experience for all languages. I would definitely be interested to hear what other people think.

          For now, assuming that analysis-extras is the "correct" place for these factories to live...

          The attached patch rectifies the inconsistency and cleans up the core classpath / lucene-libs file-copying issue (i.e. it removes the morfologik & uima jars from the war) by ensuring that the classpath and the copy sources use a common list of jar files (i.e. you have to go out of your way to make them different). A similar common list is used in the analysis-extras build.xml to prevent them from ever falling out of sync. I also included some minor fixes to the README files for both analysis-extras and uima.
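          The "common list" idea can be sketched in Ant roughly as follows. The path id, the ${analyzers-*.jar} properties, and the target name are hypothetical stand-ins, not the names used in the patch; the point is only that the compile classpath and the packaging `<copy>` both reference the same `<path>`, so they cannot drift apart.

```xml
<!-- Hypothetical shared list of the Lucene module jars this contrib depends on -->
<path id="analysis-extras.lucene.jars">
  <pathelement location="${analyzers-icu.jar}"/>
  <pathelement location="${analyzers-smartcn.jar}"/>
  <pathelement location="${analyzers-stempel.jar}"/>
  <pathelement location="${analyzers-morfologik.jar}"/>
</path>

<!-- Compile/test classpath is built from the shared list -->
<path id="classpath">
  <path refid="analysis-extras.lucene.jars"/>
  <path refid="solr.base.classpath"/>
</path>

<!-- Packaging copies exactly the same jars into lucene-libs/,
     so adding a jar to the list updates both places at once -->
<target name="copy-lucene-libs">
  <mkdir dir="${build.dir}/lucene-libs"/>
  <copy todir="${build.dir}/lucene-libs" flatten="true" preservelastmodified="true">
    <path refid="analysis-extras.lucene.jars"/>
  </copy>
</target>
```

          Nesting a `<path>` inside `<copy>` works because a path is a resource collection in Ant 1.7+; `flatten="true"` drops the source directory structure so the jars land directly in lucene-libs/.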

          I've done some basic junit/package testing, and things look like they are working as designed... but the one thing that still seems weird to me is the way the morfologik-*.jar files are treated differently than the icu4j*.jar:

          solr/contrib/analysis-extras/ivy.xml lists icu4j, and in the final Solr packaging that jar winds up in contrib/analysis-extras/lib (along with its LICENSE/NOTICE). The morfologik-*.jar files, however, are not listed in ivy.xml; instead the "analyzers-morfologik.fileset" is inherited from contrib-build.xml, and those jars wind up in contrib/analysis-extras/lucene-libs without their LICENSE/NOTICE.

          Shouldn't all those third-party jars be treated consistently?

          Robert Muir added a comment -

          Patch looks good, except I agree with you about the 3rd-party jars.

          Let's be consistent on that:

          I think they should be in the ivy.xml, with LICENSE/NOTICE in lib/.
          They would then get pulled into the classpath because they're in lib/.
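          That suggestion might look roughly like this in solr/contrib/analysis-extras/ivy.xml. The morfologik coordinates (org.carrot2, with fsa/stemming/polish artifacts) and the ${morfologik.version} variable are assumptions; the revision should match whatever lucene/analysis/morfologik/ivy.xml declares, and the matching LICENSE/NOTICE files would sit next to the jars in lib/.

```xml
<ivy-module version="2.0">
  <info organisation="org.apache.solr" module="analysis-extras"/>
  <dependencies>
    <!-- already declared: resolves into lib/, where its LICENSE/NOTICE live -->
    <dependency org="com.ibm.icu" name="icu4j" rev="4.8.1.1" transitive="false"/>
    <!-- assumed coordinates: declare morfologik the same way so it also lands in lib/ -->
    <dependency org="org.carrot2" name="morfologik-fsa" rev="${morfologik.version}" transitive="false"/>
    <dependency org="org.carrot2" name="morfologik-stemming" rev="${morfologik.version}" transitive="false"/>
    <dependency org="org.carrot2" name="morfologik-polish" rev="${morfologik.version}" transitive="false"/>
  </dependencies>
</ivy-module>
```

          The trade-off is that the revision is now stated in two ivy.xml files (Lucene's module and this contrib), which is the "out of sync" risk raised in the next comment.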

          Hoss Man added a comment -

          Yeah... there's some risk of dependencies getting out of sync, but it's still better than not having licenses in the Solr binary packages (I'll open a distinct issue to track brainstorming improvements).

          OK, new patch. It assumes the following svn copy commands, which I left out of the patch to keep it easily readable:

          svn cp ./lucene/analysis/morfologik/lib/morfologik*.sha1 solr/contrib/analysis-extras/lib/
          svn cp ./lucene/analysis/morfologik/lib/morfologik*.txt solr/contrib/analysis-extras/lib/
          
          Hoss Man added a comment -

          Opened SOLR-3664 to discuss better ways of dealing with this moving forward, but I didn't want that to slow down fixing this on 4x ASAP.

          Robert Muir added a comment -

          +1 to commit. I agree, let's fix this for now and move forward.

          I ran the svn commands, applied the patch locally, and did some "tests":

          • top-level ant clean-jars + 'ant test' from contrib/analysis-extras
          • top-level ant clean-jars + 'ant test' from contrib/uima

          Same with the javadocs. All dependencies seem correct to me.
          (One of these days I will look at the ant-unit stuff and figure out a way we can do these kinds of checks from every module, always, to ensure this stuff is working.)

          Hoss Man added a comment -

          Committed revision 1364728. - trunk
          Committed revision 1364738. - 4x

          Lance Norskog added a comment -

          Moving the jar deployment problem to SOLR-3760, because this issue drifted to licensing problems.


            People

            • Assignee: Hoss Man
            • Reporter: Lance Norskog
            • Votes: 0
            • Watchers: 1
