Lucene - Core / LUCENE-2413

Consolidate all (Solr's & Lucene's) analyzers into modules/analysis

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We've been wanting to do this for quite some time now... I think, now that Solr/Lucene are merged, and we're looking at opening an unstable line of development for Solr/Lucene, now is the right time to do it.

      A standalone module for all analyzers also lets apps version the analyzers separately from the version of Solr/Lucene they use, possibly enabling us to remove Version entirely from the analyzers.

      We should also do LUCENE-2309 (decouple, as much as possible, indexer from the analysis API), but I don't think that issue needs to block this consolidation.

      Once we do this, there is one place where our users can find all the analyzers that Solr/Lucene provide.

      Attachments

      1. LUCENE-2413_porter.patch (52 kB, Robert Muir)
      2. LUCENE-2413_folding.patch (9 kB, Robert Muir)
      3. LUCENE-2413-PFAW+LF.patch (24 kB, Uwe Schindler)
      4. LUCENE-2413_teesink.patch (16 kB, Robert Muir)
      5. LUCENE-2413-charfilter.patch (9 kB, Robert Muir)
      6. LUCENE-2413_commongrams.patch (12 kB, Robert Muir)
      7. LUCENE-2413_htmlstrip.patch (5 kB, Robert Muir)
      8. LUCENE-2413_wdf.patch (20 kB, Robert Muir)
      9. LUCENE-2413_removeDups.patch (6 kB, Robert Muir)
      10. LUCENE-2413_pattern.patch (37 kB, Robert Muir)
      11. LUCENE-2413_keep_hyphen_trim.patch (19 kB, Robert Muir)
      12. LUCENE-2413_synonym.patch (6 kB, Robert Muir)
      13. LUCENE-2413_testanalyzer.patch (363 kB, Robert Muir)
      14. LUCENE-2413_testanalyzer.patch (363 kB, Robert Muir)
      15. LUCENE-2413_tests2.patch (39 kB, Robert Muir)
      16. LUCENE-2413_mockfilter.patch (36 kB, Robert Muir)
      17. LUCENE-2413_mockfilter.patch (40 kB, Robert Muir)
      18. LUCENE-2413_tests3.patch (21 kB, Robert Muir)
      19. LUCENE-2413_test4.patch (86 kB, Robert Muir)
      20. LUCENE-2413_icu.patch (58 kB, Robert Muir)
      21. LUCENE-2413_keyword.patch (36 kB, Robert Muir)
      22. LUCENE-2413_coreAnalyzers.patch (252 kB, Robert Muir)
      23. LUCENE-2413_coreUtils.patch (69 kB, Robert Muir)
      24. LUCENE-2413-dir-and-package-fixes.patch (7 kB, Steve Rowe)
      25. LUCENE-2413_capitalize_phonetic.patch (77 kB, Robert Muir)

          Activity

          Robert Muir added a comment -

          Does consolidation include contrib/icu, too?

          Otherwise we still suffer from similar problems: you need a filter in contrib/icu to standardize width differences in CJK text,
          and filters such as GreekLowerCaseFilter are not really necessary because case folding covers them.
          (And really, in my opinion, NFKC_Casefold should just replace LowerCaseFilter in every single one of these analyzers.)
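
To illustrate the width-difference point, here is a minimal sketch using only the JDK. It is not the contrib/icu filter and it is not NFKC_Casefold (which needs ICU): it assumes that plain NFKC normalization followed by lowercasing is a close enough stand-in to show the idea. The class name `WidthFoldingSketch` is hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

final class WidthFoldingSketch {
    // NFKC maps full-width Latin and half-width katakana to their standard forms.
    // Lowercasing afterwards is only a rough approximation of real case folding.
    static String fold(String term) {
        String normalized = Normalizer.normalize(term, Normalizer.Form.NFKC);
        return normalized.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Full-width "Lucene" written with code points from the U+FF00 block.
        System.out.println(fold("\uFF2C\uFF55\uFF43\uFF45\uFF4E\uFF45"));
        // Half-width katakana KA (U+FF76) becomes the standard KA (U+30AB).
        System.out.println(fold("\uFF76"));
    }
}
```

With a single folding step like this, an analyzer would not need separate width and lowercase filters, which is the consolidation argument made above.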

          DM Smith added a comment -

          Robert: +1

          Michael McCandless added a comment -

          Does consolidation include contrib/icu, too?

          +1

          Steve Rowe added a comment - - edited

          Does consolidation include contrib/icu, too?

          +1

          Especially if we really go the route of individually packaging artifacts for & releasing each component separately.

          Robert Muir added a comment -

          I played with this issue enough already to realize it's going to be a pain: huge svn movements and lots of changes.

          So here's a patch that moves the PorterStemmer to contrib/analyzers (under the 'en' pkg). It's a start.

          I would like to commit tomorrow unless anyone objects.

          Michael McCandless added a comment -

          +1 to doing this in as-baby-steps as we can

          Robert Muir added a comment -

          committed LUCENE-2413_porter.patch revision 940459.

          Robert Muir added a comment -

          attached is a patch to move ISOLatin1AccentFilter and ASCIIFoldingFilter to contrib (under miscellaneous).

          I hacked the analyzing queryparser's test to avoid a dependency
          on contrib analyzers, but gonna need a 'TestAnalyzer' soon.

          would like to commit soon unless there are objections

          Robert Muir added a comment -

          committed LUCENE-2413_folding.patch revision 940591.

          Uwe Schindler added a comment -

          Here is my patch for LengthFilter and PerFieldAnalyzerWrapper.

          Robert Muir added a comment -

          Thanks Uwe, the help is appreciated!

          Robert Muir added a comment -

          attached is a patch for TeeSink, it moves it to contrib/analyzers/sinks

          I moved the test in TestIW (which seems to be unrelated to IW) as-is to
          TeeSinkTest. Judging from the JIRA issue, it simply tests that end() is
          implemented correctly in TeeSink, and there is already a separate test
          for end() in TestIW.

          Uwe Schindler added a comment -

          Committed LUCENE-2413-PFAW+LF.patch revision: 940632

          Robert Muir added a comment -

          Committed LUCENE-2413_teesink.patch revision 940633

          Robert Muir added a comment -

          attached is a patch that moves some high-level charfilter functionality to contrib/analyzers
          so MappingCharFilter,BaseCharFilter,NormalizeCharMap -> o.a.l.analysis.charfilter

          Robert Muir added a comment -

          Committed LUCENE-2413-charfilter.patch revision 940676.

          Robert Muir added a comment -

          patch for commongrams(query)filter

          Robert Muir added a comment -

          Committed LUCENE-2413_commongrams.patch revision 940761.

          Robert Muir added a comment -

          htmlstripcharfilter -> o.a.l.charfilter.htmlstripcharfilter

          Robert Muir added a comment -

          Committed LUCENE-2413_htmlstrip.patch revision 940768.

          Robert Muir added a comment -

          worddelimiterfilter -> analysis.misc.WordDelimiterFilter

          Robert Muir added a comment -

          Committed LUCENE-2413_wdf.patch revision 940781.

          Robert Muir added a comment -

          removeDuplicatesTokenFilter -> misc/

          Robert Muir added a comment -

          Committed LUCENE-2413_removeDups.patch revision 940788.

          Robert Muir added a comment -

          this patch moves pattern-based components (PatternReplaceFilter, PatternTokenizer, PatternReplaceCharFilter)
          to analysis.pattern package.

          the existing PatternAnalyzer in contrib is marked deprecated in favor of this.

          Additionally, I removed PatternTokenizer's commons dependency and improved its performance by reusing a StringBuilder (instead of calling IOUtils.toString) and by not creating new strings for group matching.
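
The StringBuilder reuse described above can be sketched roughly as follows. This is not the code from the patch: the class name `ReusableReaderBuffer`, the chunk size, and the choice to wrap IOException are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReusableReaderBuffer {
    // Both buffers are allocated once and reused for every input.
    private final StringBuilder text = new StringBuilder();
    private final char[] chunk = new char[1024];

    // Drains the reader into the shared StringBuilder, discarding any
    // previous contents, so no new String is allocated per document.
    CharSequence fill(Reader input) {
        text.setLength(0);
        try {
            int read;
            while ((read = input.read(chunk)) != -1) {
                text.append(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text;
    }

    public static void main(String[] args) {
        ReusableReaderBuffer buffer = new ReusableReaderBuffer();
        System.out.println(buffer.fill(new StringReader("foo,bar")));
        System.out.println(buffer.fill(new StringReader("baz")));
    }
}
```

A regex Matcher can run directly over the returned CharSequence, which is how a tokenizer could avoid materializing the whole input as a String.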

          Robert Muir added a comment -

          Committed LUCENE-2413_pattern.patch revision 940813

          Robert Muir added a comment -

          keepwordfilter,trimfilter,hyphenatedwordsfilter -> misc

          Robert Muir added a comment -

          Committed LUCENE-2413_keep_hyphen_trim.patch revision 940962.

          Robert Muir added a comment -

          attached is a patch to move synonymfilter/synonymmap into the analyzers module.

          didn't deprecate the synonymfilter/synonymmap from contrib/wordnet quite yet.

          Robert Muir added a comment -

          Committed LUCENE-2413_synonym.patch revision 942827.

          Robert Muir added a comment -

          attached is a patch that creates a barebones _TestAnalyzer and _TestTokenizer in src/test

          These can be used for running lucene tests, so that more analyzers can be moved to the analyzers module.

          I didn't convert all tests to use it, only the easy ones so far.

          Robert Muir added a comment -

          i renamed the test analyzer to MockAnalyzer/Tokenizer at hossman's suggestion...

          all tests pass

          Robert Muir added a comment -

          Committed LUCENE-2413_testanalyzer.patch revision 943288.

          By the way, when reviewing I found some disabled queryparser tests:

          -  public void tesStopwordsParsing() throws Exception {
          +  public void testStopwordsParsing() throws Exception {
          

          I will re-enable these tests on the 3x branch too.

          Robert Muir added a comment -

          attached is a patch (LUCENE-2413_tests2.patch) that adds a "SIMPLE" mode to MockAnalyzer.
          I cut over a lot of tests that were previously using SimpleAnalyzer/LowerCaseTokenizer etc to this.

          I also added some basic tests for the MockAnalyzer itself (one day we will want to free it from CharTokenizer, etc)

          Robert Muir added a comment -

          Committed LUCENE-2413_tests2.patch revision 944677

          Robert Muir added a comment -

          attached is some cleanup for the mock analyzers tests.

          additionally i added a filter for testing, that removes any terms accepted by a DFA.
          So you can use this to emulate stopfilter, keepfilter, lengthfilter, ...
          Lots of tests need to test this sorta stuff with posIncs.
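
A rough sketch of what such a removing filter does to position increments, written without Lucene's TokenStream API. Everything here is hypothetical: the `Token` class, the `filter` method, and the use of a `Predicate<String>` in place of the DFA are illustrative assumptions, not the MockFilter code from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RemovingFilterSketch {
    static final class Token {
        final String term;
        final int posInc;
        Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // Drops every token whose term is rejected. When enablePosInc is true,
    // the increments of dropped tokens are added to the next surviving token,
    // so the gap left by removed terms stays visible to position-aware queries.
    static List<Token> filter(List<Token> in, Predicate<String> reject, boolean enablePosInc) {
        List<Token> out = new ArrayList<>();
        int skipped = 0;
        for (Token t : in) {
            if (reject.test(t.term)) {
                skipped += t.posInc;
                continue;
            }
            out.add(new Token(t.term, enablePosInc ? t.posInc + skipped : t.posInc));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = Arrays.asList(new Token("the", 1), new Token("quick", 1), new Token("fox", 1));
        // Emulates a stop filter: "the" is removed, "quick" gets posInc 2.
        for (Token t : filter(in, term -> term.equals("the"), true)) {
            System.out.println(t.term + " posInc=" + t.posInc);
        }
    }
}
```

Swapping the predicate changes which filter is emulated: `term -> term.length() < 4` behaves like a length filter, and a negated set lookup behaves like a keep-word filter.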

          Robert Muir added a comment -

          oops, I forgot to svn add.
          Here's the corrected patch.

          Uwe Schindler added a comment -

          Just thinking about MockFilter:
          Might this be much faster than CharArraySet? If we build a DFA out of the stopwords, as done in the MockFilter, and also minimize it, will checking for a hit not be much faster? E.g. if the first character of the termBuffer does not match the automaton, it gets rejected. CAS always has to calculate the hashCode of the whole string first and then look it up.
          I would like to see a comparison with a minimized Automaton vs. CAS for StopFilter. OK, LengthFilter is more performant by just comparing TermLength, but the StopFilter should be much faster.
          I propose to pass a Set to the StopFilter and internally it converts it to a minimized Automaton similar to MockFilter.
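
The early-rejection argument can be sketched with a plain trie, which is a deterministic automaton over the stop words but, unlike the proposal, is not minimized. This uses no Lucene classes; `StopWordTrie` and its methods are hypothetical names, and the sketch says nothing about actual performance relative to CharArraySet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class StopWordTrie {
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        boolean accept;
    }

    private final State root = new State();

    // Builds one path of states per word; shared prefixes share states.
    StopWordTrie(Iterable<String> words) {
        for (String word : words) {
            State s = root;
            for (int i = 0; i < word.length(); i++) {
                s = s.next.computeIfAbsent(word.charAt(i), c -> new State());
            }
            s.accept = true;
        }
    }

    // Walks the term buffer one char at a time and stops at the first
    // character with no transition, without hashing the whole term.
    boolean accepts(char[] buffer, int length) {
        State s = root;
        for (int i = 0; i < length; i++) {
            s = s.next.get(buffer[i]);
            if (s == null) {
                return false;
            }
        }
        return s.accept;
    }

    public static void main(String[] args) {
        StopWordTrie stops = new StopWordTrie(Arrays.asList("the", "then", "a"));
        char[] term = "zebra".toCharArray();
        System.out.println(stops.accepts(term, term.length)); // rejected at 'z'
    }
}
```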

          Robert Muir added a comment -

          Might this be much faster than CharArraySet?

          I ran indexing tests a while ago (reuters) with CharArraySet itself implemented with a DFA, and it was slightly faster, but not much. I think this is because english words are usually not very long (average length=5). For other languages this technique might save some cpu time, but there are some "problems" i imagine

          1. building an automaton from a list of words is more expensive, although Dawid Weiss has implemented an addition to automaton that does this fast.
          2. in general, building the automaton and runautomaton etc. is more "heavy" I would think, but Mike McCandless hacked away a lot of this heaviness when we converted to UTF-32.
          3. the CharacterRunAutomaton is not optimized right now: we disabled the classmap[] for chars because it consumes more RAM. I think if we were to care about performance on char[] we should make it classmap 0x0-0xffff and binary search the rest, or something similar. Currently it binary-searches on each input character.

          Somewhat related, a while ago i tested this with CharArraySet as a DFA, and opened this issue: LUCENE-2227. But obviously this is not the only way, as this example shows filtering on the dfa itself (and not using chararrayset at all).

          So in general, i have those concerns right now, but maybe in the future once some things are addressed we could at least make an optional stopfilter impl or something similar.

          One thing i like about this filter personally, is that rejected terms always get (optionally) the posInc increased... I do not think our existing KeepWord or LengthFilters do this, but maybe i am wrong.
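
The classmap idea from point 3 can be sketched as a lookup table for low code points with a binary-search fallback for the rest. This is an illustration of the trade-off only, not the CharacterRunAutomaton source: `CharClassLookup`, `points`, and `tableSize` are hypothetical names, and the sketch assumes `points` is sorted and starts at 0.

```java
import java.util.Arrays;

final class CharClassLookup {
    private final int[] points;   // sorted interval start points; points[0] must be 0
    private final int[] classmap; // precomputed class for code points below tableSize

    // A larger tableSize trades RAM for fewer binary searches.
    CharClassLookup(int[] points, int tableSize) {
        this.points = points.clone();
        this.classmap = new int[tableSize];
        for (int c = 0; c < tableSize; c++) {
            classmap[c] = search(c);
        }
    }

    // Returns the index of the interval that contains the code point.
    int classOf(int codePoint) {
        return codePoint < classmap.length ? classmap[codePoint] : search(codePoint);
    }

    // Index of the largest start point that is <= codePoint.
    private int search(int codePoint) {
        int idx = Arrays.binarySearch(points, codePoint);
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        // Intervals: [0,'a'), ['a','z'], ('z',0xFFFF], [0x10000, ...)
        CharClassLookup lookup = new CharClassLookup(new int[] {0, 'a', 'z' + 1, 0x10000}, 0x10000);
        System.out.println(lookup.classOf('m'));     // served from the table
        System.out.println(lookup.classOf(0x1F600)); // served by binary search
    }
}
```

With `tableSize` set to 0x10000, every char is resolved by one array read and only supplementary code points pay for the search; a small `tableSize` approximates the behavior described as the current state.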

          Robert Muir added a comment -

          Committed LUCENE-2413_mockfilter.patch 944908.

          I think now we can move all tests to this framework and pull all the analyzers out.

          Robert Muir added a comment -

          attached is a patch that converts over some more tests... need a break and this was a good stopping point.

          Robert Muir added a comment -

          attached is a patch cutting over a lot more tests.

          Robert Muir added a comment -

          committed LUCENE-2413_tests3.patch revision 944925
          committed LUCENE-2413_test4.patch revision 944966

          Robert Muir added a comment -

          Attached patch (LUCENE-2413_icu.patch) folds contrib/icu into the analyzers module.

          Since it depends on an external lib, i set it up as analyzers-icu.jar

          Robert Muir added a comment -

          Committed LUCENE-2413_icu.patch revision 946590.

          Robert Muir added a comment -

          this patch (LUCENE-2413_keyword.patch) moves the keywordmarkerfilter out of core into the module.

          Robert Muir added a comment -

          Committed LUCENE-2413_keyword.patch revision 946621.

          Doron Cohen added a comment -

          contrib/benchmark's NewShingleAnalyzerTask depends on modules' o.a.l.analysis.shingle.ShingleAnalyzerWrapper - causing cyclic dependency between projects - e.g. when creating separate Eclipse projects for lucene and modules.

          Robert Muir added a comment -

          contrib/benchmark's NewShingleAnalyzerTask depends on modules' o.a.l.analysis.shingle.ShingleAnalyzerWrapper - causing cyclic dependency between projects - e.g. when creating separate Eclipse projects for lucene and modules.

          Hi, it's not a cyclic dependency, as the analyzers module only depends on core lucene.

          If you want to have separate projects I would make the contribs separate, too, or put everything in one eclipse project (this is what I prefer).

          Robert Muir added a comment -

          By the way, one idea could be to make benchmark a module itself (the benchmarking module for all lucene/solr related stuff).

          I noticed Solr lacks a standard benchmarking suite, and at the same time more benchmarks are being created even for
          contribs/modules (highlighter, analyzers)

          Robert Muir added a comment -

          attached is a patch that pulls out the rest of lucene's concrete analyzers and puts them in the analyzers module.

          To do this, I had to rearrange demo: I made it contrib/demo, which really simplified the build system.

          Robert Muir added a comment -

          Committed LUCENE-2413_coreAnalyzers.patch revision 948195.

          Robert Muir added a comment -

          moves CharFilter, CharArraySet, and CharArrayMap

          Robert Muir added a comment -

          Committed LUCENE-2413_coreUtils.patch revision 948225

          Steve Rowe added a comment -

          I found an unchanged package name in a .alg file in contrib/benchmark, and went looking for more similar issues - this patch fixes the directory references and packages I found that were still pointing to the old locations.

          Robert Muir added a comment -

          Thanks Steven, committed your patch in revision 955203.

          Robert Muir added a comment -

          patch that moves the phonetic, doublemetaphone, and capitalization filters to the analysis module.

          with this patch, all concrete analysis components are consolidated and available to both lucene and solr users.

          I would like to close this issue; further, more complicated refactorings (distancing analysis from indexing, moving factories/abstract classes, etc.) can be done in their own issues.

          Robert Muir added a comment -

          Committed revision 957162.


            People

            • Assignee: Robert Muir
            • Reporter: Michael McCandless