Solr / SOLR-2648

Improve interaction of SynonymFilterFactory with analysis chain


    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels:


      Spinoff of LUCENE-3233 (there is a TODO here), this was also mentioned by Otis on the mailing list:

      As of LUCENE-3233, the builder for the synonyms structure uses an Analyzer behind the scenes to actually tokenize the synonyms in your synonyms file.
      Currently the Solr factory uses a WhitespaceTokenizer, unless you supply the tokenizerFactory parameter, which lets you specify a different tokenizer.

      If there were a way to specify a whole chain to this factory (e.g. charfilters, a tokenizer, and tokenfilters such as stemmers) instead of just a tokenizerFactory,
      it would be a lot more flexible (e.g. it would stem your synonyms for you), and would solve this use case.

      Personally I think it would be ideal if this just worked automatically, e.g. if you have a chain of A, B, SynonymsFilter, C, D, then in my opinion the synonyms
      should be analyzed with an analysis chain of A, B. This way the injected synonyms are processed as if they had been in the tokenstream to begin with.
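
      The "chain of A, B" idea can be sketched in plain Java. This is only an illustration under stated assumptions: Stage, stagesBefore, analyze and exampleChain are hypothetical names, not Lucene or Solr types, and the lowercasing and plural-stripping stages are toy stand-ins for whatever A and B really are.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class SynonymChainSketch {

    /** Hypothetical model of one analysis stage: a name plus a token rewrite. */
    public static final class Stage {
        final String name;
        final UnaryOperator<List<String>> rewrite;

        public Stage(String name, UnaryOperator<List<String>> rewrite) {
            this.name = name;
            this.rewrite = rewrite;
        }
    }

    /** Returns the stages that precede the stage called synonymStageName. */
    public static List<Stage> stagesBefore(List<Stage> chain, String synonymStageName) {
        List<Stage> prefix = new ArrayList<>();
        for (Stage stage : chain) {
            if (stage.name.equals(synonymStageName)) {
                return prefix;
            }
            prefix.add(stage);
        }
        throw new IllegalArgumentException("no stage named " + synonymStageName);
    }

    /** Splits a synonym entry on whitespace, then applies each stage in order. */
    public static List<String> analyze(String entry, List<Stage> stages) {
        List<String> tokens = List.of(entry.trim().split("\\s+"));
        for (Stage stage : stages) {
            tokens = stage.rewrite.apply(tokens);
        }
        return tokens;
    }

    private static List<String> mapEach(List<String> tokens, UnaryOperator<String> f) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(f.apply(token));
        }
        return out;
    }

    /** Toy chain A, B, SynonymsFilter, C, D (A lowercases, B strips a trailing "s"). */
    public static List<Stage> exampleChain() {
        return List.of(
            new Stage("A", ts -> mapEach(ts, t -> t.toLowerCase(Locale.ROOT))),
            new Stage("B", ts -> mapEach(ts,
                t -> t.length() > 1 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t)),
            new Stage("SynonymsFilter", ts -> ts),
            new Stage("C", ts -> ts),
            new Stage("D", ts -> ts));
    }

    public static void main(String[] args) {
        // Only A and B are applied to the synonym entry; C and D are not.
        List<Stage> prefix = stagesBefore(exampleChain(), "SynonymsFilter");
        System.out.println(analyze("Running Shoes", prefix)); // prints [running, shoe]
    }
}
```

      With this shape, a synonym entry sees exactly the same processing as index-time or query-time text sees before it reaches the synonym filter, which is the property the proposal is after.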

      Note: there are some limitations to what the chain can do, e.g. you can't put WordDelimiterFilter (WDF) or anything else that alters positions before the synonyms, and you can't
      have a synonym that analyzes to nothing at all. The parser checks for all of these conditions and throws a syntax error, so it would be clear to the user that
      they put the synonyms filter in the "wrong place" in their chain.
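
      The two limitations can be pictured as a check on the analyzed form of each synonym entry. The sketch below is an assumption-laden illustration, not the actual parser code: Token and check are hypothetical names, and IllegalArgumentException stands in for whatever syntax error the real parser raises.

```java
import java.util.List;

public class SynonymEntryCheckSketch {

    /** Hypothetical stand-in for an analyzed token: its text and position increment. */
    public static final class Token {
        final String text;
        final int positionIncrement;

        public Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Rejects an entry that analyzed to nothing or whose tokens do not advance one position each. */
    public static void check(String entry, List<Token> analyzed) {
        if (analyzed.isEmpty()) {
            throw new IllegalArgumentException("synonym '" + entry + "' analyzed to nothing");
        }
        for (Token token : analyzed) {
            if (token.positionIncrement != 1) {
                throw new IllegalArgumentException("synonym '" + entry + "' produced token '"
                    + token.text + "' with position increment " + token.positionIncrement);
            }
        }
    }

    public static void main(String[] args) {
        // A well-behaved entry: each token advances the position by exactly one.
        check("wi fi", List.of(new Token("wi", 1), new Token("fi", 1)));

        try {
            // For example, an entry removed entirely by an earlier stop filter.
            check("the", List.of());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        try {
            // For example, a stacked token emitted by a filter that alters positions.
            check("wi-fi", List.of(new Token("wi", 1), new Token("wifi", 0)));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Failing at parse time, rather than silently producing odd matches, is what makes a misplaced synonyms filter visible to the user.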




          • Assignee:
            Robert Muir
          • Votes:
            0
          • Watchers:
            0


            • Created: