Solr / SOLR-2119

IndexSchema should log warning if <analyzer> is declared with charfilter/tokenizer/tokenfilter out of order

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: Schema and Analysis
    • Labels: None

      Description

      There seems to be a segment of the user population that has a hard time understanding the distinction between a charFilter, a tokenizer, and a tokenFilter. While we can certainly try to improve the documentation about what exactly each does, and when each takes effect in the analysis chain, one other thing we should do is try to educate people when they construct their <analyzer> in a way that doesn't make any sense.

      At the moment, some people are attempting to do things like "move the Foo <tokenFilter/> before the <tokenizer/>" to try to get certain behavior. At a minimum, we should log a warning in this case saying that doing so doesn't have the desired effect.

      (We could easily make such a situation fail to initialize, but I'm not convinced that would be the best course of action: some people may have schemas in which a charFilter or tokenizer is declared out of order relative to their tokenFilters, yet they are still getting "correct" results that work for them, and breaking their instance on upgrade doesn't seem like it would be productive.)
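To make the proposed check concrete, here is a minimal sketch of the ordering test, written in Python rather than as a patch to IndexSchema. It assumes the <analyzer> children use the element names charFilter, tokenizer, and filter, and that the sensible order is charFilter(s) first, then the tokenizer, then filter(s). The function name find_order_problems and the message wording are hypothetical, and this is not Solr's actual loading code.

```python
import xml.etree.ElementTree as ET

# Assumed stage order: charFilter(s), then the tokenizer, then filter(s).
STAGE_RANK = {"charFilter": 0, "tokenizer": 1, "filter": 2}


def find_order_problems(analyzer_xml):
    """Return one message per child declared after a later-stage element."""
    analyzer = ET.fromstring(analyzer_xml)
    problems = []
    highest_rank = -1
    highest_tag = None
    for child in analyzer:
        rank = STAGE_RANK.get(child.tag)
        if rank is None:
            continue  # unrecognized element names are out of scope here
        if rank < highest_rank:
            problems.append(
                f"<{child.tag} class={child.get('class')!r}> is declared "
                f"after a <{highest_tag}>; declaration order does not "
                f"change when it runs in the analysis chain"
            )
        else:
            highest_rank = rank
            highest_tag = child.tag
    return problems
```

A caller could log each returned message as a warning, or refuse to initialize when the list is non-empty, depending on which of the two behaviors discussed in this issue is chosen.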

        Activity

        Robert Muir added a comment -

        There seems to be a segment of the user population that has a hard time understanding the distinction between a charFilter, a tokenizer, and a tokenFilter - while we can certainly try to improve the documentation about what exactly each does, and when they take effect in the analysis chain, one other thing we should do is try to educate people when they construct their <analyzer> in a way that doesn't make any sense.

        I think we should do both; this is a great idea.

        (we could easily make such a situation fail to initialize, but I'm not convinced that would be the best course of action, since some people may have schemas where they have declared a charFilter or tokenizer out of order relative to their tokenFilters, but are still getting "correct" results that work for them, and breaking their instance on upgrade doesn't seem like it would be productive)

        I would prefer a hard error. I think someone who doesn't understand what tokenizers and filters do likely isn't looking at their log files either.

        In my opinion, Solr should be pickier about its configuration. Oftentimes, if I haven't had enough sleep, I will type tokenFilter instead of filter, and Solr simply ignores it completely instead of raising an error.

        And I can't be the only one who does this: it's not obvious that tokenizer = Tokenizer, charFilter = CharFilter, analyzer = Analyzer, but filter = TokenFilter.
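The mistyped-element case can be sketched the same way. The snippet below is a minimal illustration in Python, not Solr's implementation: it assumes the only valid <analyzer> children are charFilter, tokenizer, and filter, and the helper names find_unknown_elements and require_known_elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child element names this sketch treats as valid inside <analyzer>.
KNOWN_ELEMENTS = {"charFilter", "tokenizer", "filter"}


def find_unknown_elements(analyzer_xml):
    """Return the tag names of <analyzer> children that are not recognized."""
    analyzer = ET.fromstring(analyzer_xml)
    return [child.tag for child in analyzer if child.tag not in KNOWN_ELEMENTS]


def require_known_elements(analyzer_xml):
    """Hard-error variant: raise instead of silently ignoring a typo."""
    unknown = find_unknown_elements(analyzer_xml)
    if unknown:
        raise ValueError(
            "unrecognized <analyzer> child element(s): " + ", ".join(unknown)
        )
```

With this kind of check, a <tokenFilter/> typed in place of <filter/> is reported by name rather than dropped without any message.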

        Michael McCandless added a comment -

        +1 for hard error.

        In general for problems we can detect at startup we should not start the server. Users rarely see/do something about the warnings.

        I think this would be a good service to those users who trip the hard error on upgrade: it means Solr is not doing what they thought they asked it to do.

        Mark Miller added a comment -

        I think this would be a good service to those users who trip the hard error on upgrade: it means Solr is not doing what they thought they asked it to do.

        +1

        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Robert Muir added a comment -

        3.4 -> 3.5

        Hoss Man added a comment -

        Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

        email notification suppressed to prevent mass-spam
        pseudo-unique token identifying these issues: hoss20120321nofix36

        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Uwe Schindler added a comment -

        Move issue to Solr 4.9.


          People

          • Assignee: Unassigned
          • Reporter: Hoss Man
          • Votes: 0
          • Watchers: 0