Solr / SOLR-1086

Need to rectify inconsistent behavior when people associate an analyzer with a non-TextField fieldType

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0, 1.2, 1.3, 1.4
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels:

      Description

      Currently, specifying an <analyzer> is only supported when using the TextField class – however:
      1) no error is logged if an <analyzer> is declared for other field types
      2) the analysis screen gives the mistaken impression that the analyzer is being used...
      http://www.nabble.com/Field-tokenizer-question-to22594575.html
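      The missing check from point 1 could look roughly like the sketch below. It is plain Java and does not use Solr's schema classes: `FieldTypeDecl` and `AnalyzerSchemaCheck` are hypothetical names introduced only for illustration, and the sketch assumes each fieldType declaration has already been reduced to its name, its class, and whether it contains an <analyzer>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a parsed <fieldType> declaration.
// This is not Solr's real schema model.
final class FieldTypeDecl {
    final String name;
    final String className;
    final boolean hasAnalyzer;

    FieldTypeDecl(String name, String className, boolean hasAnalyzer) {
        this.name = name;
        this.className = className;
        this.hasAnalyzer = hasAnalyzer;
    }
}

class AnalyzerSchemaCheck {
    // Returns one message per fieldType that declares an <analyzer>
    // although its class is not TextField. Subclasses of TextField are
    // not recognized here; a real check would need to inspect the class.
    static List<String> findIgnoredAnalyzers(List<FieldTypeDecl> decls) {
        List<String> problems = new ArrayList<>();
        for (FieldTypeDecl d : decls) {
            boolean isTextField = d.className.equals("solr.TextField")
                    || d.className.equals("org.apache.solr.schema.TextField");
            if (d.hasAnalyzer && !isTextField) {
                problems.add("fieldType '" + d.name + "' (" + d.className
                        + ") declares an <analyzer>, which is only supported for TextField");
            }
        }
        return problems;
    }
}
```

      Whether such a message should be logged as a warning or should fail schema loading outright is a design choice this sketch leaves open.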

        Activity

        Hoss Man added a comment -


        We could add a lot of new error checking to deal with these cases, but personally I think we should make it possible to specify an Analyzer on any fieldType. For every FieldType except TextField, the result would be the same as if every token produced by the analyzer had been added as a separate value of a multivalued field. A new "ConcatAllTokenFilter" could be added to make it easy for people to concatenate all tokens produced by a tokenizer and other token filters back into a single string if necessary.
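        A minimal sketch of the two behaviors described in this comment, written in plain Java rather than against the Lucene TokenFilter API. The class and method names are hypothetical, the toy analyzer (whitespace split plus lowercasing) stands in for a real analysis chain, and the separator argument is an assumption; the comment does not say how a "ConcatAllTokenFilter" would join tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenFieldSketch {
    // Toy stand-in for an analyzer: whitespace tokenizer plus lowercase filter.
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Proposed behavior for non-TextField types: every token produced by
    // the analyzer becomes its own value of the field.
    static List<String> asMultiValues(String input) {
        return analyze(input);
    }

    // Conceptually what the proposed "ConcatAllTokenFilter" would do:
    // collapse all tokens back into one string (separator is an assumption).
    static String concatAll(List<String> tokens, String separator) {
        return String.join(separator, tokens);
    }
}
```

        With this sketch, a single-valued field would index the result of `concatAll(analyze(input), " ")`, while a multivalued field would index each element of `asMultiValues(input)` separately.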

        Erick Erickson added a comment -

        I think this is another great project for someone starting out with Solr/Lucene. It might be really useful to be able, for instance, to send the input to a date field through an analysis chain (perhaps a custom one) to deal with various date formats.

        Hoss Man added a comment -

        This issue actually pre-dates a lot of newer functionality in Solr – in particular UpdateProcessors.

        I would suggest that, while we should still add schema validation that raises an error if someone tries to use an analyzer where it's not supported (if it hasn't already been added?), things like date format parsing or number format parsing (where you want to produce Date/Integer/Float/etc. objects for storage, not just for term indexing) would make the most sense as UpdateProcessors.
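        To illustrate the kind of normalization meant here, the sketch below tries several date formats in turn and returns a typed value instead of a string. It is plain Java and is not wired into Solr's update processor API; the class name `DateNormalizerSketch` and the three accepted formats are assumptions chosen only for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class DateNormalizerSketch {
    // Assumed input formats, tried in order; chosen only for illustration.
    private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            DateTimeFormatter.ISO_LOCAL_DATE,                        // 2009-03-23
            DateTimeFormatter.ofPattern("yyyy/MM/dd", Locale.ROOT),  // 2009/03/23
            DateTimeFormatter.ofPattern("MM/dd/yyyy", Locale.ROOT)); // 03/23/2009

    // Returns a typed date that could then be stored in a date field,
    // or throws if none of the assumed formats match.
    static LocalDate parse(String raw) {
        String value = raw.trim();
        for (DateTimeFormatter format : FORMATS) {
            try {
                return LocalDate.parse(value, format);
            } catch (DateTimeParseException e) {
                // Not this format; fall through to the next candidate.
            }
        }
        throw new IllegalArgumentException("Unrecognized date format: " + raw);
    }
}
```

        In an actual update processor, logic like this would run on the incoming field value before the document is indexed, so that both the stored and the indexed value come from the parsed date.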


          People

          • Assignee: Unassigned
          • Reporter: Hoss Man
          • Votes: 0
          • Watchers: 5
