Solr / SOLR-10252

Example spellcheck config uses _text_ as default field


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.4.2
    • Fix Version/s: None
    • Component/s: examples, spellchecker
    • Labels:
      None

      Description

      SOLR-8381 made the _text_ field the default field for spellchecking in the basic_configs and data_driven_schema_configs example configsets. _text_ is the destination of a catch-all copyField, so it gets its data from every other field in the index.
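      A toy Python illustration (not Solr code) of what the catch-all copyField means for spellchecking. It assumes a spellchecker that draws suggestions directly from the indexed terms of its configured field, as Solr's DirectSolrSpellChecker does. The documents, field names, and the whitespace-and-lowercase "analysis" are hypothetical simplifications.

```python
from collections import Counter

# Hypothetical documents; field names are illustrative only.
docs = [
    {"title": "Wild Partnerschaft", "sku": "XK-42"},
    {"title": "Eingetragene Lebenspartner", "author": "Müller"},
]

def catch_all_copy(doc):
    """Mimic a catch-all copyField (source="*"): every field's value lands in the destination."""
    return [str(value) for value in doc.values()]

def index_terms(values):
    """Very rough stand-in for index-time analysis: whitespace split plus lowercasing."""
    return [token.lower() for value in values for token in value.split()]

# Terms a spellchecker reading the catch-all field would have available.
spell_terms = Counter()
for doc in docs:
    spell_terms.update(index_terms(catch_all_copy(doc)))
```

      With these two documents, spell_terms holds six distinct terms, including the product code "xk-42", because nothing filters which fields feed the catch-all field.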

      This field also uses the text_general type, whose default analysis chain includes stopword and synonym filters. If someone has a large synonym list, perhaps with many overlapping matches, spellchecking would then run on every one of the expanded terms. I recently saw a parsed query that looked like this:

      "+(((_text_:partn _text_:gesellschaft _text_:teilhab _text_:konkubinat _text_:eheahn _text_:eheahn _text_:konkubinatspaar _text_:konkubinatspartn _text_:konkubinatsvertrag _text_:lebenspartn _text_:nichteheahn _text_:nichteheahn _text_:nichtehe _text_:wild _text_:registriert _text_:eingetrag _text_:eingetrag _text_:registriert _text_:vertragspartei _text_:kontrahent _text_:partei _text_:vertragspartn)/no_coord) ((_text_:gemeinschaft _text_:lebensgemeinschaft _text_:gemeinschaft _text_:lebensgemeinschaft _text_:lebensgemeinschaft _text_:ehe _text_:partnerschaft _text_:partnerschaft _text_:partn _text_:partnerschaft)/no_coord) _text_:gleichgeschlecht _text_:paar) +_text_:gestorb"
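      The multiplication effect can be sketched with two toy Python analyzers. The synonym map, stopword list, and query below are hypothetical, and real analysis chains also stem (the parsed query shows stemmed forms such as "partn"), which this sketch ignores. The point is only to compare how many tokens reach the spellchecker with and without query-time synonym expansion.

```python
# Hypothetical query-time synonym groups (single-token, multi-way), for illustration only.
SYNONYMS = {
    "partner": ["partner", "lebenspartner", "vertragspartner", "partei", "kontrahent"],
    "ehe": ["ehe", "partnerschaft", "lebensgemeinschaft"],
}
STOPWORDS = {"und", "der", "die"}

def analyze(query):
    """Toy text_general-like query analyzer: lowercase, drop stopwords, expand synonyms."""
    tokens = []
    for raw in query.split():
        token = raw.lower()
        if token in STOPWORDS:
            continue
        tokens.extend(SYNONYMS.get(token, [token]))
    return tokens

def light_analyze(query):
    """Lightly analyzed alternative: lowercase only, no stopwords or synonyms."""
    return [raw.lower() for raw in query.split()]

query = "Partner und Ehe gestorben"
heavy = analyze(query)        # tokens the spellchecker sees with synonym expansion
light = light_analyze(query)  # tokens it sees with light analysis
```

      Here the expanded chain hands 9 tokens to the spellchecker versus 4 for lowercase-only analysis, and the gap grows with the size of the synonym groups.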
      

      Since we recommend a lightly analyzed field for spellchecking, pairing _text_ with text_general is a problematic example to start people out with. For a query like the one above, the spellchecker does a lot of extra work for little benefit.

      I'm not sure what a better field would be. Those two configsets are minimal by design, and we can't know in advance which fields a user's index will contain, so we can't pick one that is guaranteed to work out of the box. However, perhaps we could consider a better field type?
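      One possible direction, sketched under assumptions: a text field type with only tokenization and lowercasing (no stopword or synonym filters), added through Solr's Schema API together with a catch-all copyField into a new field. The names text_spell and _spell_ are hypothetical, the Solr base URL and collection name are placeholders, and the code only builds and optionally POSTs the JSON commands; it is not a tested proposal for the example configsets.

```python
import json
import urllib.request

# Hypothetical names; neither exists in the shipped configsets.
SPELL_TYPE = "text_spell"
SPELL_FIELD = "_spell_"

def spell_schema_commands():
    """Build Schema API commands for a lightly analyzed spellcheck field."""
    return {
        "add-field-type": {
            "name": SPELL_TYPE,
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                # No stopword or synonym filters: tokenize and lowercase only.
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        },
        "add-field": {
            "name": SPELL_FIELD,
            "type": SPELL_TYPE,
            "indexed": True,
            "stored": False,
            "multiValued": True,
        },
        # Catch-all copy, mirroring how _text_ is populated.
        "add-copy-field": {"source": "*", "dest": SPELL_FIELD},
    }

def post_schema_commands(solr_url, collection, commands):
    """POST the commands to the collection's Schema API endpoint and return the JSON reply."""
    req = urllib.request.Request(
        f"{solr_url}/{collection}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

      Applying it would look like post_schema_commands("http://localhost:8983/solr", "mycollection", spell_schema_commands()). The spellchecker's field parameter in solrconfig.xml would then need to point at _spell_, and a second catch-all copy adds index size on top of _text_.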

            People

            • Assignee:
              Unassigned
              Reporter:
              Cassandra Targett (ctargett)
            • Votes:
              0
              Watchers:
              3
