Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-1204

Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Trivial
    • Resolution: Fixed
    • 1.3
    • 1.4
    • spellchecker
    • None

    Description

      Solr - User - SpellCheckComponent: queryAnalyzerFieldType
      http://www.nabble.com/SpellCheckComponent%3A-queryAnalyzerFieldType-td23870668.html

      In the above thread, it was suggested to extend the SpellingQueryConverter to cover the full UTF-8 range instead of handling US-ASCII only. This might be as simple as changing the regular expression used to tokenize the input string to accept a sequence of one or more Unicode letters ( \p

      {L}

      + ) instead of a sequence of one or more word characters ( \w+ ).

      See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html for Java regular expression reference.

      Attachments

        1. SpellingQueryConverter.java.diff
          0.5 kB
          Michael Ludwig
        2. SpellingQueryConverter.java.diff
          0.5 kB
          Michael Ludwig

        Issue Links

          Activity

            People

              shalin Shalin Shekhar Mangar
              milu71 Michael Ludwig
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: