SOLR-1204

Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: spellchecker
    • Labels:
      None

      Description

      Solr - User - SpellCheckComponent: queryAnalyzerFieldType
      http://www.nabble.com/SpellCheckComponent%3A-queryAnalyzerFieldType-td23870668.html

      In the above thread, it was suggested to extend the SpellingQueryConverter to cover the full UTF-8 range instead of handling US-ASCII only. This might be as simple as changing the regular expression used to tokenize the input string to accept a sequence of one or more Unicode letters ( \p{L}+ ) instead of a sequence of one or more word characters ( \w+ ).
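
      The difference between the two patterns can be illustrated with a minimal, standalone sketch. It is not the actual SpellingQueryConverter code; the class name TokenizeDemo and the helper tokenize are hypothetical, and the sample query is made up for illustration. It relies only on the default behaviour of java.util.regex, where \w matches just [a-zA-Z_0-9].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeDemo {
    // Hypothetical helper, not part of Solr: returns every match of regex in input, in order.
    static List<String> tokenize(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unicode escapes for u-umlaut and sharp s keep the source independent of file encoding.
        String query = "M\u00fcller stra\u00dfe";

        // ASCII-only word characters: non-ASCII letters act as token boundaries.
        System.out.println(tokenize("\\w+", query));

        // Unicode letters: each word stays in one piece.
        System.out.println(tokenize("\\p{L}+", query));
    }
}
```

      With \w+ the first word is split at the non-ASCII letter into "M" and "ller", while \p{L}+ keeps it whole. One trade-off to be aware of: \p{L} matches letters only, so digits and underscores that \w accepted now end a token ("mp3" yields "mp"). If those should be kept, a character class such as [\p{L}\p{N}_]+ is one possible alternative.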

      See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html for Java regular expression reference.

        Attachments

        1. SpellingQueryConverter.java.diff
          0.5 kB
          Michael Ludwig
        2. SpellingQueryConverter.java.diff
          0.5 kB
          Michael Ludwig

              People

              • Assignee:
                shalinmangar Shalin Shekhar Mangar
                Reporter:
                milu71 Michael Ludwig