SOLR-81

Add Query Spellchecker functionality

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2
    • Component/s: search
    • Labels:
      None

      Description

      Use the simple approach of n-gramming outside of Solr and indexing n-gram documents. For example:

      <doc>
      <field name="word">lettuce</field>
      <field name="start3">let</field>
      <field name="gram3">let ett ttu tuc uce</field>
      <field name="end3">uce</field>
      <field name="start4">lett</field>
      <field name="gram4">lett ettu ttuc tuce</field>
      <field name="end4">tuce</field>
      </doc>

      See:
      http://www.mail-archive.com/solr-user@lucene.apache.org/msg01254.html
      Java clients: SOLR-20 (add delete commit optimize), SOLR-30 (search)
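
The n-gramming step described above happens outside of Solr, so any client language can do it. The following is a minimal sketch in Python, not the code from the attached patches: the function names (`ngrams`, `ngram_fields`, `to_solr_doc`) and the default gram sizes of 3 and 4 are assumptions chosen to mirror the `lettuce` example. It builds the `word`, `startN`, `gramN`, and `endN` fields for one dictionary word and serializes them as a `<doc>` element.

```python
import xml.etree.ElementTree as ET


def ngrams(word, n):
    """Return all contiguous character n-grams of `word`, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_fields(word, min_n=3, max_n=4):
    """Build (field name, value) pairs for one dictionary word.

    Field names follow the startN / gramN / endN pattern of the example
    document; gram sizes longer than the word are skipped.
    """
    fields = [("word", word)]
    for n in range(min_n, max_n + 1):
        grams = ngrams(word, n)
        if not grams:
            continue
        fields.append((f"start{n}", grams[0]))        # first n-gram
        fields.append((f"gram{n}", " ".join(grams)))  # all n-grams, space separated
        fields.append((f"end{n}", grams[-1]))         # last n-gram
    return fields


def to_solr_doc(word, min_n=3, max_n=4):
    """Serialize the fields for `word` as a single <doc> element string."""
    doc = ET.Element("doc")
    for name, value in ngram_fields(word, min_n, max_n):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(doc, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_doc("lettuce"))
```

Calling `ngram_fields("lettuce")` yields the same seven fields as the hand-written document. The serialized `<doc>` would still need to be wrapped in an `<add>` element and posted to the update handler, which this sketch leaves out.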

      1. SOLR-81-ngram.patch
        11 kB
        Otis Gospodnetic
      2. SOLR-81-edgengram-ngram.patch
        24 kB
        Adam Hiatt
      3. SOLR-81-ngram-schema.patch
        7 kB
        Otis Gospodnetic
      4. SOLR-81-ngram.patch
        17 kB
        Otis Gospodnetic
      5. SOLR-81-ngram.patch
        16 kB
        Otis Gospodnetic
      6. SOLR-81-ngram.patch
        16 kB
        Otis Gospodnetic
      7. SOLR-81-spellchecker.patch
        16 kB
        Adam Hiatt
      8. SOLR-81-spellchecker.patch
        16 kB
        Adam Hiatt
      9. SOLR-81-spellchecker.patch
        7 kB
        Otis Gospodnetic
      10. hoss.spell.patch
        8 kB
        Hoss Man

          Activity

          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hoss Man made changes -
          Fix Version/s 1.2 [ 12312235 ]
          Hoss Man made changes -
          Link This issue relates to LUCENE-759 [ LUCENE-759 ]
          Otis Gospodnetic made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hoss Man made changes -
          Attachment hoss.spell.patch [ 12354135 ]
          Otis Gospodnetic made changes -
          Attachment SOLR-81-spellchecker.patch [ 12353241 ]
          Adam Hiatt made changes -
          Attachment SOLR-81-spellchecker.patch [ 12352719 ]
          Adam Hiatt made changes -
          Attachment SOLR-81-spellchecker.patch [ 12352485 ]
          Otis Gospodnetic made changes -
          Attachment SOLR-81-ngram.patch [ 12352468 ]
          Otis Gospodnetic made changes -
          Attachment SOLR-81-ngram.patch [ 12351908 ]
          Otis Gospodnetic made changes -
          Attachment SOLR-81-ngram.patch [ 12351689 ]
          Otis Gospodnetic made changes -
          Attachment SOLR-81-ngram-schema.patch [ 12350361 ]
          Adam Hiatt made changes -
          Attachment SOLR-81-edgengram-ngram.patch [ 12350288 ]
          Otis Gospodnetic made changes -
          Attachment SOLR-81-ngram.patch [ 12347713 ]
          Otis Gospodnetic made changes -
          Attachment SOLR-81-ngram.patch [ 12347711 ]
          Otis Gospodnetic made changes -
          Comment [ This patch contains 3 new classes for org.apache.solr.analysis:
          1. NGramTokenizerFactory
          2. NGramTokenizer
          3. NGramTokenizerTest (all tests pass)

          I *think* the above can be configured in schema.xml as follows:

              <fieldtype name="wordField" class="solr.TextField">
                <analyzer>
                  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
                  <filter class="solr.LowerCaseFilterFactory"/>
                  <filter class="solr.NGramTokenizerFactory"/>
                </analyzer>
              </fieldtype>

          And I *believe* the following fields would have to be defined (to match the fields in Spellchecker.java):
          <field name="word" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="start1" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="end1" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="start2" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="end2" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="start3" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="end3" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="start4" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="end4" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="gram1" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="gram2" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="gram3" type="string" indexed="true" stored="true" multiValued="false"/>
          <field name="gram4" type="string" indexed="true" stored="true" multiValued="false"/>

          cf. http://wiki.apache.org/jakarta-lucene/SpellChecker

          What I'm not sure about is how I'll get Solr to put the right ngrams into the right fields (defined above and also as a set of copyFields).
          For example, if the input (query string) is "pork", my ngrammer may generate the following uni- and bi-gram tokens:

            p o r k po or rk

          The following should then happen:
          word: pork
          start1: p
          start2: po
          gram1: p o r k
          gram2: po or rk
          end1: k
          end2: rk

          Not sure how to accomplish that...
          ]
          Otis Gospodnetic made changes -
          Field Original Value New Value
          Attachment SOLR-81-ngram.patch [ 12347711 ]
          Otis Gospodnetic created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Otis Gospodnetic
            • Votes:
              1
              Watchers:
              1