Solr / SOLR-1279

ApostropheTokenizer

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Schema and Analysis
    • Labels: None

    Description

      ApostropheTokenizer emits extra tokens during analysis for fields whose text contains apostrophes. The goal is for documents that differ only in their use of an apostrophe to receive the same relevancy score.

      For example, if a document contains the string "McDonald's", it is tokenized as "McDonald's McDonalds". This way, a search for either "McDonald's" or "McDonalds" produces a similar score.

      This code handles up to two apostrophes in a token.
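
      The expansion described above can be sketched in plain Java, independent of the Lucene API. This is not the attached implementation: the class name ApostropheVariants, the methods expand and analyze, and the whitespace splitting are all hypothetical. It also assumes that "up to two apostrophes" means a token with more than two is passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the code from ApostropheTokenizer.zip.
class ApostropheVariants {
    // Assumption: tokens with more than this many apostrophes are left untouched.
    static final int MAX_APOSTROPHES = 2;

    // Returns the original token, followed by an apostrophe-free variant
    // when the token contains one or two apostrophes.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);
        long count = token.chars().filter(c -> c == '\'').count();
        if (count >= 1 && count <= MAX_APOSTROPHES) {
            String stripped = token.replace("'", "");
            if (!stripped.isEmpty()) {
                out.add(stripped);
            }
        }
        return out;
    }

    // Splits on whitespace (a stand-in for a real tokenizer) and expands each token.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.addAll(expand(token));
            }
        }
        return out;
    }
}
```

      With this sketch, analyze("McDonald's fries") yields the tokens McDonald's, McDonalds, and fries, so a query for either spelling matches the indexed field. A real Lucene filter would normally also place the variant at the same position as the original token; that detail is omitted here.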

      To use it, add the following filter to the index analyzer in schema.xml:

      <analyzer type="index">
      <filter class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
      ...
      </analyzer>

      Attachments

        1. ApostropheTokenizer.zip
          1 kB
          Sergey Borisov

          People

            Assignee: Unassigned
            Reporter: Sergey Borisov (sborisov)
            Votes: 0
            Watchers: 2