[SOLR-1279] ApostropheTokenizer - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: Schema and Analysis
Labels: None

Description

ApostropheTokenizer emits extra tokens during analysis for field values that contain apostrophes. The goal is to ensure that documents differing only by an apostrophe receive the same relevancy score.

For example, if a document contains the string "McDonald's", it is tokenized as "McDonald's McDonalds". A search for either "McDonald's" or "McDonalds" then produces a similar score.

This code handles up to two apostrophes in a token.
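The attached implementation is not reproduced in this issue, so the following is only a minimal sketch of the idea described above, not the reporter's code. It assumes that a token containing one or two ASCII apostrophes yields one extra variant with the apostrophes removed, and that tokens with more apostrophes are left unexpanded. The class name `ApostropheVariants`, the method `expand`, and the constant `MAX_APOSTROPHES` are hypothetical names chosen for illustration; the sketch works on plain strings and does not use the Lucene/Solr analysis API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or of the attached zip.
final class ApostropheVariants {

    // Assumption: tokens with more apostrophes than this are not expanded.
    private static final int MAX_APOSTROPHES = 2;

    // Returns the original token, followed by an apostrophe-free variant
    // when the token contains between 1 and MAX_APOSTROPHES apostrophes.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);

        int count = 0;
        StringBuilder stripped = new StringBuilder(token.length());
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '\'') {
                count++;
            } else {
                stripped.append(c);
            }
        }

        // Skip the variant if nothing would remain after stripping.
        if (count >= 1 && count <= MAX_APOSTROPHES && stripped.length() > 0) {
            out.add(stripped.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("McDonald's")); // prints [McDonald's, McDonalds]
    }
}
```

In a real analysis chain the extra variant would presumably be emitted as an additional token from the tokenizer or a token filter, so that both forms are indexed for the same field value.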

To use this tokenizer, add the following line to schema.xml

Attachments

ApostropheTokenizer.zip (1 kB), attached 14/Jul/09 18:28 by Sergey Borisov

People

Assignee: Unassigned

Reporter: Sergey Borisov

Votes: 0

Watchers: 2

Dates

Created: 14/Jul/09 18:26

Updated: 15/Sep/16 15:15

Resolved: 15/Sep/16 15:15