[SOLR-319] changes SynonymFilterFactory to "Analyze" synonyms file - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.3
Component/s: None
Labels: None

Description

WHAT:
Currently, SynonymFilterFactory can be used with N-gram tokenizers (CJKTokenizer, for example), but the rules in synonyms.txt have to be written in terms of the tokens the tokenizer produces.
For example, if I use CJKTokenizer (which produces bi-grams for CJK characters) and want C1C2C3 to map to C4C5C6,
I have to write the rule as follows:

C1C2 C2C3 => C4C5 C5C6

I would rather write it as "C1C2C3=>C4C5C6", and this patch allows that. It also makes synonyms.txt easier to share.
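To make the rewriting concrete, the following Java sketch mimics a bi-gram tokenizer on plain strings and shows how a rule written once as "C1C2C3=>C4C5C6" corresponds to the hand-written bi-gram form. It uses no Solr or Lucene classes: `bigrams` and `expand` are hypothetical helpers, and emitting a one-character run unchanged is an assumption made for illustration only. The letters A to F stand in for C1 to C6.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramRuleSketch {
    // Hypothetical stand-in for a bi-gram tokenizer such as CJKTokenizer:
    // emits overlapping two-character grams; a one-character input is emitted as-is (assumption).
    static List<String> bigrams(String text) {
        List<String> grams = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        if (cps.length == 1) {
            grams.add(text);
            return grams;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            grams.add(new String(cps, i, 2));
        }
        return grams;
    }

    // Hypothetical helper: rewrites "lhs=>rhs" into the bi-gram form one would otherwise write by hand.
    static String expand(String rule) {
        String[] sides = rule.split("=>", 2);
        String lhs = String.join(" ", bigrams(sides[0].trim()));
        String rhs = String.join(" ", bigrams(sides[1].trim()));
        return lhs + " => " + rhs;
    }

    public static void main(String[] args) {
        // A..F stand in for the CJK characters C1..C6.
        System.out.println(expand("ABC=>DEF")); // prints "AB BC => DE EF"
    }
}
```

Working on code points rather than `char`s keeps supplementary characters intact when grams are formed.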

HOW:
A tokenFactory attribute is added to <filter class="solr.SynonymFilterFactory"/>.
If the attribute is specified, SynonymFilterFactory uses that TokenizerFactory to create a Tokenizer,
and then uses the Tokenizer to split the rules in the synonyms.txt file into tokens.

sample-1: CJKTokenizer

sample-2: NGramTokenizer

backward compatibility:
Yes. If the tokenFactory attribute is omitted from the <filter class="solr.SynonymFilterFactory"/> tag, the filter behaves as before.
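The rule-loading step described above can be sketched in plain Java, assuming rules of the form "lhs => rhs" with comma-separated alternatives on each side. The pluggable `Function<String, List<String>>` stands in for the Tokenizer created by the configured TokenizerFactory; when it is null, the sketch falls back to whitespace splitting, which is assumed here to approximate the behavior without tokenFactory. `Rule`, `parse`, `whitespace` and `bigrams` are hypothetical names, not Solr's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class SynonymRuleParserSketch {
    // One parsed rule: each side is a list of alternatives, each alternative a token sequence.
    static final class Rule {
        final List<List<String>> from;
        final List<List<String>> to;
        Rule(List<List<String>> from, List<List<String>> to) { this.from = from; this.to = to; }
    }

    // Assumed default when no tokenizer is configured: split on whitespace.
    static List<String> whitespace(String s) {
        String t = s.trim();
        if (t.isEmpty()) return new ArrayList<>();
        return Arrays.asList(t.split("\\s+"));
    }

    // Hypothetical bi-gram tokenizer standing in for a configured TokenizerFactory.
    static List<String> bigrams(String text) {
        List<String> grams = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        if (cps.length == 1) { grams.add(text); return grams; }
        for (int i = 0; i + 1 < cps.length; i++) grams.add(new String(cps, i, 2));
        return grams;
    }

    // Tokenizes every comma-separated alternative on one side of a rule.
    static List<List<String>> side(String text, Function<String, List<String>> tokenizer) {
        List<List<String>> alternatives = new ArrayList<>();
        for (String alt : text.split(",")) alternatives.add(tokenizer.apply(alt.trim()));
        return alternatives;
    }

    // Parses "lhs => rhs"; a null tokenizer falls back to whitespace splitting.
    static Rule parse(String line, Function<String, List<String>> tokenizer) {
        Function<String, List<String>> tok = tokenizer;
        if (tok == null) tok = SynonymRuleParserSketch::whitespace;
        String[] sides = line.split("=>", 2);
        if (sides.length != 2) throw new IllegalArgumentException("expected 'lhs => rhs': " + line);
        return new Rule(side(sides[0], tok), side(sides[1], tok));
    }

    public static void main(String[] args) {
        Rule plain = parse("C1C2 C2C3 => C4C5 C5C6", null);
        System.out.println(plain.from + " => " + plain.to);       // [[C1C2, C2C3]] => [[C4C5, C5C6]]
        Rule analyzed = parse("ABC => DEF", SynonymRuleParserSketch::bigrams);
        System.out.println(analyzed.from + " => " + analyzed.to); // [[AB, BC]] => [[DE, EF]]
    }
}
```

With a bi-gram tokenizer plugged in, "ABC => DEF" yields the same token sequences as the hand-written "AB BC => DE EF" loaded without one, which is the equivalence the patch relies on.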

Attachments

- SOLR-319.patch (13/Sep/07 02:47, 17 kB, Koji Sekiguchi)
- SOLR-319.patch (26/Nov/07 09:04, 15 kB, Koji Sekiguchi)
- SOLR-319.patch (26/Jan/08 15:16, 15 kB, Koji Sekiguchi)

People

Assignee: Koji Sekiguchi

Reporter: Koji Sekiguchi

Votes: 2

Watchers: 1

Dates

Created: 26/Jul/07 03:04

Updated: 10/May/13 10:41

Resolved: 19/May/08 13:40