[LUCENE-6664] Replace SynonymFilter with SynonymGraphFilter - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.4, 7.0
Component/s: None
Labels: None
Lucene Fields: New

Description

Spinoff from LUCENE-6582.

I created a new SynonymGraphFilter (to replace the current buggy
SynonymFilter) that produces correct graphs: it does no "graph
flattening" itself. I think this makes it simpler.

This means you must add the FlattenGraphFilter yourself, if you are
applying synonyms during indexing.

Index-time synonym expansion is necessarily a "lossy" graph
transformation when multi-token (input or output) synonyms are applied:
because the index does not store posLength, there will always be phrase
queries that should match but do not, and phrase queries that should
not match but do.
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
goes into detail about this.
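
To make the lossiness concrete, here is a minimal toy model in plain Java. It is not Lucene code and does not reproduce what FlattenGraphFilter actually emits. The class `LossyPositions`, its `Tok` type, and the "dns" / "domain name service" synonym are all hypothetical, chosen only for illustration. The sketch assumes an index that records each term's position and drops posLength, then compares phrase matching against the full graph with phrase matching against positions alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model (not Lucene): a token graph versus an index that keeps only positions. */
class LossyPositions {

  /** One token: its term, start position, and posLength (positions spanned). */
  static final class Tok {
    final String term;
    final int pos;
    final int posLength;

    Tok(String term, int pos, int posLength) {
      this.term = term;
      this.pos = pos;
      this.posLength = posLength;
    }
  }

  /** Hypothetical graph for "domain name service is down" with synonym "dns". */
  static List<Tok> exampleGraph() {
    return Arrays.asList(
        new Tok("dns", 0, 3), // single-token synonym spanning three positions
        new Tok("domain", 0, 1),
        new Tok("name", 1, 1),
        new Tok("service", 2, 1),
        new Tok("is", 3, 1),
        new Tok("down", 4, 1));
  }

  /** True if the phrase is a connected path: each token starts where the previous one ends. */
  static boolean matchesGraph(List<Tok> graph, String... phrase) {
    for (Tok t : graph) {
      if (t.term.equals(phrase[0]) && walk(graph, phrase, 1, t.pos + t.posLength)) {
        return true;
      }
    }
    return false;
  }

  private static boolean walk(List<Tok> graph, String[] phrase, int i, int node) {
    if (i == phrase.length) {
      return true;
    }
    for (Tok t : graph) {
      if (t.pos == node && t.term.equals(phrase[i])
          && walk(graph, phrase, i + 1, node + t.posLength)) {
        return true;
      }
    }
    return false;
  }

  /** Phrase match as a positions-only index sees it: term i must sit at start + i. */
  static boolean matchesIndex(List<Tok> graph, String... phrase) {
    Set<String> postings = new HashSet<>();
    for (Tok t : graph) {
      postings.add(t.term + "@" + t.pos); // posLength is discarded here
    }
    for (Tok first : graph) {
      if (!first.term.equals(phrase[0])) {
        continue;
      }
      boolean ok = true;
      for (int i = 1; i < phrase.length && ok; i++) {
        ok = postings.contains(phrase[i] + "@" + (first.pos + i));
      }
      if (ok) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Tok> g = exampleGraph();
    // Should match but does not once posLength is lost:
    System.out.println(matchesGraph(g, "dns", "is", "down")); // true
    System.out.println(matchesIndex(g, "dns", "is", "down")); // false
    // Should not match but does once posLength is lost:
    System.out.println(matchesGraph(g, "dns", "name")); // false
    System.out.println(matchesIndex(g, "dns", "name")); // true
  }
}
```

In this model "dns is down" is a valid path through the graph but is missed by the positions-only check, while "dns name" is not a path yet is accepted. These are the two failure modes described above.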

However, with this new SynonymGraphFilter, if instead you do synonym
expansion at query time (and don't do the flattening), and you use
TermAutomatonQuery (future: somehow integrated into a query parser),
or maybe just "enumerate all paths and make union of PhraseQuery", you
should get 100% correct matches (not sure about "proper" scoring
though...).
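
The "enumerate all paths" idea can be sketched in plain Java as below. This is an illustration under stated assumptions, not the patch's implementation and not a Lucene API: `GraphPaths`, `Tok`, and the "dns" / "domain name service" synonym are hypothetical, and the graph is assumed to be acyclic with every token carrying a start position and a posLength. Each enumerated path would then become one PhraseQuery, and the union of those queries is the final query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy sketch (not Lucene): enumerate every path through an unflattened token graph. */
class GraphPaths {

  /** One token: its term, start position, and posLength (positions spanned). */
  static final class Tok {
    final String term;
    final int pos;
    final int posLength;

    Tok(String term, int pos, int posLength) {
      this.term = term;
      this.pos = pos;
      this.posLength = posLength;
    }
  }

  /** Hypothetical query-time graph for "dns is down" with "dns" expanded. */
  static List<Tok> exampleGraph() {
    return Arrays.asList(
        new Tok("dns", 0, 3), // original token, spanning the three-token synonym
        new Tok("domain", 0, 1),
        new Tok("name", 1, 1),
        new Tok("service", 2, 1),
        new Tok("is", 3, 1),
        new Tok("down", 4, 1));
  }

  /** Returns every path from position 0 to the end of the graph, as lists of terms. */
  static List<List<String>> allPaths(List<Tok> graph) {
    int end = 0;
    for (Tok t : graph) {
      end = Math.max(end, t.pos + t.posLength);
    }
    List<List<String>> out = new ArrayList<>();
    extend(graph, 0, end, new ArrayList<String>(), out);
    return out;
  }

  /** Depth-first walk: follow each token that starts at the current node. */
  private static void extend(
      List<Tok> graph, int node, int end, List<String> prefix, List<List<String>> out) {
    if (node == end) {
      out.add(new ArrayList<>(prefix)); // copy, since prefix is reused while backtracking
      return;
    }
    for (Tok t : graph) {
      if (t.pos == node) {
        prefix.add(t.term);
        extend(graph, node + t.posLength, end, prefix, out);
        prefix.remove(prefix.size() - 1);
      }
    }
  }

  public static void main(String[] args) {
    // Each printed path corresponds to one phrase to search for.
    for (List<String> path : allPaths(exampleGraph())) {
      System.out.println(String.join(" ", path));
    }
    // prints:
    // dns is down
    // domain name service is down
  }
}
```

Because posLength is respected during the walk, only real paths are produced, so no mixed phrase such as "dns name" is ever generated. The number of paths can grow quickly when several multi-token synonyms appear in one query, which this sketch does not try to limit.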

This new syn filter still cannot consume an arbitrary graph.

Attachments

- LUCENE-6664.patch (20/Dec/16 01:37, 153 kB, Michael McCandless)
- LUCENE-6664.patch (02/Aug/15 09:40, 156 kB, Michael McCandless)
- LUCENE-6664.patch (28/Jul/15 08:07, 169 kB, Michael McCandless)
- LUCENE-6664.patch (26/Jul/15 21:42, 154 kB, Michael McCandless)
- usa.png (26/Jul/15 21:30, 36 kB, Michael McCandless)
- usa_flat.png (26/Jul/15 21:30, 32 kB, Michael McCandless)
- LUCENE-6664.patch (07/Jul/15 09:19, 71 kB, Michael McCandless)

Issue Links

is blocked by: LUCENE-6721 How to handle back-compat for new graph TokenFilters? (Open)

People

Assignee: Michael McCandless

Reporter: Michael McCandless

Votes: 4

Watchers: 22

Dates

Created: 07/Jul/15 09:11

Updated: 28/Aug/22 14:38

Resolved: 22/Dec/16 21:21