[LUCENE-4499] Multi-word synonym filter (synonym expansion) - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 4.1, 6.0
Fix Version/s: 6.0
Component/s: core/other
Labels:

Lucene Fields: New, Patch Available

Description

I apologize for bringing the multi-token synonym expansion up again. There is an old, unresolved issue at LUCENE-1622 [1].

While solving the problem for our needs [2], I discovered that the current SolrSynonym parser (and the wonderful FST) already have almost everything needed to handle both query-time and index-time synonym expansion satisfactorily. It seems that people often need to use the synonym filter slightly differently at indexing time and at query time.

In our case, we must do different things during indexing and querying.

Example sentence: Mirrors of the Hubble space telescope pointed at XA5

This is what we need (comma marks position bump):

This translates to the following needs:

indexing time:
single-token synonyms => return only synonyms
multi-token synonyms => return original tokens AND the synonyms

query time:
single-token: return only synonyms (but preserve case)
multi-token: return only synonyms
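
The four rules above can be sketched as a small standalone function. This is only an illustration in plain Python, not the patch and not the Lucene API: the synonym table, the `expand` function and the `mode` argument are all hypothetical names, the input is assumed to be lowercased already (the "preserve case" requirement is not modeled), and tokens that share a position number stand for stacked tokens.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table; the entries are illustrative, not taken from the patch.
SYNONYMS: Dict[Tuple[str, ...], List[str]] = {
    ("hubble", "space", "telescope"): ["hst"],
    ("mirrors",): ["mirror"],
}


def expand(tokens: List[str], mode: str,
           synonyms: Dict[Tuple[str, ...], List[str]] = SYNONYMS) -> List[Tuple[str, int]]:
    """Return (term, position) pairs; equal positions mean stacked tokens."""
    if mode not in ("index", "query"):
        raise ValueError("mode must be 'index' or 'query'")
    max_len = max((len(key) for key in synonyms), default=0)
    out: List[Tuple[str, int]] = []
    i = 0    # index into the input tokens
    pos = 0  # position assigned to the next emitted token
    while i < len(tokens):
        # Greedy longest match starting at token i.
        match = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in synonyms:
                match = key
                break
        if match is None:
            out.append((tokens[i], pos))
            i += 1
            pos += 1
            continue
        # Synonyms are always emitted, at the position of the first matched token.
        for syn in synonyms[match]:
            out.append((syn, pos))
        if mode == "index" and len(match) > 1:
            # Index time, multi-token: keep the original tokens as well.
            for offset, tok in enumerate(match):
                out.append((tok, pos + offset))
            pos += len(match)
        else:
            # Every other case: only the synonyms survive.
            pos += 1
        i += len(match)
    return out


tokens = "mirrors of the hubble space telescope pointed at xa5".split()
print(expand(tokens, "index"))
print(expand(tokens, "query"))
```

With this hypothetical table, the index-time stream keeps `hubble`, `space` and `telescope` at consecutive positions next to `hst`, while the query-time stream contains only `hst` for that span.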

We need the original tokens for proximity queries: if we indexed 'hubble space telescope' as one token, we could not search for 'hubble NEAR telescope'.
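
To make the proximity argument concrete, here is a minimal sketch of a positional lookup. It is plain Python rather than Lucene code, `build_positions` and `near` are hypothetical helpers, the `hst` synonym is an assumed example, and NEAR is simplified to "some pair of occurrences is at most `slop` positions apart".

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_positions(stream: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of positions at which it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for term, pos in stream:
        index[term].append(pos)
    return {term: sorted(positions) for term, positions in index.items()}


def near(index: Dict[str, List[int]], a: str, b: str, slop: int) -> bool:
    """True if some occurrence of a and some occurrence of b are within slop positions."""
    return any(abs(pa - pb) <= slop
               for pa in index.get(a, [])
               for pb in index.get(b, []))


# The phrase indexed as a single token: its component words are not searchable.
single = build_positions([("hubble space telescope", 0), ("pointed", 1)])

# Original tokens kept next to the (assumed) synonym "hst".
stacked = build_positions([
    ("hst", 0), ("hubble", 0), ("space", 1), ("telescope", 2), ("pointed", 3),
])

print(near(single, "hubble", "telescope", 2))   # False: "hubble" was never indexed on its own
print(near(stacked, "hubble", "telescope", 2))  # True: positions 0 and 2
```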

You may (not) be surprised, but Lucene already supports ALL of these requirements. The patch is an attempt to state the problem differently. I am not sure it is the best option, but it works perfectly for our needs and it seems it could work for the general public too, especially if the SynonymFilterFactory had a preconfigured set of SynonymMapBuilders from which people could simply choose the one that matches their situation. Please look at the unit test.

links:
[1] https://issues.apache.org/jira/browse/LUCENE-1622
[2] http://labs.adsabs.harvard.edu/trac/ads-invenio/ticket/158
[3] this seems to be a similar request: http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-td4000522.html

Attachments

LUCENE-4499.patch, 04/Dec/12 20:31, 29 kB, Roman Chyla
LUCENE-4499.patch, 22/Oct/12 15:15, 29 kB, Roman Chyla

Issue Links

is related to

SOLR-4381 Query-time multi-word synonym expansion

Closed

relates to

SOLR-5379 Query-time multi-word synonym expansion

Closed

People

Assignee: Unassigned

Reporter: Roman Chyla

Votes: 7

Watchers: 19

Dates

Created: 22/Oct/12 15:04

Updated: 28/Aug/22 13:30

Resolved: 03/Jan/17 10:50