[SOLR-4381] Query-time multi-word synonym expansion - ASF JIRA


Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: query parsers
Labels:

Description

This is an issue that seems to come up perennially.

The Solr docs caution that index-time synonym expansion should be preferred over query-time synonym expansion, because of the way multi-word synonyms are handled and because IDF values can be artificially boosted. But query-time expansion should have huge benefits: changes to the synonyms don't require re-indexing, the index size stays the same, and the documents' IDF values don't get permanently altered.

The proposed solution is to move the synonym expansion logic out of the analysis chain (either query-time or index-time) and into a new QueryParser. See the attached patch for an implementation.

The core Lucene functionality is untouched. Instead, the EDismaxQParser is extended, and synonym expansion is done on-the-fly. Queries are parsed into a lattice (i.e. all possible synonym combinations), while individual components of the query are still handled by the EDismaxQParser itself.
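
To make the "lattice" idea concrete, here is a minimal Python sketch. It is not code from the attached patch. It enumerates every synonym combination for a lowercased, whitespace-tokenized query. The `SYNONYMS` table and the `expand` function are hypothetical. A real plugin would read the synonym list from Solr's configuration and, as described above, hand each alternative to the ordinary EDismax parsing rather than returning plain strings.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table (illustration only, not from the patch).
# Keys are lowercased token sequences; values are replacement phrases.
SYNONYMS: Dict[Tuple[str, ...], List[str]] = {
    ("dog",): ["canine"],
    ("new", "york"): ["nyc", "big apple"],
}


def expand(tokens: List[str],
           synonyms: Dict[Tuple[str, ...], List[str]],
           max_phrase_len: int = 4) -> List[str]:
    """Enumerate every way to rewrite `tokens` using `synonyms`.

    Walks the query left to right; at each position it may keep the
    original token or substitute any synonym whose key phrase (one or
    more tokens) starts there. The set of complete paths is the lattice.
    """
    seen: Dict[str, None] = {}  # insertion-ordered set of alternatives

    def walk(i: int, acc: List[str]) -> None:
        if i == len(tokens):
            seen.setdefault(" ".join(acc), None)
            return
        # Path 1: keep the original token as-is.
        walk(i + 1, acc + [tokens[i]])
        # Path 2: replace a phrase of length n starting at position i.
        for n in range(1, max_phrase_len + 1):
            if i + n > len(tokens):
                break
            for alt in synonyms.get(tuple(tokens[i:i + n]), []):
                walk(i + n, acc + [alt])

    walk(0, [])
    return list(seen)


if __name__ == "__main__":
    query = "dog in new york"
    for alternative in expand(query.lower().split(), SYNONYMS):
        print(alternative)
    # dog in new york
    # dog in nyc
    # dog in big apple
    # canine in new york
    # canine in nyc
    # canine in big apple
```

The multi-word key ("new", "york") is matched as a unit, which is exactly what breaks when a query parser splits on whitespace before analysis. The number of alternatives is the product of the choices at each position, so it can grow quickly for long queries with many synonyms.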

It's not an ideal solution by any stretch. But it's nice and self-contained, so it invites experimentation and improvement. And I think it fits in well with the merry band of misfit query parsers, like func and frange.

More details about this solution can be found in this blog post and on the GitHub page for the code.

At the risk of tooting my own horn, I also think this patch sufficiently fixes ~~SOLR-3390~~ (highlighting problems with multi-word synonyms) and ~~LUCENE-4499~~ (better support for multi-word synonyms).

Attachments

- SOLR-4381.patch (27 kB), Nolan Lawson, 29/Jan/13 23:19
- SOLR-4381-2.patch (27 kB), Nolan Lawson, 30/Jan/13 21:18

Issue Links

is duplicated by

- SOLR-5379 Query-time multi-word synonym expansion (Closed)

relates to

- LUCENE-2605 queryparser parses on whitespace (Patch Available)
- LUCENE-1622 Multi-word synonym filter (synonym expansion at indexing time). (Resolved)
- SOLR-9185 Solr's edismax and "Lucene"/standard query parsers should optionally not split on whitespace before sending terms to analysis (Closed)
- LUCENE-4499 Multi-word synonym filter (synonym expansion) (Resolved)
- SOLR-5379 Query-time multi-word synonym expansion (Closed)


People

Assignee: Unassigned

Reporter: Nolan Lawson

Votes: 20

Watchers: 27

Dates

Created: 29/Jan/13 23:18

Updated: 19/Apr/17 19:11

Resolved: 19/Apr/17 19:11