[LUCENE-6336] AnalyzingInfixSuggester needs duplicate handling - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 4.10.3, 5.0
Fix Version/s: None
Component/s: None
Labels:
- lookup
- suggester

Lucene Fields: New

Description

Spinoff from ~~LUCENE-5833~~, but otherwise unrelated.

This concerns AnalyzingInfixSuggester, which is backed by a Lucene index and stores the payload and score together with the suggest text.

I did some testing with Solr, building the DocumentDictionary from an index in which multiple documents contain the same text but have random weights between 0 and 100. The result was a list of identical duplicate suggestions, sorted by weight:

{
  "suggest":{"languages":{
      "engl":{
        "numFound":101,
        "suggestions":[{
            "term":"<b>Engl</b>ish",
            "weight":100,
            "payload":"0"},
          {
            "term":"<b>Engl</b>ish",
            "weight":99,
            "payload":"0"},
          {
            "term":"<b>Engl</b>ish",
            "weight":98,
            "payload":"0"},
---etc all the way down to 0---

I also reproduced the same behavior using AnalyzingInfixSuggester directly, so the problem is not specific to Solr. Some duplicate removal is needed here, either while building the local suggest index or during lookup. Only the highest-weight suggestion for a given term should be returned.
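The lookup-time variant of that duplicate removal could look roughly like the sketch below. It is only an illustration under stated assumptions: `Suggestion` and `dedupByTerm` are hypothetical names, not part of Lucene or of the attached patch, and the sketch treats two suggestions as duplicates when their term text is equal, ignoring payloads.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Lucene.
class DedupSketch {

    // Minimal stand-in for a suggestion (term text, weight, payload).
    static final class Suggestion {
        final String term;
        final long weight;
        final String payload;

        Suggestion(String term, long weight, String payload) {
            this.term = term;
            this.weight = weight;
            this.payload = payload;
        }
    }

    // Keeps only the highest-weight suggestion per term and returns
    // the survivors sorted by descending weight.
    static List<Suggestion> dedupByTerm(List<Suggestion> results) {
        Map<String, Suggestion> best = new LinkedHashMap<>();
        for (Suggestion s : results) {
            // On a term collision, retain whichever entry has the larger weight.
            best.merge(s.term, s, (a, b) -> b.weight > a.weight ? b : a);
        }
        List<Suggestion> out = new ArrayList<>(best.values());
        out.sort((a, b) -> Long.compare(b.weight, a.weight));
        return out;
    }
}
```

One caveat with filtering after the lookup: if the suggester has already truncated its results to the requested number, the deduplicated list can come back shorter than requested, so deduplicating while building the suggest index may be the more robust option.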

Attachments

- LUCENE-6336.patch, 04/Mar/15 13:19, 2 kB, Jan Høydahl

People

Assignee: Unassigned

Reporter: Jan Høydahl

Votes: 9

Watchers: 17

Dates

Created: 04/Mar/15 13:17

Updated: 28/Aug/22 14:28