Lucene - Core / LUCENE-4481

AnalyzingSuggester may fail to return correct topN suggestions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, Trunk
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      I hit this when working on LUCENE-4480.

      Because AnalyzingSuggester may prune some of the topN paths found by the FST's Util.TopNSearcher, the queue size limit of topN makes the overall search inadmissible, i.e. it may incorrectly prune paths that would have led to a competitive path.

      However, such pruning is rare: it happens only for graph token streams, and even then only when competitive analyzed forms share the same surface forms.
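The failure mode described above can be illustrated with a toy model. This is a minimal sketch and not Lucene code: `BoundedQueueDemo`, `Path`, and `suggest` are hypothetical names, and a flat list of completed paths stands in for the best-first FST search. It assumes only what the description states: the queue is capped, and some of the paths that occupy queue slots are later rejected because they repeat a surface form.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Toy model (hypothetical, not Lucene's implementation).
class BoundedQueueDemo {
    // A completed path: the surface form it suggests and its cost (lower is better).
    static final class Path {
        final String surface;
        final int cost;
        Path(String surface, int cost) {
            this.surface = surface;
            this.cost = cost;
        }
    }

    // Keeps at most queueSize paths, then drops duplicate surface forms
    // and returns up to topN distinct suggestions, best first.
    static List<String> suggest(List<Path> paths, int topN, int queueSize) {
        // Worst (highest cost) path sits at the head so it is evicted first.
        Comparator<Path> worstFirst = (a, b) -> Integer.compare(b.cost, a.cost);
        PriorityQueue<Path> queue = new PriorityQueue<Path>(11, worstFirst);
        for (Path p : paths) {
            queue.add(p);
            if (queue.size() > queueSize) {
                queue.poll(); // prune the least competitive path
            }
        }
        List<Path> kept = new ArrayList<Path>(queue);
        kept.sort((a, b) -> Integer.compare(a.cost, b.cost));

        // Post-filter: duplicates of an already seen surface form are rejected.
        Set<String> distinct = new LinkedHashSet<String>();
        for (Path p : kept) {
            if (distinct.size() == topN) {
                break;
            }
            distinct.add(p.surface);
        }
        return new ArrayList<String>(distinct);
    }
}
```

With paths `("a", 1)`, `("a", 2)`, `("b", 3)` and topN = 2, a queue capped at 2 keeps only the two "a" paths, so "b" is pruned before the duplicate is rejected and a single suggestion comes back. A larger cap lets "b" survive.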

      The simplest way to fix this is to make the queue unbounded, but this is likely a sizable performance hit ... I haven't tested yet. It's even possible that the way the dups happen (always at the "end" of the suggestion, because we tack on a 0 byte followed by the ord dedup byte) prevents this bug from even occurring, and so this could all be a false alarm! I have to try to make a test case showing it ...
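To make the "0 byte followed by ord dedup byte" layout concrete, here is a rough sketch of how such a key could be built. `DedupKeyDemo` and `withDedupSuffix` are hypothetical names, and the one-byte range check is an assumption made for this illustration. Only the suffix layout itself comes from the description above.

```java
import java.util.Arrays;

// Hypothetical helper, not Lucene's implementation.
class DedupKeyDemo {
    // Appends a 0 separator byte and a one-byte dedup ordinal to the analyzed form.
    static byte[] withDedupSuffix(byte[] analyzed, int dedupOrd) {
        // Assumption for this sketch: the ordinal must fit in a single byte.
        if (dedupOrd < 0 || dedupOrd > 255) {
            throw new IllegalArgumentException("dedupOrd must be in [0, 255]");
        }
        byte[] key = Arrays.copyOf(analyzed, analyzed.length + 2);
        key[analyzed.length] = 0;                    // separator byte
        key[analyzed.length + 1] = (byte) dedupOrd;  // distinguishes surface forms
        return key;
    }
}
```

Two surface forms that analyze to the same bytes then produce keys that differ only in the final byte, which is why the duplicates always sit at the "end" of the path.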

      A cop-out solution would be to expose a separate queueSize or queueMultiplier (over the topN) so that if users are affected by this they could crank up the queue size or multiplier.
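One way the multiplier idea might look is sketched below. `QueueSizing` and `queueSize` are hypothetical names chosen for illustration; they are not taken from any of the attached patches.

```java
// Hypothetical sizing helper for the "queueMultiplier over topN" idea.
class QueueSizing {
    // Returns topN * queueMultiplier, clamped to Integer.MAX_VALUE to avoid overflow.
    static int queueSize(int topN, int queueMultiplier) {
        if (topN < 1 || queueMultiplier < 1) {
            throw new IllegalArgumentException("topN and queueMultiplier must be >= 1");
        }
        long size = (long) topN * queueMultiplier; // widen before multiplying
        return (int) Math.min(size, (long) Integer.MAX_VALUE);
    }
}
```

A multiplier of 1 reproduces the current behavior (queue size equals topN). Larger values trade memory and search time for a lower chance of pruning a competitive path.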

      1. LUCENE-4481.patch (2 kB, Michael McCandless)
      2. LUCENE-4481.patch (5 kB, Michael McCandless)
      3. LUCENE-4481.patch (2 kB, Michael McCandless)
      4. LUCENE-4481.patch (12 kB, Michael McCandless)
      5. LUCENE-4481.patch (10 kB, Michael McCandless)

        Activity

        Steve Rowe made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Michael McCandless made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Resolution: Fixed [ 1 ]
        Michael McCandless made changes -
        Attachment: LUCENE-4481.patch [ 12550137 ]
        Michael McCandless made changes -
        Attachment: LUCENE-4481.patch [ 12550072 ]
        Michael McCandless made changes -
        Attachment: LUCENE-4481.patch [ 12550059 ]
        Michael McCandless made changes -
        Attachment: LUCENE-4481.patch [ 12550027 ]
        Michael McCandless made changes -
        Attachment: LUCENE-4481.patch [ 12550018 ]
        Michael McCandless made changes -
        Assignee: Michael McCandless [ mikemccand ]
        Michael McCandless created issue -

          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 4
