Lucene - Core
  1. Lucene - Core
  2. LUCENE-4820

Add optional payload to AnalyzingSuggester

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 6.0
    • Component/s: modules/spellchecker
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It's useful to be able to store custom data (eg maybe a primary key or
      a URL or something) with each suggestion, so that the UI can do things
      like render an image along with each suggestion, or direct to a
      specific URL if the user clicks that suggestion, etc.

      1. LUCENE-4820.patch
        31 kB
        Michael McCandless
      2. LUCENE-4820.patch
        25 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Initial patch, with tests (passing).

        I created a separate TermFreqPayloadIterator extending
        TermFreqIterator, which you pass to AnalyzingSuggester.build, and then
        added payload to LookupResult.

        Show
        Michael McCandless added a comment - Initial patch, with tests (passing). I created a separate TermFreqPayloadIterator extending TermFreqIterator, which you pass to AnalyzingSuggester.build, and then added payload to LookupResult.
        Hide
        Michael McCandless added a comment -

        New patch, fixing the nocommit, added CHANGES. I think it's ready!

        Show
        Michael McCandless added a comment - New patch, fixing the nocommit, added CHANGES. I think it's ready!
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1456095

        LUCENE-4820: add optional payload to Analyzing/FuzzySuggester

        Show
        Commit Tag Bot added a comment - [trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1456095 LUCENE-4820 : add optional payload to Analyzing/FuzzySuggester
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Michael McCandless
        http://svn.apache.org/viewvc?view=revision&revision=1456098

        LUCENE-4820: add optional payload to Analyzing/FuzzySuggester

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revision&revision=1456098 LUCENE-4820 : add optional payload to Analyzing/FuzzySuggester
        Hide
        Robert Muir added a comment -

        Wouldn't it be better if the payload was an additional FST output (e.g. in the case its an ID or something).

        Its wierd its tacked onto the surface form with INFORMATION_SEPARATOR.

        I havent fully thought about the benefits, but i guess an ID value as a separate output could be used for
        other things, like participating in the top-N comparator to simplify the tie-break logic (which if i recall,
        is super-hairy for this suggester).

        It might also be more efficient as a separate output, but I havent thought about all the cases there either.
        its like if payloads dont exist we could just use the empty string or whatever and there is really not much
        added cost.

        Show
        Robert Muir added a comment - Wouldn't it be better if the payload was an additional FST output (e.g. in the case its an ID or something). Its wierd its tacked onto the surface form with INFORMATION_SEPARATOR. I havent fully thought about the benefits, but i guess an ID value as a separate output could be used for other things, like participating in the top-N comparator to simplify the tie-break logic (which if i recall, is super-hairy for this suggester). It might also be more efficient as a separate output, but I havent thought about all the cases there either. its like if payloads dont exist we could just use the empty string or whatever and there is really not much added cost.
        Hide
        Michael McCandless added a comment -

        Wouldn't it be better if the payload was an additional FST output (e.g. in the case its an ID or something).

        I agree packing surface form + payload into a single output is weirdish ... making it a separate FST output would be nice, but then I dreaded the different generics (sometimes FST<Pair> and other times FST<Triple> or FST<Pair<Pair>>).

        I havent fully thought about the benefits, but i guess an ID value as a separate output could be used for
        other things, like participating in the top-N comparator to simplify the tie-break logic (which if i recall,
        is super-hairy for this suggester).

        True! Though, we could do the same thing w/ the current approach ... eg break out the payload and let the app check it for accept / reject (somewhere there is a patch to expose accept method in AnalyzingSuggester...).

        It might also be more efficient as a separate output, but I havent thought about all the cases there either.
        its like if payloads dont exist we could just use the empty string or whatever and there is really not much
        added cost.

        Yeah it could be. Hmm, every arc would add a byte (mostly, sometimes more I guess if payload is longish), but if the payloads can be shared well, then the overall FST could be smaller ...

        I agree we should explore it!

        Show
        Michael McCandless added a comment - Wouldn't it be better if the payload was an additional FST output (e.g. in the case its an ID or something). I agree packing surface form + payload into a single output is weirdish ... making it a separate FST output would be nice, but then I dreaded the different generics (sometimes FST<Pair> and other times FST<Triple> or FST<Pair<Pair>>). I havent fully thought about the benefits, but i guess an ID value as a separate output could be used for other things, like participating in the top-N comparator to simplify the tie-break logic (which if i recall, is super-hairy for this suggester). True! Though, we could do the same thing w/ the current approach ... eg break out the payload and let the app check it for accept / reject (somewhere there is a patch to expose accept method in AnalyzingSuggester...). It might also be more efficient as a separate output, but I havent thought about all the cases there either. its like if payloads dont exist we could just use the empty string or whatever and there is really not much added cost. Yeah it could be. Hmm, every arc would add a byte (mostly, sometimes more I guess if payload is longish), but if the payloads can be shared well, then the overall FST could be smaller ... I agree we should explore it!
        Hide
        Steve Rowe added a comment -

        (somewhere there is a patch to expose accept method in AnalyzingSuggester...).

        LUCENE-4517

        Show
        Steve Rowe added a comment - (somewhere there is a patch to expose accept method in AnalyzingSuggester...). LUCENE-4517
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development