Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, Trunk
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      It would be nice to have a Context Aware Suggester (i.e. a suggester that could return suggestions depending on some specified context(s)).

      Use-cases (a usage sketch follows the list):

      • location-based suggestions:
        • returns suggestions which 'match' the context of a particular area
          • suggest restaurant names that are in Palo Alto (context -> Palo Alto)
      • category-based suggestions:
        • returns suggestions for items that are only in certain categories/genres (contexts)
          • suggest movies that are of the genre sci-fi and adventure (context -> [sci-fi, adventure])
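
      A rough usage sketch of the intent (the context-taking lookup overload shown here is hypothetical, not a committed API):

        import java.io.IOException;
        import java.util.Collections;
        import java.util.List;
        import java.util.Set;

        import org.apache.lucene.search.suggest.Lookup.LookupResult;
        import org.apache.lucene.util.BytesRef;

        public class ContextLookupSketch {
          /** Hypothetical context-aware lookup; the (key, contexts, onlyMorePopular, num) signature is illustrative only. */
          interface ContextAwareLookup {
            List<LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, boolean onlyMorePopular, int num) throws IOException;
          }

          /** Suggest restaurant names, restricted to the "Palo Alto" context. */
          static void suggestRestaurants(ContextAwareLookup suggester) throws IOException {
            Set<BytesRef> contexts = Collections.singleton(new BytesRef("Palo Alto"));
            for (LookupResult result : suggester.lookup("piz", contexts, false, 5)) {
              System.out.println(result.key + " (weight=" + result.value + ")");
            }
          }
        }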
      Attachments

      1. LUCENE-5350-benchmark.patch
        2.62 MB
        Areek Zillur
      2. LUCENE-5350-benchmark.patch
        2.62 MB
        Areek Zillur
      3. LUCENE-5350.patch
        49 kB
        Areek Zillur
      4. LUCENE-5350.patch
        50 kB
        Areek Zillur

        Activity

        Areek Zillur created issue -
        Areek Zillur added a comment -

        Initial (rough) Patch:

        • Add contexts and hasContexts to InputIterator
        • Add support to Analyzing suggester to handle contexts
        • Add new ContextAwareSuggester (proxies Analyzing Suggester)
        • Add tests for ContextAwareSuggester

        TODO:

        • The patch "breaks" the Lookup API (I think it's better to have a LookupOptions that encapsulates the query-time input to suggesters)
        • Add contexts and hasContexts support to all implementations of InputIterator
        • General refactoring
        • Add FuzzySuggester support
        • Add docs

        This patch demonstrates the idea; if the approach makes sense, the appropriate changes to the API will be the next task. Feedback and thoughts welcome! It would also be nice to figure out a way so that we don't have to subclass AnalyzingSuggester to 'use' it.
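
        For reference, a rough sketch of what the InputIterator additions could look like (the method names follow the list above; the exact signatures here are assumptions, not the patch's code):

        import java.util.Set;

        import org.apache.lucene.search.suggest.InputIterator;
        import org.apache.lucene.util.BytesRef;

        /**
         * Sketch of the proposed InputIterator additions. Method names follow
         * the description above; the return types are assumptions.
         */
        public interface ContextInputIterator extends InputIterator {
          /** Contexts (e.g. category, location) for the current entry, or null if none. */
          Set<BytesRef> contexts();

          /** True if this iterator supplies contexts for its entries. */
          boolean hasContexts();
        }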

        Areek Zillur made changes -
        Attachment LUCENE-5350.patch [ 12616221 ]
        Michael McCandless added a comment -

        I think a context aware suggester is a great idea!

        I wonder ... how this approach (stuff the context in front of each suggestion & then build "normally") compares to simply creating N separate suggesters, one per context? Like, I wonder how much better FST compression we get by using a single FST vs N? Lookup wise, it seems like we are just doing N separate lookups so that part should be similar?
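
        For reference, the "stuff the context in front" idea boils down to something like the following sketch (the separator byte and encoding are illustrative, not the patch's actual scheme):

        import org.apache.lucene.util.BytesRef;

        public class ContextPrefixSketch {
          // Illustrative encoding only: prepend the context to the key with a
          // separator byte assumed not to occur in either part, so a single FST
          // holds all contexts and a lookup for context C only walks the
          // C-prefixed subtree.
          static BytesRef prependContext(BytesRef context, BytesRef key) {
            final byte separator = 0x1f;  // assumed reserved; a real impl must guarantee this
            byte[] combined = new byte[context.length + 1 + key.length];
            System.arraycopy(context.bytes, context.offset, combined, 0, context.length);
            combined[context.length] = separator;
            System.arraycopy(key.bytes, key.offset, combined, context.length + 1, key.length);
            return new BytesRef(combined);
          }
        }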

        Pradeep added a comment -

        There should be a framework to inject different contexts. It might be better to think about a solution based on deep learning algorithms. But this is a very good idea.

        Areek Zillur added a comment -

        Thanks for the feedback!
        Michael McCandless: I was wondering the same thing as I was implementing it. Intuitively, I think this should be more compact than the N-suggester approach, but it's best to benchmark it against N separate suggesters (I will update with the benchmark results). I also had another idea of implementing this by 'filtering' the suggestions by contexts supplied at query time, rather than prefixing the analyzed form with the context. I will play around with both and benchmark them to see whether this would be useful in practice.
        Pradeep: Can you expand on the deep learning algorithm part?
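
        A minimal sketch of the query-time filtering alternative mentioned above, assuming the context is carried in the suggestion payload (that payload convention is an assumption of this sketch, not how the patch encodes it):

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.Set;

        import org.apache.lucene.search.suggest.Lookup;
        import org.apache.lucene.search.suggest.Lookup.LookupResult;
        import org.apache.lucene.util.BytesRef;

        public class ContextFilterSketch {
          // Over-fetch from an ordinary suggester and keep only results whose
          // payload matches one of the wanted contexts.
          static List<LookupResult> filteredLookup(Lookup suggester, CharSequence key,
              Set<BytesRef> wantedContexts, int num) throws IOException {
            List<LookupResult> filtered = new ArrayList<>();
            for (LookupResult result : suggester.lookup(key, false, num * 10)) {
              if (result.payload != null && wantedContexts.contains(result.payload)) {
                filtered.add(result);
                if (filtered.size() == num) {
                  break;
                }
              }
            }
            return filtered;
          }
        }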

        Areek Zillur added a comment - edited

        Uploaded benchmark code and data for the ContextAware suggester. It compares a single ContextAwareSuggester with N AnalyzingSuggesters (one per context); a sketch of how the data maps onto suggester inputs follows the field list below.
        The original dataset was taken from http://snap.stanford.edu/data/web-Reddit.html. The processed dataset contains only three fields:

        • title (suggestion)
        • subreddit (context)
        • unixtime (weight)
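
        A minimal sketch of how one row of that processed dataset could map onto a suggester input (the tab-separated layout and field order are assumptions, not necessarily the attached file's exact format):

        import org.apache.lucene.util.BytesRef;

        public class RedditRowSketch {
          // One processed row -> (suggestion, context, weight).
          public static void main(String[] args) {
            String line = "Test post please ignore\tfunny\t1340000000";  // hypothetical row
            String[] fields = line.split("\t");
            String title = fields[0];                    // suggestion text
            BytesRef context = new BytesRef(fields[1]);  // subreddit used as the context
            long weight = Long.parseLong(fields[2]);     // unixtime used as the weight
            System.out.println(title + " | " + context.utf8ToString() + " | " + weight);
          }
        }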
        Areek Zillur made changes -
        Attachment LUCENE-5350-benchmark.patch [ 12617551 ]
        Areek Zillur added a comment - edited

        The benchmark results (in a format similar to the LookupBenchmark results) were as follows:

        -- input stats
        Input size: 53022, numContexts: 2666, Avg. Input/Context: 20
        
        -- construction time
        AnalyzingSuggester input: 53022, time[ms]: 10140 [+- 149.13]
        ContextAwareAnalyzingSuggester input: 53022, time[ms]: 1683 [+- 17.54]
        
        -- RAM consumption
        AnalyzingSuggester size[B]:    4,675,508
        ContextAwareAnalyzingSuggester size[B]:    4,837,187
        
        -- prefixes: 6-9, num: 7, onlyMorePopular: false
        ContextAwareAnalyzingSuggester queries: 53022, time[ms]: 1277 [+- 14.67], ~kQPS: 42
        AnalyzingSuggester queries: 53022, time[ms]: 2269 [+- 152.00], ~kQPS: 23
        
        -- prefixes: 2-4, num: 7, onlyMorePopular: false
        ContextAwareAnalyzingSuggester queries: 53022, time[ms]: 2294 [+- 24.02], ~kQPS: 23
        AnalyzingSuggester queries: 53022, time[ms]: 4947 [+- 90.36], ~kQPS: 11
        
        -- prefixes: 100-200, num: 7, onlyMorePopular: false
        ContextAwareAnalyzingSuggester queries: 53022, time[ms]: 1177 [+- 32.13], ~kQPS: 45
        AnalyzingSuggester queries: 53022, time[ms]: 935 [+- 11.53], ~kQPS: 57
        

        From the results, it seems that ContextAwareSuggester, compared to AnalyzingSuggester:

        • is ~6 times faster to construct
        • consumes ~3% more RAM (possibly because the context is duplicated in the prefix and in the payload)
        • has ~2 times the QPS (for prefixes 6-9 and 2-4)
        • has ~20% lower QPS (for prefixes 100-200)

        Note that the dataset only contains terms with a single context (hence the benchmark does not take terms with multiple contexts into account).

        This was an interesting benchmark, thoughts?

        Areek Zillur added a comment -

        Minor lookup optimization (in case of a single context)

        Areek Zillur made changes -
        Attachment LUCENE-5350.patch [ 12617627 ]
        Areek Zillur added a comment -

        Fixed benchmark code

        Areek Zillur made changes -
        Attachment LUCENE-5350-benchmark.patch [ 12617628 ]
        Areek Zillur added a comment -

        Disregard the previous benchmark stats. There was a bug in how the keys were used when building the suggester (which explains the implausible QPS numbers).
        The updated benchmark results are as follows:

        -- Input stats
        Input size: 53022, numContexts: 2666, Avg. Input/Context: 20
        
        -- prefixes: 2-4, num: 7, onlyMorePopular: false
        ContextAwareAnalyzingSuggester queries: 53022, time[ms]: 2630 [+- 124.14], ~kQPS: 20
        AnalyzingSuggester queries: 53022, time[ms]: 2249 [+- 25.16], ~kQPS: 24
        
        -- RAM consumption
        AnalyzingSuggester size[B]:    4,767,705
        ContextAwareAnalyzingSuggester size[B]:    4,837,187
        
        -- construction time
        AnalyzingSuggester input: 53022, time[ms]: 10184 [+- 207.64]
        ContextAwareAnalyzingSuggester input: 53022, time[ms]: 1831 [+- 81.89]
        
        -- prefixes: 6-9, num: 7, onlyMorePopular: false
        ContextAwareAnalyzingSuggester queries: 53022, time[ms]: 1457 [+- 163.04], ~kQPS: 36
        AnalyzingSuggester queries: 53022, time[ms]: 1140 [+- 28.59], ~kQPS: 47
        
        -- prefixes: 100-200, num: 7, onlyMorePopular: false
        ContextAwareAnalyzingSuggester queries: 53022, time[ms]: 1276 [+- 58.97], ~kQPS: 42
        AnalyzingSuggester queries: 53022, time[ms]: 1004 [+- 81.69], ~kQPS: 53
        

        From the above benchmarks, it seems the only improvement for the new suggester is in construction time. The QPS in all three cases is ~20% lower and the RAM usage is ~3% higher.

        Areek Zillur added a comment -

        Any thoughts on this suggester, in light of the benchmark results? I still think it could be useful as a replacement for managing multiple suggesters (one per 'context').

        Michael McCandless added a comment -

        I'm still concerned about the code complexity added by pushing the context "down low", and given that lookup performance for context "up high" (N AnalyzingSuggesters, one per context) is a bit faster, it seems best overall to do it up high?

        I do think this is useful functionality to have; maybe we could have an "up high" impl, sugar, that just wraps the N suggesters under the hood?
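
        A minimal sketch of what that "up high" sugar could look like, assuming one ordinary suggester per context and a thin wrapper that fans the lookup out and merges (class and method names are illustrative, not a proposed API):

        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import java.util.Set;

        import org.apache.lucene.search.suggest.Lookup;
        import org.apache.lucene.search.suggest.Lookup.LookupResult;
        import org.apache.lucene.util.BytesRef;

        public class PerContextSuggesters {
          private final Map<BytesRef,Lookup> byContext = new HashMap<>();

          public void put(BytesRef context, Lookup suggester) {
            byContext.put(context, suggester);
          }

          public List<LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, int num) throws IOException {
            List<LookupResult> merged = new ArrayList<>();
            for (BytesRef context : contexts) {
              Lookup suggester = byContext.get(context);
              if (suggester != null) {
                merged.addAll(suggester.lookup(key, false, num));
              }
            }
            // A real impl would sort the merged results by weight and truncate to num; omitted here.
            return merged;
          }
        }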

        Michael McCandless added a comment -

        I think it would be simple to add context to AnalyzingInfixSuggester ... it'd just become a DOCS_ONLY field on each document.

        Maybe we could break this issue up, e.g. add the "contexts" to the InputIterator as a separate issue? Then we can separately address context-awareness for each of our suggester impls?
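
        A rough sketch of such a contexts field on the documents the infix suggester indexes, using the 4.x field API (the field name and the one-Field-per-context layout are assumptions of this sketch):

        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.FieldType;
        import org.apache.lucene.index.FieldInfo.IndexOptions;

        public class ContextsFieldSketch {
          // A "contexts" field indexed as DOCS_ONLY: enough to filter suggestion
          // documents by context, with no freqs or positions stored.
          static void addContexts(Document doc, String... contexts) {
            FieldType contextsType = new FieldType();
            contextsType.setIndexed(true);
            contextsType.setTokenized(false);
            contextsType.setOmitNorms(true);
            contextsType.setIndexOptions(IndexOptions.DOCS_ONLY);
            contextsType.freeze();
            for (String context : contexts) {
              doc.add(new Field("contexts", context, contextsType));
            }
          }
        }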

        Areek Zillur added a comment -

        That sounds great! I will work towards that; it would be much more straightforward to add context to AnalyzingInfixSuggester as a start.

        Michael McCandless added a comment -

        Thanks Areek!

        I have a good use case to play with, too: I want to fix the http://jirasearch.mikemccandless.com suggestions so that if you've drilled down on a particular project, it suggests issues from that project only.

        David Smiley made changes -
        Fix Version/s 4.8 [ 12326269 ]
        Fix Version/s 4.7 [ 12325572 ]
        Uwe Schindler added a comment -

        Move issue to Lucene 4.9.

        Uwe Schindler made changes -
        Fix Version/s 4.9 [ 12326730 ]
        Fix Version/s 4.8 [ 12326269 ]

          People

          • Assignee:
            Unassigned
          • Reporter:
            Areek Zillur
          • Votes:
            2
          • Watchers:
            10
