Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-5350.

      1. contextInputIteratImpl.patch
        50 kB
        Areek Zillur
      2. LUCENE-5528.patch
        50 kB
        Michael McCandless
      3. LUCENE-5528.patch
        54 kB
        Michael McCandless
      4. LUCENE-5528-1.patch
        82 kB
        Areek Zillur
      5. LUCENE-5528-1.patch
        82 kB
        Areek Zillur
      6. LUCENE-5528-1.patch
        84 kB
        Areek Zillur
      7. LUCENE-5528-1.patch
        85 kB
        Areek Zillur

        Activity

        Hide
        Michael McCandless added a comment -

        Patch, I think it's close.

        I started from Areek's patch on LUCENE-5350, to add context to InputIterator and LookupResult, and then fixed AnalyzingInfixSuggester to index all contexts as StringField, and add MUST clauses to the BooleanQuery at lookup time. I also store the contexts using SSDV field, and return them for each LookupResult.

        Show
        Michael McCandless added a comment - Patch, I think it's close. I started from Areek's patch on LUCENE-5350 , to add context to InputIterator and LookupResult, and then fixed AnalyzingInfixSuggester to index all contexts as StringField, and add MUST clauses to the BooleanQuery at lookup time. I also store the contexts using SSDV field, and return them for each LookupResult.
        Hide
        Michael McCandless added a comment -

        New patch, fixing one issue I hit when I was folding this into http://jirasearch.mikemccandless.com, which was the contexts should all be OR'd together not AND'd (ie, if a suggestion has any of the contexts, it's accepted).

        I think it's ready.

        Show
        Michael McCandless added a comment - New patch, fixing one issue I hit when I was folding this into http://jirasearch.mikemccandless.com , which was the contexts should all be OR'd together not AND'd (ie, if a suggestion has any of the contexts, it's accepted). I think it's ready.
        Hide
        Areek Zillur added a comment -

        I have a patch with contexts() returning BytesRefIterator rather than Set<BytesRef>, along with implementation for the contexts in all the InputIter implementation (except HighFreqInputIter and FileInputIter). Do you think It would make sense to have the context() return a BytesRefIterator instead?

        Show
        Areek Zillur added a comment - I have a patch with contexts() returning BytesRefIterator rather than Set<BytesRef>, along with implementation for the contexts in all the InputIter implementation (except HighFreqInputIter and FileInputIter). Do you think It would make sense to have the context() return a BytesRefIterator instead?
        Hide
        Areek Zillur added a comment - - edited

        Patch with context() and hasContext() implementations of various InputIterator:

        • Added contexts() [returns BytesRefIterator] and hasContexts() APIs in InputIterator
        • Implementation of contexts() in DocumentDictionaryIterator, DocumentValueSourceIterator, BufferedInputInterator, SortedInputIteraor and UnSortedInputIterator
        • Added tests for context() API in DocumentDictionaryIterator, DocumentDictionaryIterator and SortedInputIterator
        • Disable all suggesters from accepting InputIterator with contexts

        NOTE: this patch does not add context to AnalyzingInfixSuggester and co!

        Show
        Areek Zillur added a comment - - edited Patch with context() and hasContext() implementations of various InputIterator: Added contexts() [returns BytesRefIterator] and hasContexts() APIs in InputIterator Implementation of contexts() in DocumentDictionaryIterator, DocumentValueSourceIterator, BufferedInputInterator, SortedInputIteraor and UnSortedInputIterator Added tests for context() API in DocumentDictionaryIterator, DocumentDictionaryIterator and SortedInputIterator Disable all suggesters from accepting InputIterator with contexts NOTE: this patch does not add context to AnalyzingInfixSuggester and co!
        Hide
        Areek Zillur added a comment -

        This patch merges the previous patch and the original patch together. It adds support to contexts() in various InputIterator implementation, along with using the contexts in AnalyzingInfixSuggester (original patch by Michael). The patch changes InputIterator's context from Set<BytesRef> to BytesRefIterator. The lookup() API still uses the Set<BytesRef>. Thoughts?

        Show
        Areek Zillur added a comment - This patch merges the previous patch and the original patch together. It adds support to contexts() in various InputIterator implementation, along with using the contexts in AnalyzingInfixSuggester (original patch by Michael). The patch changes InputIterator's context from Set<BytesRef> to BytesRefIterator. The lookup() API still uses the Set<BytesRef>. Thoughts?
        Hide
        Michael McCandless added a comment -

        Hmm, I think I prefer the simpler Set<BytesRef>? And e.g. one problem with BytesRefIterator is you can only iterate it once (we'd sort of need a BytesRefIterable I guess), which might be a hassle for some suggesters?

        Show
        Michael McCandless added a comment - Hmm, I think I prefer the simpler Set<BytesRef>? And e.g. one problem with BytesRefIterator is you can only iterate it once (we'd sort of need a BytesRefIterable I guess), which might be a hassle for some suggesters?
        Hide
        Areek Zillur added a comment -

        That sounds good. I will change the InputIterator contexts() api to return Set<BytesRef> then. Will get a patch up tomorrow.

        Show
        Areek Zillur added a comment - That sounds good. I will change the InputIterator contexts() api to return Set<BytesRef> then. Will get a patch up tomorrow.
        Hide
        Michael McCandless added a comment -

        I will change the InputIterator contexts() api to return Set<BytesRef> then. Will get a patch up tomorrow.

        Thanks Areek!

        Show
        Michael McCandless added a comment - I will change the InputIterator contexts() api to return Set<BytesRef> then. Will get a patch up tomorrow. Thanks Areek!
        Hide
        Areek Zillur added a comment - - edited

        Updated Patch (LUCENE-5528-1.patch):

        • uses Set<BytesRef> to represent contexts in InputInterator
        Show
        Areek Zillur added a comment - - edited Updated Patch ( LUCENE-5528 -1.patch): uses Set<BytesRef> to represent contexts in InputInterator
        Hide
        Michael McCandless added a comment -

        Thanks Areek, new patch looks great! It's nice that Document/ExpressionDictionary, and the wrapping InputIterators (Sorted/Unsorted) handle contexts as well. I think it's close.

        I noticed a lot of added but unused "import import org.apache.lucene.util.BytesRefIterator;" lines.

        Also, AnalyzingInfixSuggester calls InputIterator.contexts() regardless of whether hasContexts return true or false; I think this is actually OK, but can you update InputIterator's javadocs to state that this is allowed, and contexts() is expected to return null even if hasContexts() had returned false?

        Finally just a small whitespace issue: instead of:

            if(iterator.hasContexts()) {
        

        Can you do this:

            if (iterator.hasContexts()) {
        

        (i.e. just add space between "if" and "("). Thanks!

        Show
        Michael McCandless added a comment - Thanks Areek, new patch looks great! It's nice that Document/ExpressionDictionary, and the wrapping InputIterators (Sorted/Unsorted) handle contexts as well. I think it's close. I noticed a lot of added but unused "import import org.apache.lucene.util.BytesRefIterator;" lines. Also, AnalyzingInfixSuggester calls InputIterator.contexts() regardless of whether hasContexts return true or false; I think this is actually OK, but can you update InputIterator's javadocs to state that this is allowed, and contexts() is expected to return null even if hasContexts() had returned false? Finally just a small whitespace issue: instead of: if(iterator.hasContexts()) { Can you do this: if (iterator.hasContexts()) { (i.e. just add space between "if" and "("). Thanks!
        Hide
        Michael McCandless added a comment -

        BTW I pushed this patch to the jirasearch "production" site (http://jirasearch.mikemccandless.com ) and it seems to be working very well!

        If you drill down on a specific project (Lucene, Solr, Tika) then the suggestions you see are only for that project. This also happens if you type the project name, e.g. "lucene word" shows only lucene issues/users containing word*, and "tika word" shows only tika issues/users with word*

        Nice to eat your own dog food ...

        Show
        Michael McCandless added a comment - BTW I pushed this patch to the jirasearch "production" site ( http://jirasearch.mikemccandless.com ) and it seems to be working very well! If you drill down on a specific project (Lucene, Solr, Tika) then the suggestions you see are only for that project. This also happens if you type the project name, e.g. "lucene word" shows only lucene issues/users containing word*, and "tika word" shows only tika issues/users with word* Nice to eat your own dog food ...
        Hide
        Areek Zillur added a comment -

        Updated patch (LUCENE-5528-1.patch):

        • removed unused imports (my bad!)
        • updated InputIteration.context() docs to record that it can return null
        • minor formatting fixes (as suggested)

        Thanks Michael for the review!
        That is awesome, I played around with the jirasearch to see the functionality for myself.

        Show
        Areek Zillur added a comment - Updated patch ( LUCENE-5528 -1.patch): removed unused imports (my bad!) updated InputIteration.context() docs to record that it can return null minor formatting fixes (as suggested) Thanks Michael for the review! That is awesome, I played around with the jirasearch to see the functionality for myself.
        Hide
        Michael McCandless added a comment -

        Thanks Areek, this looks great! I'll commit soon.

        Show
        Michael McCandless added a comment - Thanks Areek, this looks great! I'll commit soon.
        Hide
        ASF subversion and git services added a comment -

        Commit 1580510 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1580510 ]

        LUCENE-5528: add contexts to AnalyzingInfixSuggester

        Show
        ASF subversion and git services added a comment - Commit 1580510 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1580510 ] LUCENE-5528 : add contexts to AnalyzingInfixSuggester
        Hide
        ASF subversion and git services added a comment -

        Commit 1580517 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1580517 ]

        LUCENE-5528: add contexts to AnalyzingInfixSuggester

        Show
        ASF subversion and git services added a comment - Commit 1580517 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1580517 ] LUCENE-5528 : add contexts to AnalyzingInfixSuggester
        Hide
        ASF subversion and git services added a comment -

        Commit 1580518 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1580518 ]

        LUCENE-5528: remove generics

        Show
        ASF subversion and git services added a comment - Commit 1580518 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1580518 ] LUCENE-5528 : remove generics
        Hide
        Uwe Schindler added a comment -

        Close issue after release of 4.8.0

        Show
        Uwe Schindler added a comment - Close issue after release of 4.8.0
        Hide
        Arcadius Ahouansou added a comment -

        Hello Michael McCandless
        For the AnalazingInfixSuggester, does it make sens to have a new override for the method lookup() with an additional param allContextsRequired , similar to allTermsRequired

         lookup(CharSequence key, Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight, boolean allContextsRequired)
        

        which will make it possible to have an "AND" behaviour for the contexts when you have many contexts instead of the default "OR"?

        This means that line
        https://github.com/apache/lucene-solr/blob/trunk/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.java#L486
        will now use BooleanClause.Occur.SHOULD or BooleanClause.Occur.MUST depending on the value of the new param allContextsRequired ...

        We currently have a use-case for this.

        Thanks.

        Arcadius.

        Show
        Arcadius Ahouansou added a comment - Hello Michael McCandless For the AnalazingInfixSuggester, does it make sens to have a new override for the method lookup() with an additional param allContextsRequired , similar to allTermsRequired lookup(CharSequence key, Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight, boolean allContextsRequired) which will make it possible to have an "AND" behaviour for the contexts when you have many contexts instead of the default "OR"? This means that line https://github.com/apache/lucene-solr/blob/trunk/lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.java#L486 will now use BooleanClause.Occur.SHOULD or BooleanClause.Occur.MUST depending on the value of the new param allContextsRequired ... We currently have a use-case for this. Thanks. Arcadius.
        Hide
        Arcadius Ahouansou added a comment -

        Another alternative may be to have something like Map<BytesRef,Boolean> instead of Set<BytesRef> where the boolean indicates whether the entry is a SHOULD or a MUST

        Show
        Arcadius Ahouansou added a comment - Another alternative may be to have something like Map<BytesRef,Boolean> instead of Set<BytesRef> where the boolean indicates whether the entry is a SHOULD or a MUST
        Hide
        Michael McCandless added a comment -

        Hi Arcadius Ahouansou, I think it makes sense to give more control over how context is applied; could you open a new issue for this?

        Show
        Michael McCandless added a comment - Hi Arcadius Ahouansou , I think it makes sense to give more control over how context is applied; could you open a new issue for this?
        Hide
        Arcadius Ahouansou added a comment -

        Hi Michael McCandless.
        The new issue is LUCENE-6050
        Thanks.

        Show
        Arcadius Ahouansou added a comment - Hi Michael McCandless . The new issue is LUCENE-6050 Thanks.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            1 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development