Lucene - Core
LUCENE-5294

Suggester Dictionary implementation that takes expressions as term weights

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, Trunk
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      It would be nice to have a Suggester Dictionary implementation that could compute the weights of the terms consumed by the suggester from a user-defined expression (using Lucene's expressions module).

      It could be an extension of the existing DocumentDictionary (which takes terms, weights and, optionally, payloads from the stored documents in the index). The only difference is that, instead of taking the term weights from the specified weight field, it would compute them with a user-defined expression that uses one or more NumericDocValuesFields from the document.

      Example:
      Let each document have the fields

      • product_id
      • product_name
      • product_popularity
      • product_profit

      Then this implementation could be used with the expression "0.2*product_popularity + 0.8*product_profit" to determine the weights of the terms for the corresponding documents (optionally with product_id as the payload).
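The arithmetic behind the example expression can be sketched in plain Java. This is only an illustration of how a per-document weight would follow from the two numeric fields; it does not use the Lucene expressions module, and the class name, method name and the choice to round to the nearest long are all assumptions of this sketch.

```java
/** Hypothetical illustration only: plain Java, no Lucene classes involved. */
class ExpressionWeightSketch {

    /**
     * Evaluates 0.2*product_popularity + 0.8*product_profit for one document.
     * Rounding to the nearest long is an assumption of this sketch.
     */
    public static long weight(long productPopularity, long productProfit) {
        return Math.round(0.2 * productPopularity + 0.8 * productProfit);
    }

    public static void main(String[] args) {
        // A document with popularity 50 and profit 100.
        System.out.println(weight(50, 100));
    }
}
```

With these coefficients, profit dominates the ranking: a document with high profit and low popularity would outweigh one with the values swapped.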

      1. LUCENE-5294.patch
        18 kB
        Areek Zillur

        Activity

        Areek Zillur added a comment -

        Initial patch

        • implements DocumentExpressionDictionary
        • added tests
        Michael McCandless added a comment -

        This patch looks great; "ant precommit" was angry about a few missing javadocs. I'll add them and commit. Thanks Areek!

        Michael McCandless added a comment -

        I put the wrong issue (LUCENE-4998) in the commit log so the commits are on that issue ...

        Thanks Areek!

        ASF subversion and git services added a comment -

        Commit 1533820 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1533820 ]

        LUCENE-5294: try to fix maven build

        ASF subversion and git services added a comment -

        Commit 1533822 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1533822 ]

        LUCENE-5294: try to fix maven build

        Areek Zillur added a comment -

        Sorry for not running "ant precommit" before uploading the patch (I will do that from now on!). Thanks for fixing it and committing the patch.
        I had a couple of questions, though.

        • Does it make sense for the new dictionary implementation to support CompositeReader? (I actually do not know anything about it except for the fact that it has multiple leaves.) I was wondering whether it's worth adding support.
        • What are your thoughts on having a DocumentDictionary setting that will collect terms only from documents that have all the required fields and ignore the others (rather than erroring out)? Is that too much flexibility?

        Thanks again for your quick responses!

        Michael McCandless added a comment -

        Does it make sense for the new dictionary implementation to support CompositeReader?

        A CompositeReader is the common case, i.e. an index that has multiple segments ... I think we should support it?

        The easiest way is to just wrap the incoming reader using SlowCompositeReaderWrapper.wrap. However, this adds some unnecessary cost, because on each NDV (NumericDocValues) lookup there is a binary search to locate the right sub-reader. In fact, we are already paying this cost in DocumentInputIterator when we use liveDocs (MultiFields.getLiveDocs). But I suspect that in the grand scheme of things this cost is relatively minor, and a suggester is built once and used many times, so we may just want to go with this option.

        The other option is to pull the leaves and step through them yourself; I guess you'd need to fix DocumentInputIterator to go segment by segment instead.
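To make the trade-off between the two options concrete, here is a minimal sketch in plain Java. Arrays of longs stand in for the per-segment numeric values, and none of the names below are Lucene APIs; they are hypothetical and only model the cost difference: the first method resolves every global doc ID with a binary search over segment start offsets, the second walks the segments in order and needs no search.

```java
/** Hypothetical illustration only: plain arrays stand in for index segments. */
class SegmentLookupSketch {

    /** Returns the last segment whose start offset is <= globalDoc (binary search). */
    public static int segmentOf(int[] docBases, int globalDoc) {
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle, so the loop always makes progress
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Option 1: treat the index as one flat reader; every lookup pays for a binary search. */
    public static long[] valuesViaGlobalLookup(long[][] segments) {
        int[] docBases = new int[segments.length];
        int total = 0;
        for (int i = 0; i < segments.length; i++) {
            docBases[i] = total;
            total += segments[i].length;
        }
        long[] out = new long[total];
        for (int doc = 0; doc < total; doc++) {
            int seg = segmentOf(docBases, doc);
            out[doc] = segments[seg][doc - docBases[seg]];
        }
        return out;
    }

    /** Option 2: step through the segments one by one; no per-document search. */
    public static long[] valuesSegmentBySegment(long[][] segments) {
        int total = 0;
        for (long[] segment : segments) {
            total += segment.length;
        }
        long[] out = new long[total];
        int next = 0;
        for (long[] segment : segments) {
            for (long value : segment) {
                out[next++] = value;
            }
        }
        return out;
    }
}
```

Both methods yield the same values in the same order; the difference is only the extra O(log S) search per document in the first one, where S is the number of segments.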

        What are your thoughts on having a DocumentDictionary setting that will collect terms only from documents that have all the required fields and ignore the others (rather than erroring out)? Is that too much flexibility?

        Sure, we could add such leniency? We could even just make the whole thing lenient (i.e., no separate setting)?
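The difference between the strict and the lenient behaviour can be sketched as follows. This is plain Java with a stand-in Doc class, not the DocumentDictionary code; the class names and the boolean lenient flag are hypothetical and exist only to show the two policies side by side.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only: models strict versus lenient term collection. */
class LenientCollectSketch {

    /** Stand-in for a stored document; a null field means the document lacks it. */
    static final class Doc {
        final String term;
        final Long weight;

        Doc(String term, Long weight) {
            this.term = term;
            this.weight = weight;
        }
    }

    /**
     * Collects terms from documents that have every required field.
     * In lenient mode incomplete documents are skipped; otherwise they raise an error.
     */
    public static List<String> collectTerms(List<Doc> docs, boolean lenient) {
        List<String> terms = new ArrayList<>();
        for (Doc doc : docs) {
            if (doc.term == null || doc.weight == null) {
                if (lenient) {
                    continue; // ignore the incomplete document
                }
                throw new IllegalArgumentException("document is missing a required field");
            }
            terms.add(doc.term);
        }
        return terms;
    }
}
```

Making the whole thing lenient, as suggested above, would amount to dropping the flag and always taking the skip branch.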

        Areek Zillur added a comment -

        Thanks Michael for the response!

        • I will open a JIRA issue to add CompositeReader support to the Dictionary implementation (and hopefully change DocumentInputIterator to go segment by segment).
        • I will also make DocumentDictionary more lenient.

        I hope to expose these Dictionary implementations to Solr soon.
        ASF subversion and git services added a comment -

        Commit 1534430 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1534430 ]

        LUCENE-5294: simmer down, validate-maven-dependencies

        ASF subversion and git services added a comment -

        Commit 1534432 from Steve Rowe in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1534432 ]

        LUCENE-5294: simmer down, validate-maven-dependencies (merged trunk r1534430)

        ASF subversion and git services added a comment -

        Commit 1535797 from Steve Rowe in branch 'dev/trunk'
        [ https://svn.apache.org/r1535797 ]

        LUCENE-5294: IntelliJ config

        ASF subversion and git services added a comment -

        Commit 1535798 from Steve Rowe in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1535798 ]

        LUCENE-5294: IntelliJ config (merged trunk r1535797)

        ASF subversion and git services added a comment -

        Commit 1554409 from Michael McCandless in branch 'dev/branches/lucene5376'
        [ https://svn.apache.org/r1554409 ]

        LUCENE-5294, LUCENE-5376: in Lucene demo server, support building suggester where weight is an expression


          People

          • Assignee:
            Unassigned
            Reporter:
            Areek Zillur