Solr
  1. Solr
  2. SOLR-7639

Bring MLTQParser at par with the MLT Handler w.r.t supported options

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3
    • Component/s: None
    • Labels:
      None

      Description

      As of now, there are options that the MLT Handler supports which the QParser doesn't. It would be good to have the QParser tap into everything that's supported.

      1. SOLR-7639.patch
        7 kB
        Anshum Gupta
      2. SOLR-7639.patch
        7 kB
        Anshum Gupta
      3. SOLR-7639-add-boost-and-exclude-current.patch
        7 kB
        Jens Wille
      4. SOLR-7639-add-boost-and-exclude-current.patch
        3 kB
        Jens Wille

        Activity

        Hide
        Anshum Gupta added a comment -

        Patch, without any tests. Will add some tests.

        Show
        Anshum Gupta added a comment - Patch, without any tests. Will add some tests.
        Hide
        ASF subversion and git services added a comment -

        Commit 1686123 from Anshum Gupta in branch 'dev/trunk'
        [ https://svn.apache.org/r1686123 ]

        SOLR-7639: MoreLikeThis QParser now supports all options provided by the MLT Handler i.e. mintf, mindf, minwl, maxwl, maxqt, and maxntp. This commit also fixes an NPE issue in CloudMLTQParser

        Show
        ASF subversion and git services added a comment - Commit 1686123 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1686123 ] SOLR-7639 : MoreLikeThis QParser now supports all options provided by the MLT Handler i.e. mintf, mindf, minwl, maxwl, maxqt, and maxntp. This commit also fixes an NPE issue in CloudMLTQParser
        Hide
        ASF subversion and git services added a comment -

        Commit 1686129 from Anshum Gupta in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1686129 ]

        SOLR-7639: MoreLikeThis QParser now supports all options provided by the MLT Handler i.e. mintf, mindf, minwl, maxwl, maxqt, and maxntp. This commit also fixes an NPE issue in CloudMLTQParser (merge from trunk)

        Show
        ASF subversion and git services added a comment - Commit 1686129 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1686129 ] SOLR-7639 : MoreLikeThis QParser now supports all options provided by the MLT Handler i.e. mintf, mindf, minwl, maxwl, maxqt, and maxntp. This commit also fixes an NPE issue in CloudMLTQParser (merge from trunk)
        Hide
        Jens Wille added a comment - - edited

        Thanks for improving MLTQParser! Unfortunately, this patch contains an error: maxqt should use setMaxQueryTerms and maxntp should use setMaxNumTokensParsed. Also, what about maxdf (setMaxDocFreq) and boost (setBoost)?

        And wouldn't it make sense to use the defaults from org.apache.lucene.queries.mlt.MoreLikeThis just like MoreLikeThisHandler does?

        I can provide patches for most of these (not sure about boosting; seems a bit more involved). Should I add them here or open new ticket(s)?

        Show
        Jens Wille added a comment - - edited Thanks for improving MLTQParser! Unfortunately, this patch contains an error: maxqt should use setMaxQueryTerms and maxntp should use setMaxNumTokensParsed . Also, what about maxdf ( setMaxDocFreq ) and boost ( setBoost )? And wouldn't it make sense to use the defaults from org.apache.lucene.queries.mlt.MoreLikeThis just like MoreLikeThisHandler does? I can provide patches for most of these (not sure about boosting; seems a bit more involved). Should I add them here or open new ticket(s)?
        Hide
        Jens Wille added a comment -

        It should also exclude the current document from the results like MoreLikeThisHandler does.

        Show
        Jens Wille added a comment - It should also exclude the current document from the results like MoreLikeThisHandler does.
        Hide
        ASF subversion and git services added a comment -

        Commit 1687088 from Anshum Gupta in branch 'dev/trunk'
        [ https://svn.apache.org/r1687088 ]

        SOLR-7639: Fixing maxqt, and maxntp, and adding support for maxdf to MLTQParser

        Show
        ASF subversion and git services added a comment - Commit 1687088 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1687088 ] SOLR-7639 : Fixing maxqt, and maxntp, and adding support for maxdf to MLTQParser
        Hide
        ASF subversion and git services added a comment -

        Commit 1687089 from Anshum Gupta in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1687089 ]

        SOLR-7639: Fixing maxqt, and maxntp, and adding support for maxdf to MLTQParser(merge from trunk)

        Show
        ASF subversion and git services added a comment - Commit 1687089 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1687089 ] SOLR-7639 : Fixing maxqt, and maxntp, and adding support for maxdf to MLTQParser(merge from trunk)
        Hide
        Anshum Gupta added a comment -

        Thanks for noticing and bringing that up Jens. My bad, a test would've caught it. I'll add some.
        I've fixed maxqt, maxntp, and added maxdf.

        About the defaults from MoreLikeThis.java, I've changed that just to work around the fact that the term/doc stats are local and not global by default.

        Feel free to post patches to an issue if it isn't released yet e.g. this one. Once something is released, the practice is to create a new JIRA.

        Show
        Anshum Gupta added a comment - Thanks for noticing and bringing that up Jens. My bad, a test would've caught it. I'll add some. I've fixed maxqt, maxntp, and added maxdf. About the defaults from MoreLikeThis.java, I've changed that just to work around the fact that the term/doc stats are local and not global by default. Feel free to post patches to an issue if it isn't released yet e.g. this one. Once something is released, the practice is to create a new JIRA.
        Hide
        Jens Wille added a comment -

        About the defaults from MoreLikeThis.java, I've changed that just to work around the fact that the term/doc stats are local and not global by default.

        I'm sorry, but I don't understand the latter part. What I meant was

        mlt.setMinTermFreq(localParams.getInt(MoreLikeThisParams.MIN_TERM_FREQ, MoreLikeThis.DEFAULT_MIN_TERM_FREQ));
        

        etc. (see SOLR-7639-use-defaults.patch). Exactly like MoreLikeThisHandler does it.

        Show
        Jens Wille added a comment - About the defaults from MoreLikeThis.java, I've changed that just to work around the fact that the term/doc stats are local and not global by default. I'm sorry, but I don't understand the latter part. What I meant was mlt.setMinTermFreq(localParams.getInt(MoreLikeThisParams.MIN_TERM_FREQ, MoreLikeThis.DEFAULT_MIN_TERM_FREQ)); etc. (see SOLR-7639 -use-defaults.patch ). Exactly like MoreLikeThisHandler does it.
        Hide
        Anshum Gupta added a comment -

        It's practically the same thing i.e. not setting it would default it to those values.

        Show
        Anshum Gupta added a comment - It's practically the same thing i.e. not setting it would default it to those values.
        Hide
        Jens Wille added a comment -

        Oh, I didn't know that. Thanks for clearing it up!

        Show
        Jens Wille added a comment - Oh, I didn't know that. Thanks for clearing it up!
        Hide
        Jens Wille added a comment -

        I've attached a patch that adds support for the boost parameter to SimpleMLTQParser and that also excludes the current document from the results. If it looks OK to you, I can try to do the same for CloudMLTQParser.

        Show
        Jens Wille added a comment - I've attached a patch that adds support for the boost parameter to SimpleMLTQParser and that also excludes the current document from the results. If it looks OK to you, I can try to do the same for CloudMLTQParser .
        Hide
        Markus Jelsma added a comment -

        Hi - i have only seen the patch and assume uniqueValue is the docId that is excluded. In case docId's are URL's, are they escaped elsewhere in the code?

        Show
        Markus Jelsma added a comment - Hi - i have only seen the patch and assume uniqueValue is the docId that is excluded. In case docId's are URL's, are they escaped elsewhere in the code?
        Hide
        Upayavira added a comment -

        It appears that the MLTQParser does not support stream.body, which for me is the killer feature of the MLTRequestHandler. The documents I want to use as a query simply aren't in the index.

        I did attempt to write my own query parser a while back, but hit a recently added limitation that a SearchHandler cannot have access to a stream.

        Show
        Upayavira added a comment - It appears that the MLTQParser does not support stream.body, which for me is the killer feature of the MLTRequestHandler. The documents I want to use as a query simply aren't in the index. I did attempt to write my own query parser a while back, but hit a recently added limitation that a SearchHandler cannot have access to a stream.
        Hide
        Markus Jelsma added a comment -

        Hello - why is that? SolrQueryRequest in SearchHandler.handleRequestBody() gives you access to the ContentStream is it not?

        Show
        Markus Jelsma added a comment - Hello - why is that? SolrQueryRequest in SearchHandler.handleRequestBody() gives you access to the ContentStream is it not?
        Hide
        Upayavira added a comment - - edited

        Looking at the history here:

        http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java?r1=1666876&r2=1668170

        we can see Yonik removed the problematic code:

        {{
        public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception
        {

        • // int sleep = req.getParams().getInt("sleep",0);
        • // if (sleep > 0) {log.error("SLEEPING for " + sleep); Thread.sleep(sleep);}
        • if (req.getContentStreams() != null && req.getContentStreams().iterator().hasNext()) { - throw new SolrException(ErrorCode.BAD_REQUEST, "Search requests cannot accept content streams"); - }

          }}
          This means that the MLTQParser should be able to access the stream in the same way as the MoreLikeThisHandler does.

        Show
        Upayavira added a comment - - edited Looking at the history here: http://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/solr/core/src/java/org/apache/solr/handler/component/SearchHandler.java?r1=1666876&r2=1668170 we can see Yonik removed the problematic code: {{ public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception { // int sleep = req.getParams().getInt("sleep",0); // if (sleep > 0) {log.error("SLEEPING for " + sleep); Thread.sleep(sleep);} if (req.getContentStreams() != null && req.getContentStreams().iterator().hasNext()) { - throw new SolrException(ErrorCode.BAD_REQUEST, "Search requests cannot accept content streams"); - } }} This means that the MLTQParser should be able to access the stream in the same way as the MoreLikeThisHandler does.
        Hide
        Jens Wille added a comment -

        What about the SOLR-7639-add-boost-and-exclude-current patch? Is it acceptable? Should I include it in the main SOLR-7639 patch? Should I also apply those changes to CloudMLTQParser? I'd really like to see this included in 5.3 as well.

        Show
        Jens Wille added a comment - What about the SOLR-7639 -add-boost-and-exclude-current patch? Is it acceptable? Should I include it in the main SOLR-7639 patch? Should I also apply those changes to CloudMLTQParser ? I'd really like to see this included in 5.3 as well.
        Hide
        Jens Wille added a comment -

        I've updated the SOLR-7639-add-boost-and-exclude-current patch for CloudMLTQParser. Anything else I can do?

        Show
        Jens Wille added a comment - I've updated the SOLR-7639 -add-boost-and-exclude-current patch for CloudMLTQParser . Anything else I can do?
        Hide
        Jens Wille added a comment -

        Anshum Gupta I know you're busy right now but with 5.3 just around the corner, could you (or somebody else) please commit the latest SOLR-7639-add-boost-and-exclude-current patch? It would be really great to have it included in this release and make MLTQParser a viable alternative for the MLT Handler. If it's not acceptable in its current form, can you please let me know what the obstacles are so I can try and address them in the next couple of days.

        Show
        Jens Wille added a comment - Anshum Gupta I know you're busy right now but with 5.3 just around the corner, could you (or somebody else) please commit the latest SOLR-7639 -add-boost-and-exclude-current patch? It would be really great to have it included in this release and make MLTQParser a viable alternative for the MLT Handler. If it's not acceptable in its current form, can you please let me know what the obstacles are so I can try and address them in the next couple of days.
        Hide
        Anshum Gupta added a comment -

        I'm really sorry Jens for not noticing this patch and getting it into 5.3 in time. I'll create a new JIRA and add this patch to it there. This patch doesn't have changes for CloudMLTQParser and also doesn't have any tests. Let's get both of that done and I'll make sure I spend time to get this in ASAP.

        Show
        Anshum Gupta added a comment - I'm really sorry Jens for not noticing this patch and getting it into 5.3 in time. I'll create a new JIRA and add this patch to it there. This patch doesn't have changes for CloudMLTQParser and also doesn't have any tests. Let's get both of that done and I'll make sure I spend time to get this in ASAP.
        Hide
        Anshum Gupta added a comment -

        I've created SOLR-7912 (boost and exclusion) and SOLR-7913 (support for stream.body).

        Show
        Anshum Gupta added a comment - I've created SOLR-7912 (boost and exclusion) and SOLR-7913 (support for stream.body).
        Hide
        Anshum Gupta added a comment -

        Marking this issue as resolved as we shouldn't be adding more to this particular JIRA# considering 5.3 branch has been cut.

        Show
        Anshum Gupta added a comment - Marking this issue as resolved as we shouldn't be adding more to this particular JIRA# considering 5.3 branch has been cut.
        Hide
        Jens Wille added a comment -

        I'm kinda bummed that it didn't make it. However, thanks for following up.

        The latest patch from 21/Jul/15 13:28 does contain the CloudMLTQParser changes; I've added it to SOLR-7912.

        Show
        Jens Wille added a comment - I'm kinda bummed that it didn't make it. However, thanks for following up. The latest patch from 21/Jul/15 13:28 does contain the CloudMLTQParser changes; I've added it to SOLR-7912 .
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Anshum Gupta
            Reporter:
            Anshum Gupta
          • Votes:
            1 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development