Solr
  1. Solr
  2. SOLR-7912

Add support for boost and exclude the queried document id in MoreLikeThis QParser

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Continuing from SOLR-7639. We need to support boost, and also exclude input document from returned doc list.

      1. SOLR-7912.patch
        14 kB
        Jens Wille
      2. SOLR-7912.patch
        13 kB
        Jens Wille
      3. SOLR-7912.patch
        7 kB
        Jens Wille
      4. SOLR-7912.patch
        7 kB
        Jens Wille
      5. SOLR-7912.patch
        3 kB
        Anshum Gupta
      6. SOLR-7912-test-fix.patch
        2 kB
        Anshum Gupta

        Activity

        Hide
        Anshum Gupta added a comment - - edited

        Jens Wille : Here's the patch from SOLR-7639.

        Show
        Anshum Gupta added a comment - - edited Jens Wille : Here's the patch from SOLR-7639 .
        Hide
        Jens Wille added a comment -

        I've added the correct patch from SOLR-7639, applied against latest trunk.

        Show
        Jens Wille added a comment - I've added the correct patch from SOLR-7639 , applied against latest trunk.
        Hide
        Jens Wille added a comment -

        Can we get this into 5.4? I've updated the patch in accordance with LUCENE-6590. Unfortunately, I can't run the tests due to an unrelated (?) error: Type 'org/apache/solr/search/function/ValueSourceRangeFilter' (current frame, stack[2]) is not assignable to 'org/apache/lucene/search/Filter'. Is there anything else that needs to be done?

        Show
        Jens Wille added a comment - Can we get this into 5.4? I've updated the patch in accordance with LUCENE-6590 . Unfortunately, I can't run the tests due to an unrelated (?) error: Type 'org/apache/solr/search/function/ValueSourceRangeFilter' (current frame, stack[2]) is not assignable to 'org/apache/lucene/search/Filter' . Is there anything else that needs to be done?
        Hide
        Jens Wille added a comment -

        The type error went away after ant clean. I've updated the tests to account for the excluded query document. However, comparing parsed query strings in CloudMLTQParserTest is considerably more involved now; I'm not sure my approach is adequate (not to mention idiomatic).

        Show
        Jens Wille added a comment - The type error went away after ant clean . I've updated the tests to account for the excluded query document. However, comparing parsed query strings in CloudMLTQParserTest is considerably more involved now; I'm not sure my approach is adequate (not to mention idiomatic).
        Hide
        Uwe Schindler added a comment -

        Hi, the final query should have disableCoord=true (the one where the original document is excluded).

        Show
        Uwe Schindler added a comment - Hi, the final query should have disableCoord=true (the one where the original document is excluded).
        Hide
        Jens Wille added a comment -

        You mean realMLTQuery.setDisableCoord(true)? That's not in MoreLikeThisHandler where I adapted the code from.

        Show
        Jens Wille added a comment - You mean realMLTQuery.setDisableCoord(true) ? That's not in MoreLikeThisHandler where I adapted the code from.
        Hide
        Jens Wille added a comment -

        Uwe, can you expand on your suggestion? Was realMLTQuery the one you had in mind? Does it only apply to CloudMLTQParser and SimpleMLTQParser or also to MoreLikeThisHandler? If the former, why? And how should this change be reflected in the tests?

        Show
        Jens Wille added a comment - Uwe, can you expand on your suggestion? Was realMLTQuery the one you had in mind? Does it only apply to CloudMLTQParser and SimpleMLTQParser or also to MoreLikeThisHandler ? If the former, why? And how should this change be reflected in the tests?
        Hide
        Jens Wille added a comment -

        Anshum Gupta, is there any chance this patch can make it into the upcoming release?

        Show
        Jens Wille added a comment - Anshum Gupta , is there any chance this patch can make it into the upcoming release?
        Hide
        Anshum Gupta added a comment -

        Jens Wille absolutely. I'll take a look at the updated patch.

        Show
        Anshum Gupta added a comment - Jens Wille absolutely. I'll take a look at the updated patch.
        Hide
        Anshum Gupta added a comment -

        I've added the suggestion that Uwe gave and the rest looks fine to me. I'll just run the tests and then commit (subject to the tests passing).
        Thanks Jens and sorry about taking so long!

        Show
        Anshum Gupta added a comment - I've added the suggestion that Uwe gave and the rest looks fine to me. I'll just run the tests and then commit (subject to the tests passing). Thanks Jens and sorry about taking so long!
        Hide
        Anshum Gupta added a comment -

        The CloudMLTQParserTest is failing for me right now on trunk. I'll take a look as to why it's failing if you don't get to it by morning my time (PST).

        Show
        Anshum Gupta added a comment - The CloudMLTQParserTest is failing for me right now on trunk. I'll take a look as to why it's failing if you don't get to it by morning my time (PST).
        Hide
        Uwe Schindler added a comment -

        The score of a MLT query should not be changed by additional clauses like the one to exclude original document. The coordination factor changes the score of a query depending on how many clauses are in a boolean query. This problem does not really affect the query here because it is a combination of must/must_not, but it should still be done.

        Show
        Uwe Schindler added a comment - The score of a MLT query should not be changed by additional clauses like the one to exclude original document. The coordination factor changes the score of a query depending on how many clauses are in a boolean query. This problem does not really affect the query here because it is a combination of must/must_not, but it should still be done.
        Hide
        Uwe Schindler added a comment -

        The final query (the one where the exclusion of the "oiginal document" is done). This one is and was not part of previous code.

        Show
        Uwe Schindler added a comment - The final query (the one where the exclusion of the "oiginal document" is done). This one is and was not part of previous code.
        Hide
        Jens Wille added a comment -

        Both tests are passing for me on current trunk (r1715745) with the latest patch. CloudMLTQParserTest is failing when I add setDisableCoord due to differing parsed query strings ((...)/no_coord). I've attached a new patch that I think reflects Uwe's suggestion and updates the test accordingly.

        Show
        Jens Wille added a comment - Both tests are passing for me on current trunk (r1715745) with the latest patch. CloudMLTQParserTest is failing when I add setDisableCoord due to differing parsed query strings ( (...)/no_coord ). I've attached a new patch that I think reflects Uwe's suggestion and updates the test accordingly.
        Hide
        ASF subversion and git services added a comment -

        Commit 1715931 from Anshum Gupta in branch 'dev/trunk'
        [ https://svn.apache.org/r1715931 ]

        SOLR-7912: Add boost support, and also exclude the queried document in MoreLikeThis QParser

        Show
        ASF subversion and git services added a comment - Commit 1715931 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1715931 ] SOLR-7912 : Add boost support, and also exclude the queried document in MoreLikeThis QParser
        Hide
        ASF subversion and git services added a comment -

        Commit 1715953 from Anshum Gupta in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1715953 ]

        SOLR-7912: Add boost support, and also exclude the queried document in MoreLikeThis QParser (merge from trunk)

        Show
        ASF subversion and git services added a comment - Commit 1715953 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1715953 ] SOLR-7912 : Add boost support, and also exclude the queried document in MoreLikeThis QParser (merge from trunk)
        Hide
        Anshum Gupta added a comment -

        Reopening to fix failing test

        Show
        Anshum Gupta added a comment - Reopening to fix failing test
        Hide
        Anshum Gupta added a comment -

        Beasting this patch. Will commit if it works fine.

        Show
        Anshum Gupta added a comment - Beasting this patch. Will commit if it works fine.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716028 from Anshum Gupta in branch 'dev/trunk'
        [ https://svn.apache.org/r1716028 ]

        SOLR-7912: Fix CloudMLTQParserTest from the previous commit

        Show
        ASF subversion and git services added a comment - Commit 1716028 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1716028 ] SOLR-7912 : Fix CloudMLTQParserTest from the previous commit
        Hide
        ASF subversion and git services added a comment -

        Commit 1716029 from Anshum Gupta in branch 'dev/trunk'
        [ https://svn.apache.org/r1716029 ]

        SOLR-7912: Another attempt at fixing CloudMLTQParserTest from the original commit

        Show
        ASF subversion and git services added a comment - Commit 1716029 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1716029 ] SOLR-7912 : Another attempt at fixing CloudMLTQParserTest from the original commit
        Hide
        Anshum Gupta added a comment -

        It makes sense to not assert against the parsed queries when there are > 2 terms right now. I've changed the test to not do that and only assert on doc ordering conditionally.
        We should however find a better way to test this out.

        Show
        Anshum Gupta added a comment - It makes sense to not assert against the parsed queries when there are > 2 terms right now. I've changed the test to not do that and only assert on doc ordering conditionally. We should however find a better way to test this out.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716030 from Anshum Gupta in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1716030 ]

        SOLR-7912: Fix CloudMLTQParserTest from the previous commit (merge from trunk)

        Show
        ASF subversion and git services added a comment - Commit 1716030 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1716030 ] SOLR-7912 : Fix CloudMLTQParserTest from the previous commit (merge from trunk)
        Hide
        Jens Wille added a comment -

        Thank you, Anshum.

        As for the test: Under what circumstances can parsedquery be a string instead of an array? I wasn't able to make that part fail.

        Show
        Jens Wille added a comment - Thank you, Anshum. As for the test: Under what circumstances can parsedquery be a string instead of an array? I wasn't able to make that part fail.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716156 from Anshum Gupta in branch 'dev/trunk'
        [ https://svn.apache.org/r1716156 ]

        SOLR-7912: Removing complicated query assertions from the MLTQParser cloud test as it was getting to be more of a hack. Only test for doc ordering and query assertion in simple cases.

        Show
        ASF subversion and git services added a comment - Commit 1716156 from Anshum Gupta in branch 'dev/trunk' [ https://svn.apache.org/r1716156 ] SOLR-7912 : Removing complicated query assertions from the MLTQParser cloud test as it was getting to be more of a hack. Only test for doc ordering and query assertion in simple cases.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716159 from Anshum Gupta in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1716159 ]

        SOLR-7912: Removing complicated query assertions from the MLTQParser cloud test as it was getting to be more of a hack. Only test for doc ordering and query assertion in simple cases. (merge from trunk)

        Show
        ASF subversion and git services added a comment - Commit 1716159 from Anshum Gupta in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1716159 ] SOLR-7912 : Removing complicated query assertions from the MLTQParser cloud test as it was getting to be more of a hack. Only test for doc ordering and query assertion in simple cases. (merge from trunk)
        Hide
        Anshum Gupta added a comment -

        The parsed query could order the terms in different ways on different shards and at times when they end up being the same, it would return a single value instead of an array of parsed query strings. I think this is just not a good way to test and actually forces us to look at the output to write the test to begin with. For exactly that reason, I've removed that assertion for now.

        Show
        Anshum Gupta added a comment - The parsed query could order the terms in different ways on different shards and at times when they end up being the same, it would return a single value instead of an array of parsed query strings. I think this is just not a good way to test and actually forces us to look at the output to write the test to begin with. For exactly that reason, I've removed that assertion for now.
        Hide
        Anshum Gupta added a comment -

        This was fixed before 5.4 so fixing the release version and marking the issue as resolved.

        Show
        Anshum Gupta added a comment - This was fixed before 5.4 so fixing the release version and marking the issue as resolved.

          People

          • Assignee:
            Anshum Gupta
            Reporter:
            Anshum Gupta
          • Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development