Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9644

MoreLikeThis parsers SimpleMLTQParser and CloudMLTQParser don't handle boosts properly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2.1
    • Fix Version/s: 6.4, 7.0
    • Component/s: MoreLikeThis
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
    • Flags:
      Patch

      Description

      It seems SimpleMLTQParser and CloudMLTQParser should be able to handle boost parameters, but it's not working properly. I'll make a pull request to add tests and fix both.

      1. SOLR-9644-branch_6x.patch
        14 kB
        Ere Maijala
      2. SOLR-9644-master.patch
        13 kB
        Ere Maijala

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user EreMaijala opened a pull request:

          https://github.com/apache/lucene-solr/pull/98

          SOLR-9644: Fix boost handling in SimpleMLTQParser and CloudMLTQParser.

          Both parsers tried to handle field boosts but didn't do it properly. This pull request includes the following changes to both:

          • Parse boosts from qf even if boost=false to avoid errors.
          • Use the field names from the boost parsing so that they are always clean.
          • Add tests for the same query with and without boosting.

          Also fixed is an additional problem in CloudMLTQParser where it added IndexableField type fields to the filtered document as is, which caused the document to include field definition strings such as "stored" and "indexed". This caused the MLT query to return records including these terms under some circumstances.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/EreMaijala/lucene-solr master

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/98.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #98


          commit 70ad5ccbb4a5b39b16f41c58dd5ee058811b6d9c
          Author: Ere Maijala <ere.maijala@helsinki.fi>
          Date: 2016-10-17T07:02:40Z

          SOLR-9644 Fixed SimpleMLTQParser and CloudMLTQParser to handle boosts properly and CloudMLTQParser to only extract actual values from IndexableField type fields to the filtered document.

          commit e10b9385df5c3183e80ed084fa48b0c4da9dd476
          Author: Ere Maijala <ere.maijala@helsinki.fi>
          Date: 2016-10-17T07:30:36Z

          Improved the SimpleMLTQParserTest test case so that it actually shows a difference in results when boosting is used.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user EreMaijala opened a pull request: https://github.com/apache/lucene-solr/pull/98 SOLR-9644 : Fix boost handling in SimpleMLTQParser and CloudMLTQParser. Both parsers tried to handle field boosts but didn't do it properly. This pull request includes the following changes to both: Parse boosts from qf even if boost=false to avoid errors. Use the field names from the boost parsing so that they are always clean. Add tests for the same query with and without boosting. Also fixed is an additional problem in CloudMLTQParser where it added IndexableField type fields to the filtered document as is, which caused the document to include field definition strings such as "stored" and "indexed". This caused the MLT query to return records including these terms under some circumstances. You can merge this pull request into a Git repository by running: $ git pull https://github.com/EreMaijala/lucene-solr master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/98.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #98 commit 70ad5ccbb4a5b39b16f41c58dd5ee058811b6d9c Author: Ere Maijala <ere.maijala@helsinki.fi> Date: 2016-10-17T07:02:40Z SOLR-9644 Fixed SimpleMLTQParser and CloudMLTQParser to handle boosts properly and CloudMLTQParser to only extract actual values from IndexableField type fields to the filtered document. commit e10b9385df5c3183e80ed084fa48b0c4da9dd476 Author: Ere Maijala <ere.maijala@helsinki.fi> Date: 2016-10-17T07:30:36Z Improved the SimpleMLTQParserTest test case so that it actually shows a difference in results when boosting is used.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user EreMaijala commented on the issue:

          https://github.com/apache/lucene-solr/pull/98

          This should apply cleanly to branch_6x too (apart from CHANGES.txt obviously).

          Show
          githubbot ASF GitHub Bot added a comment - Github user EreMaijala commented on the issue: https://github.com/apache/lucene-solr/pull/98 This should apply cleanly to branch_6x too (apart from CHANGES.txt obviously).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user EreMaijala commented on the issue:

          https://github.com/apache/lucene-solr/pull/98

          Note also that the old boost test cases were pretty useless, but I left them untouched to make sure their results are not affected.

          Show
          githubbot ASF GitHub Bot added a comment - Github user EreMaijala commented on the issue: https://github.com/apache/lucene-solr/pull/98 Note also that the old boost test cases were pretty useless, but I left them untouched to make sure their results are not affected.
          Hide
          anshumg Anshum Gupta added a comment -

          I'll take a look at this some time this week.

          Show
          anshumg Anshum Gupta added a comment - I'll take a look at this some time this week.
          Hide
          emaijala Ere Maijala added a comment -

          Thanks! Note that the commits from 17th should apply to 6x branch too (or at least they did at the time).

          Show
          emaijala Ere Maijala added a comment - Thanks! Note that the commits from 17th should apply to 6x branch too (or at least they did at the time).
          Hide
          anshumg Anshum Gupta added a comment -

          Ere Maijala can you update the patch ? I end up with a lot of conflicts when I try to apply this to the current master.

          If you don't have the time right now, let me know and I'll take it up.

          Show
          anshumg Anshum Gupta added a comment - Ere Maijala can you update the patch ? I end up with a lot of conflicts when I try to apply this to the current master. If you don't have the time right now, let me know and I'll take it up.
          Hide
          emaijala Ere Maijala added a comment -

          Added patches for master and branch_6x. For some reason GitHub said that there are no conflicts, but the patch couldn't be applied, so I decided to attach proper patches.

          Show
          emaijala Ere Maijala added a comment - Added patches for master and branch_6x. For some reason GitHub said that there are no conflicts, but the patch couldn't be applied, so I decided to attach proper patches.
          Hide
          emaijala Ere Maijala added a comment -

          Sorry about that. I'm not sure why GitHub thought there were no conflicts, but proper patches now attached.

          Show
          emaijala Ere Maijala added a comment - Sorry about that. I'm not sure why GitHub thought there were no conflicts, but proper patches now attached.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user EreMaijala commented on the issue:

          https://github.com/apache/lucene-solr/pull/98

          Closing this pull request and continuing with traditional patches in the JIRA issue.

          Show
          githubbot ASF GitHub Bot added a comment - Github user EreMaijala commented on the issue: https://github.com/apache/lucene-solr/pull/98 Closing this pull request and continuing with traditional patches in the JIRA issue.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user EreMaijala closed the pull request at:

          https://github.com/apache/lucene-solr/pull/98

          Show
          githubbot ASF GitHub Bot added a comment - Github user EreMaijala closed the pull request at: https://github.com/apache/lucene-solr/pull/98
          Hide
          anshumg Anshum Gupta added a comment -

          Thanks Ere Maijala. I'll take a look.

          Show
          anshumg Anshum Gupta added a comment - Thanks Ere Maijala . I'll take a look.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2b4e3dd941a7a88274f2a86f18ea57a9d95e4364 in lucene-solr's branch refs/heads/master from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2b4e3dd ]

          SOLR-9644: Fixed SimpleMLTQParser and CloudMLTQParser to handle boosts properly and CloudMLTQParser to only extract actual values from IndexableField type fields to the filtered document.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 2b4e3dd941a7a88274f2a86f18ea57a9d95e4364 in lucene-solr's branch refs/heads/master from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2b4e3dd ] SOLR-9644 : Fixed SimpleMLTQParser and CloudMLTQParser to handle boosts properly and CloudMLTQParser to only extract actual values from IndexableField type fields to the filtered document.
          Hide
          anshumg Anshum Gupta added a comment -

          Ere Maijala I just noticed that the patches for master and branch_6x are different but they shouldn't be. There isn't any real difference between the two.
          Also, the tests seem different.

          Show
          anshumg Anshum Gupta added a comment - Ere Maijala I just noticed that the patches for master and branch_6x are different but they shouldn't be. There isn't any real difference between the two. Also, the tests seem different.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit dcb836500a8d5f8dd0d59264ad0061e5a2926c20 in lucene-solr's branch refs/heads/branch_6x from anshum
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dcb8365 ]

          SOLR-9644: Fixed SimpleMLTQParser and CloudMLTQParser to handle boosts properly and CloudMLTQParser to only extract actual values from IndexableField type fields to the filtered document.

          Show
          jira-bot ASF subversion and git services added a comment - Commit dcb836500a8d5f8dd0d59264ad0061e5a2926c20 in lucene-solr's branch refs/heads/branch_6x from anshum [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dcb8365 ] SOLR-9644 : Fixed SimpleMLTQParser and CloudMLTQParser to handle boosts properly and CloudMLTQParser to only extract actual values from IndexableField type fields to the filtered document.
          Hide
          anshumg Anshum Gupta added a comment -

          Ere Maijala I cherry-picked and made some changes to the tests instead of changing the underlying class and diverging them from each other on master and branch_6x. Can you test this out when you get a chance?

          Show
          anshumg Anshum Gupta added a comment - Ere Maijala I cherry-picked and made some changes to the tests instead of changing the underlying class and diverging them from each other on master and branch_6x. Can you test this out when you get a chance?
          Hide
          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Seems to have been released as part of 6.4. Closing it out.

          Show
          ichattopadhyaya Ishan Chattopadhyaya added a comment - Seems to have been released as part of 6.4. Closing it out.

            People

            • Assignee:
              anshumg Anshum Gupta
              Reporter:
              emaijala Ere Maijala
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development