Solr / SOLR-6297

Distributed spellcheck with WordBreakSpellchecker can lose suggestions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.9
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      When performing a spellcheck request in a distributed environment with the WordBreakSpellChecker configured, the shard response merging logic can lose some suggestions. The merging logic ensures that all shards marked the query as not being correctly spelled, which is good, but it also expects every shard to return suggestions for a term, which isn't necessarily the case. So if shard 1 returns 10 suggestions but shard 2 returns none, the final result will contain no suggestions, because the term has suggestions from only 1 of 2 shards.

      This isn't the case with the DirectSolrSpellChecker, which works properly.
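      The failure mode described above can be modelled in a few lines. This is a simplified sketch, not Solr's actual merging code: the function name `merge_strict` and the dict-of-lists response shape are assumptions made for illustration. Each shard response maps a misspelled term to its suggestions, and a shard with nothing to suggest simply omits the term.

```python
def merge_strict(shard_responses):
    """Merge per-shard suggestions, keeping a term only if every shard returned it.

    Hypothetical model of the strict rule described in the report;
    each response is a dict of term -> list of suggestions.
    """
    counts = {}   # term -> number of shards that returned it
    merged = {}   # term -> de-duplicated suggestions, in first-seen order
    for response in shard_responses:
        for term, suggestions in response.items():
            counts[term] = counts.get(term, 0) + 1
            bucket = merged.setdefault(term, [])
            for suggestion in suggestions:
                if suggestion not in bucket:
                    bucket.append(suggestion)
    total = len(shard_responses)
    # The strict rule: a term survives only if all shards reported it.
    return {term: merged[term] for term in merged if counts[term] == total}


shard1 = {"solrsearch": ["solr search", "solrs earch"]}
shard2 = {}  # this shard found nothing, so the term is absent

lost = merge_strict([shard1, shard2])
kept = merge_strict([shard1, {"solrsearch": ["solr search"]}])
```

      With one silent shard, `lost` is empty even though shard 1 had suggestions; when both shards report the term, `kept` retains the merged list.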

      1. SOLR-6297.patch
        15 kB
        James Dyer
      2. SOLR-6297.patch
        16 kB
        James Dyer
      3. SOLR-6297.patch
        10 kB
        James Dyer
      4. SOLR-6297.patch
        1 kB
        James Dyer

        Activity

        James Dyer added a comment -

        Here is a patch with a failing unit test. Note that when a word-break suggestion comes solely from 1 shard, it is the "suggestions" section that is left off the response. The suggestion is still there and it is properly used for collations.

        James Dyer added a comment -

        This patch fixes word-break suggestions by ensuring that both WordBreakSolrSpellChecker and ConjunctionSolrSpellChecker always output every original term, even if the list of suggestions is empty. This is consistent with the behavior of DirectSolrSpellChecker.

        This approach is problematic for combined-word suggestions as the various shards cannot know which new terms were invented by others. For this, SpellCheckComponent will need to loosen its requirement that all shards return a term in order for it to be in the final response.
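        A small model shows why this first fix helps word-break suggestions but not combined-word ones. It is a sketch under assumptions: `shard_output` and `merge_strict` are hypothetical names, and the dict-based response shape is invented for illustration rather than taken from Solr.

```python
def shard_output(query_terms, found):
    """Emit every original query term, plus any term this shard invented.

    `found` maps a term to the suggestions this shard produced for it.
    Query terms with no suggestions still appear, with an empty list.
    """
    output = {term: [] for term in query_terms}
    for term, suggestions in found.items():
        output[term] = list(suggestions)
    return output


def merge_strict(shard_outputs):
    """Keep a term only when every shard reported it (assumes at least one shard)."""
    merged = {}
    # A term present on all shards must be present on the first one.
    for term in shard_outputs[0]:
        if all(term in out for out in shard_outputs):
            combined = []
            for out in shard_outputs:
                for suggestion in out[term]:
                    if suggestion not in combined:
                        combined.append(suggestion)
            merged[term] = combined
    return merged


# Word-break case: the original term is known to every shard, so it survives.
word_break = merge_strict([
    shard_output(["solrsearch"], {"solrsearch": ["solr search"]}),
    shard_output(["solrsearch"], {}),
])

# Combined-word case: shard 1 invents the term "solr search";
# shard 2 cannot know about it, so the strict merge still drops it.
combined_word = merge_strict([
    shard_output(["solr", "search"], {"solr search": ["solrsearch"]}),
    shard_output(["solr", "search"], {}),
])
```

        Here `word_break` keeps the suggestion, while `combined_word` contains only the two original terms with empty lists: the invented term is lost.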

        James Dyer added a comment -

        This version of the patch also handles combined-word suggestions. This requires shards to send out a list of the original query terms in the shard-to-shard response. When putting the response back together, terms are always included if they were not part of the original query, even if not all shards returned them in the spellcheck response. This allows "original" terms invented by WordBreakSolrSpellChecker to be included in the final response to the client.
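        The relaxed merge rule can be sketched as follows. This is an illustrative model only: the function name `merge_relaxed` and the `original_terms` / `suggestions` keys are hypothetical and do not reflect the field names used in the actual patch.

```python
def merge_relaxed(shard_responses):
    """Merge suggestions, exempting invented terms from the all-shards rule.

    Each response is a dict with two hypothetical keys:
      "original_terms": the query terms as the shard saw them
      "suggestions":    term -> list of suggestions
    """
    original_terms = set()
    for response in shard_responses:
        original_terms.update(response["original_terms"])

    counts = {}   # term -> number of shards that returned it
    merged = {}   # term -> de-duplicated suggestions
    for response in shard_responses:
        for term, suggestions in response["suggestions"].items():
            counts[term] = counts.get(term, 0) + 1
            bucket = merged.setdefault(term, [])
            for suggestion in suggestions:
                if suggestion not in bucket:
                    bucket.append(suggestion)

    total = len(shard_responses)
    result = {}
    for term, suggestions in merged.items():
        invented = term not in original_terms
        # Original terms still need every shard; invented terms do not.
        # Terms left with no suggestions are dropped in this sketch.
        if (invented or counts[term] == total) and suggestions:
            result[term] = suggestions
    return result


shard1 = {
    "original_terms": ["solr", "search"],
    "suggestions": {"solr": [], "search": [], "solr search": ["solrsearch"]},
}
shard2 = {
    "original_terms": ["solr", "search"],
    "suggestions": {"solr": [], "search": []},
}
final = merge_relaxed([shard1, shard2])
```

        In this model `final` keeps the combined-word suggestion for "solr search" even though only shard 1 produced it, because that term is not in the list of original query terms.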

        James Dyer added a comment -

        Here is an up-to-date patch for trunk.

        ASF subversion and git services added a comment -

        Commit 1622476 from jdyer@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1622476 ]

        SOLR-6297: Fix for Distributed WordBreakSolrSpellChecker

        James Dyer added a comment -

        Working on backport for 4.11.

        cc: Steve Molloy - are you able to verify this trunk version fixes your particular problem?

        ASF subversion and git services added a comment -

        Commit 1622526 from jdyer@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1622526 ]

        SOLR-6297: Fix for Distributed WordBreakSolrSpellChecker

        Steve Molloy added a comment -

        Sorry for not replying sooner, but yes, I applied the patch to our codebase and it seems to fix the issue. Thanks.

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee: James Dyer
          • Reporter: Steve Molloy
          • Votes: 0
          • Watchers: 4