Solr
  1. Solr
  2. SOLR-2848

DirectSolrSpellChecker fails in distributed environment

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: SolrCloud, spellchecker
    • Labels:
      None

      Description

      While working on SOLR-2585, it was brought to my attention that DirectSolrSpellChecker has no test coverage involving a distributed environment. Here I am adding a random element to DistributedSpellCheckComponentTest to alternate between the "IndexBased" and "Direct" spell checkers. Doing so revealed bugs in using DirectSolrSpellChecker in a distributed environment. The fixes here roughly mirror those made to the "IndexBased" spell checker with SOLR-2083.

      1. SOLR-2848.patch
        11 kB
        James Dyer
      2. SOLR-2848.patch
        12 kB
        James Dyer
      3. SOLR-2848-refactoring.patch
        38 kB
        James Dyer
      4. SOLR-2848-refactoring.patch
        32 kB
        James Dyer

        Issue Links

          Activity

          Hide
          James Dyer added a comment -

          increase test coverage & fix bugs subsequently found.

          Show
          James Dyer added a comment - increase test coverage & fix bugs subsequently found.
          Hide
          Mark Miller added a comment -

          thanks James!

          Show
          Mark Miller added a comment - thanks James!
          Hide
          Robert Muir added a comment -

          Thanks for bringing this up James, a few questions:

          • Can we instead fold these changes into the base spellchecking class or somewhere else in Solr? I don't think a spellchecking implementation should have to deal with this stuff, this is a plugin API and it should only have to implement spellcheck.
          • Is there a way we can remove the instanceof checks in SpellCheckComponent completely? I think seeing these is a sign there is a serious problem in the spellchecking APIs.
          • what is the problem with the internal levenshtein implementation? I'm not sure we should silently change this here, i don't understand why we should use the slower one if the user asked for 'internal'.
          Show
          Robert Muir added a comment - Thanks for bringing this up James, a few questions: Can we instead fold these changes into the base spellchecking class or somewhere else in Solr? I don't think a spellchecking implementation should have to deal with this stuff, this is a plugin API and it should only have to implement spellcheck. Is there a way we can remove the instanceof checks in SpellCheckComponent completely? I think seeing these is a sign there is a serious problem in the spellchecking APIs. what is the problem with the internal levenshtein implementation? I'm not sure we should silently change this here, i don't understand why we should use the slower one if the user asked for 'internal'.
          Hide
          James Dyer added a comment -

          Robert,

          I think your first suggestion (moving configuration and response formatting out of the *SolrSpellCheck) is desirable and doable, but I wanted to keep this issue focused on increasing test coverage and to make DirectSolrSpellChecker mirror what AbstractLuceneSpellChecker already does so that it can pass.

          Obviously, if every SpellChecker plug-in implemented or extended something that had a "getStringDistance" or "getAccuracy" method then we wouldn't need to do instanceof and cast. Once again, a big structural change like this seems inappropriate in a bug fix, especially as we are not introducing these checks for the first time. This is a long-standing problem.

          It looks to me like "internal levenshtein" is just a dummy class designed to technically meet the api requirements while not actually doing anything. But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method. This is how it can know how to order the varying suggestions from multiple shards after-the-fact. I see from the notes in DirectSpellChecker that using the "internal" StringDistance yields performance improvements over using a pluggable s.d. I did not look enough to determine if "internal levenshtein" could be modified to re-compute these internally-generated distance calculations and be usable externally, without sacrificing the performance gain. If possible, this would probably be our best bet, eliminating the Exception hack and any possible discrepancies using 2 different s.d. classes would cause. Do you agree?

          Show
          James Dyer added a comment - Robert, I think your first suggestion (moving configuration and response formatting out of the *SolrSpellCheck) is desirable and doable, but I wanted to keep this issue focused on increasing test coverage and to make DirectSolrSpellChecker mirror what AbstractLuceneSpellChecker already does so that it can pass. Obviously, if every SpellChecker plug-in implemented or extended something that had a "getStringDistance" or "getAccuracy" method then we wouldn't need to do instanceof and cast. Once again, a big structural change like this seems inappropriate in a bug fix, especially as we are not introducing these checks for the first time. This is a long-standing problem. It looks to me like "internal levenshtein" is just a dummy class designed to technically meet the api requirements while not actually doing anything. But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method. This is how it can know how to order the varying suggestions from multiple shards after-the-fact. I see from the notes in DirectSpellChecker that using the "internal" StringDistance yields performance improvements over using a pluggable s.d. I did not look enough to determine if "internal levenshtein" could be modified to re-compute these internally-generated distance calculations and be usable externally, without sacrificing the performance gain. If possible, this would probably be our best bet, eliminating the Exception hack and any possible discrepancies using 2 different s.d. classes would cause. Do you agree?
          Hide
          Robert Muir added a comment -

          But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method.

          This is the part i dont understand... we already have the scores in the results, so why recompute?

          Show
          Robert Muir added a comment - But SpellCheckComponent.finishStage() needs to be able to get the StringDistance impl that was used to generate suggestions during the first stage, then re-compute distances using its getDistance() method. This is the part i dont understand... we already have the scores in the results, so why recompute?
          Hide
          James Dyer added a comment -

          finishStage() is being run on the Master Shard. It receives spelling results from all of the shards and then has to integrate them together. Solr doesn't return the scores with spelling suggestions back to the client. I suppose the authors of SOLR-785 could have modified the response Solr sends back to its clients. However, it probably seemed inexpensive enough to just re-compute the string distance after-the-fact (indeed Lucene In Action 2nd ed sect 8.5.3 mentions doing the same thing, so I take it this is a common thing to do?). The problem now we have is we've got a spellchecker that doesn't fully implement a StringDistance all the time. I'd imagine the best bet is to try and change that. Possibly, the slight discrepancies our current patch leave are not serious enough to fix? If neither option is good, then we'd probably have to modify the solr response to include scores.

          Show
          James Dyer added a comment - finishStage() is being run on the Master Shard. It receives spelling results from all of the shards and then has to integrate them together. Solr doesn't return the scores with spelling suggestions back to the client. I suppose the authors of SOLR-785 could have modified the response Solr sends back to its clients. However, it probably seemed inexpensive enough to just re-compute the string distance after-the-fact (indeed Lucene In Action 2nd ed sect 8.5.3 mentions doing the same thing, so I take it this is a common thing to do?). The problem now we have is we've got a spellchecker that doesn't fully implement a StringDistance all the time. I'd imagine the best bet is to try and change that. Possibly, the slight discrepancies our current patch leave are not serious enough to fix? If neither option is good, then we'd probably have to modify the solr response to include scores.
          Hide
          Robert Muir added a comment -

          I'd imagine the best bet is to try and change that.

          OK, Lets do this, such that the distance impl is a "real" one computing levenshtein like Lucene does and not a fake one. Then its one less hack.

          Want to open a LUCENE issue for this? I can help if you want.

          Show
          Robert Muir added a comment - I'd imagine the best bet is to try and change that. OK, Lets do this, such that the distance impl is a "real" one computing levenshtein like Lucene does and not a fake one. Then its one less hack. Want to open a LUCENE issue for this? I can help if you want.
          Hide
          Robert Muir added a comment -

          The problem now we have is we've got a spellchecker that doesn't fully implement a StringDistance all the time.

          we should fix that hack as i mentioned I think (its just a hack, caused by me, sorry!).

          But then we should think about how to make sure that SpellChecker subclasses always work correctly distributed if we aren't going to change the wire format. Rather than instanceof/StringDistance maybe we could add a merge() method that would be more general?

          Show
          Robert Muir added a comment - The problem now we have is we've got a spellchecker that doesn't fully implement a StringDistance all the time. we should fix that hack as i mentioned I think (its just a hack, caused by me, sorry!). But then we should think about how to make sure that SpellChecker subclasses always work correctly distributed if we aren't going to change the wire format. Rather than instanceof/StringDistance maybe we could add a merge() method that would be more general?
          Hide
          James Dyer added a comment -

          OK, Lets do this, such that the distance impl is a "real" one computing levenshtein like Lucene does

          I opened LUCENE-3527.

          Rather than instanceof/StringDistance maybe we could add a merge() method that would be more general?

          Are you thinking each *SolrSpellChecker should have a merge() that finishStage() calls? This sounds reasonable to me.

          Show
          James Dyer added a comment - OK, Lets do this, such that the distance impl is a "real" one computing levenshtein like Lucene does I opened LUCENE-3527 . Rather than instanceof/StringDistance maybe we could add a merge() method that would be more general? Are you thinking each *SolrSpellChecker should have a merge() that finishStage() calls? This sounds reasonable to me.
          Hide
          Robert Muir added a comment -

          Yeah, this way a spellchecker can decide how it merges results (since we arent going to put any 'score' in the wire format or require it).

          So for example, the default impl of AbstractLuceneSpellChecker's merge() would use getComparator and such (we can just put this in the abstract class)

          Show
          Robert Muir added a comment - Yeah, this way a spellchecker can decide how it merges results (since we arent going to put any 'score' in the wire format or require it). So for example, the default impl of AbstractLuceneSpellChecker's merge() would use getComparator and such (we can just put this in the abstract class)
          Hide
          James Dyer added a comment -

          sync w/trunk & modified in light on completion of LUCENE-3527. This version doesn't have any refactorings.

          Show
          James Dyer added a comment - sync w/trunk & modified in light on completion of LUCENE-3527 . This version doesn't have any refactorings.
          Hide
          James Dyer added a comment -

          This version also contains refactorings to SpellCheckComponent.finishStage(), and moves the final merge to the SolrSpellChecker abstract class.

          An alternative to doing this would be to keep the "mergeSuggestions()" method in SpellCheckComponent and simply implement "getStringDistance()" and "getAccuracy()" in SolrSpellChecker.

          Show
          James Dyer added a comment - This version also contains refactorings to SpellCheckComponent.finishStage(), and moves the final merge to the SolrSpellChecker abstract class. An alternative to doing this would be to keep the "mergeSuggestions()" method in SpellCheckComponent and simply implement "getStringDistance()" and "getAccuracy()" in SolrSpellChecker.
          Hide
          James Dyer added a comment -

          I would really like to get this issue resolved if possible. Here are 3 possible solutions:

          1. The Nov 1 patch "SOLR-2848.patch" increases test coverage and makes the minimal changes to fix the distributed bug with DirectSolrSpellChecker.

          2. The Nov 1 patch "SOLR-2848-refactoring.patch" also refactors the code, breaking the finishStage() method up and also moving the final merge into SolrSpellChecker. This allows us to theoretically have different spell checkers choose to merge differently. In practice, all of our spell checkers currently would use the same default version of "merge()"

          3. We could dial back the changes in "SOLR-2848-refactoring.patch" to keep merge() as a method in SpellCheckComponent as all spell checkers use the same algorithm anyhow. But we could keep the changes to make finishStage() more readable and, more importantly, keep the "getStringDistance()" and "getAccuracy()" methods in SolrSpellChecker. This at least eliminates the need for "instanceof" checks, making Distributed Spell Check less brittle as new spell checkers are added.

          Please advise how we should move forward. (I like option #3 the best. I can create a patch for this if desired.) Thanks.

          Show
          James Dyer added a comment - I would really like to get this issue resolved if possible. Here are 3 possible solutions: 1. The Nov 1 patch " SOLR-2848 .patch" increases test coverage and makes the minimal changes to fix the distributed bug with DirectSolrSpellChecker. 2. The Nov 1 patch " SOLR-2848 -refactoring.patch" also refactors the code, breaking the finishStage() method up and also moving the final merge into SolrSpellChecker. This allows us to theoretically have different spell checkers choose to merge differently. In practice, all of our spell checkers currently would use the same default version of "merge()" 3. We could dial back the changes in " SOLR-2848 -refactoring.patch" to keep merge() as a method in SpellCheckComponent as all spell checkers use the same algorithm anyhow. But we could keep the changes to make finishStage() more readable and, more importantly, keep the "getStringDistance()" and "getAccuracy()" methods in SolrSpellChecker. This at least eliminates the need for "instanceof" checks, making Distributed Spell Check less brittle as new spell checkers are added. Please advise how we should move forward. (I like option #3 the best. I can create a patch for this if desired.) Thanks.
          Hide
          Robert Muir added a comment -

          Personally I really like your 2nd patch. I think its nice to open up merge() like this.

          Some ideas:

          • I also think for now getStringDistance/getAccuracy need not return null? If you don't support these methods, throw UOE instead and override merge(). we document that this is how the default merge implementation works.
          • Can we make getStringDistance/getAccuracy protected instead of public? We don't want them to be suddenly get used by lots of other code when they are really just a merge implementation detail.
          • Is there any way to avoid this extra docFreq call when the spellchecker is 'sharded' ? Seems like we call docFreq twice in that case.
          • I'd really like for a specific spellchecker impl to be able to able to work with the default merge implementation etc without having 'boolean sharded' (it shouldnt have to "know"). What is this extra check doing, and can we move it to the base class?
          Show
          Robert Muir added a comment - Personally I really like your 2nd patch. I think its nice to open up merge() like this. Some ideas: I also think for now getStringDistance/getAccuracy need not return null? If you don't support these methods, throw UOE instead and override merge(). we document that this is how the default merge implementation works. Can we make getStringDistance/getAccuracy protected instead of public? We don't want them to be suddenly get used by lots of other code when they are really just a merge implementation detail. Is there any way to avoid this extra docFreq call when the spellchecker is 'sharded' ? Seems like we call docFreq twice in that case. I'd really like for a specific spellchecker impl to be able to able to work with the default merge implementation etc without having 'boolean sharded' (it shouldnt have to "know"). What is this extra check doing, and can we move it to the base class?
          Hide
          James Dyer added a comment -

          Here is a version that addresses Robert's concerns.

          I'd really like for a specific spellchecker impl to be able to able to work with the default merge implementation etc without having 'boolean sharded' (it shouldnt have to "know"). What is this extra check doing, and can we move it to the base class?

          What happens is the first stage of a distributed request needs to return the original tokens plus an empty list so that the master, when merging, can know that the shard found the token misspelled but had no suggestions. But if this is not a shard request, we don't want to go sending empty lists back to the enduser. Hence the "isShard" boolean. I removed this, allowing it to return empty lists all the time, and then having them removed out of the response if it is a non-distributed scenario. This invalidated some test scenarios, so I made adjustments to a few unit tests to cope.

          Let me know if this needs more work before committing.

          Show
          James Dyer added a comment - Here is a version that addresses Robert's concerns. I'd really like for a specific spellchecker impl to be able to able to work with the default merge implementation etc without having 'boolean sharded' (it shouldnt have to "know"). What is this extra check doing, and can we move it to the base class? What happens is the first stage of a distributed request needs to return the original tokens plus an empty list so that the master, when merging, can know that the shard found the token misspelled but had no suggestions. But if this is not a shard request, we don't want to go sending empty lists back to the enduser. Hence the "isShard" boolean. I removed this, allowing it to return empty lists all the time, and then having them removed out of the response if it is a non-distributed scenario. This invalidated some test scenarios, so I made adjustments to a few unit tests to cope. Let me know if this needs more work before committing.
          Hide
          Robert Muir added a comment -

          I'm +1 for this patch... any objections?

          Thanks for cleaning all this up James!

          Show
          Robert Muir added a comment - I'm +1 for this patch... any objections? Thanks for cleaning all this up James!
          Hide
          Robert Muir added a comment -

          Thanks again James!

          Show
          Robert Muir added a comment - Thanks again James!

            People

            • Assignee:
              Robert Muir
              Reporter:
              James Dyer
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development